Notes for Elementary Statistics (STA2023) User: Joel Vannatta--Chapter 2

Raw Data: data in the form in which it was originally collected.

Example: the ages of 5 students were recorded as 19, 18, 23, 20, and 17. This is the raw data.

Ungrouped data: data that is not grouped within categories or ranges. The previous example is ungrouped data.

Frequency distribution of Qualitative Data
Frequency Distribution: a list of the categories of a qualitative variable alongside the frequency (count) of each category.

Example: Frequency distribution table showing the frequency of different colors of cars.

Car color

Relative frequency: the frequency of a specific category in a frequency distribution compared to the total of all frequencies.

Relative frequency = frequency of a category / sum of all frequencies. The answer will be a decimal number between 0 (never occurs) and 1 (always occurs), e.g., 0.75.

To express a relative frequency as a percentage, multiply the relative frequency by 100 (0.75 x 100 = 75%).

The sum of all relative frequencies will equal 1.
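The calculation above can be sketched in Python. This is only an illustration: the car colors are made-up data, not the values from the original table, and `relative_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(observations)   # frequency of each category
    total = sum(counts.values())     # sum of all frequencies
    return {
        category: (freq, freq / total, freq / total * 100)
        for category, freq in counts.items()
    }

# Hypothetical sample of car colors (illustrative only).
cars = ["red", "blue", "red", "white", "red", "blue", "white", "red"]
table = relative_frequencies(cars)

for color, (freq, rel, pct) in table.items():
    print(f"{color:<6} {freq:>2}  {rel:.3f}  {pct:.1f}%")
```

Here red has a frequency of 4 out of 8, so its relative frequency is 0.5 (50%), and the three relative frequencies add up to 1.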

Bar graph: a graph composed of bars whose heights represent the frequencies of their respective categories.

Pie chart: a circular chart divided into wedges based upon the percentages of the different categories.

In a pie chart, the relative frequency of a category multiplied by 360 gives the number of degrees in the angle of that category's wedge.
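As a rough sketch, the wedge angle is each category's relative frequency (frequency divided by the total) times 360. The frequencies below are hypothetical, and `wedge_angles` is an illustrative name.

```python
def wedge_angles(frequencies):
    """Map each category to the angle (in degrees) of its pie-chart wedge."""
    total = sum(frequencies.values())
    # relative frequency x 360 degrees
    return {cat: freq / total * 360 for cat, freq in frequencies.items()}

# Hypothetical frequencies (illustrative only).
angles = wedge_angles({"red": 4, "blue": 2, "white": 2})
print(angles)
```

The angles of all wedges add up to 360 degrees, just as the relative frequencies add up to 1.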

Frequency distribution of Quantitative data
Class: In quantitative data, a class is an interval that contains all values falling between the class's lower and upper limits. Data organized into classes is called grouped data.

Ex. A frequency distribution measuring ages might separate the sample into 0-9 year olds, 10-19 year olds, 20-29 year olds, etc.

Classes in a frequency distribution are usually of equal size, do not overlap, and, once class boundaries are used, have no gaps between them. Adjacent classes share a class boundary, which is the upper limit of one class plus the lower limit of the next class, divided by 2. Each class therefore has a lower and an upper class boundary.

Example: The class boundary between the classes "10-19 year olds" and "20-29 year olds" is (19 + 20)/2 = 19.5. Likewise, the boundary between "0-9 year olds" and "10-19 year olds" is (9 + 10)/2 = 9.5. In this case, the class "10-19 year olds" will include any amounts that are greater than or equal to 9.5 and less than 19.5 (that is, anything from 9.5 up to 19.499999....).

Class width is a class's upper boundary minus its lower boundary. In the above example, the class "10-19 year olds" has a width of 19.5 - 9.5 = 10. Equivalently, it is the difference between the lower limits of two consecutive classes: 20 - 10 = 10.

Class midpoint/mark is found by adding a class's lower and upper limits (or its two boundaries) and dividing by 2. In the above example, the class "10-19 year olds" has a midpoint of (10 + 19)/2 = 14.5.
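The boundary, width, and midpoint calculations can be sketched as follows for the age classes 0-9, 10-19, and 20-29. This is an illustration under the boundary-based definitions (a boundary lies halfway between the upper limit of one class and the lower limit of the next), so the width of "10-19" comes out as 19.5 - 9.5 = 10. The helper name `class_summary` is hypothetical.

```python
def class_summary(limits):
    """Given consecutive (lower limit, upper limit) pairs, return each
    class's boundaries, width, and midpoint."""
    # Distance between one class's upper limit and the next class's
    # lower limit (1 for whole-number ages).
    gap = limits[1][0] - limits[0][1]
    summary = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        summary.append({
            "class": f"{lower}-{upper}",
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,
            "midpoint": (lower + upper) / 2,
        })
    return summary

ages = class_summary([(0, 9), (10, 19), (20, 29)])
for row in ages:
    print(row)
```

For the class "10-19" this gives boundaries of 9.5 and 19.5, a width of 10, and a midpoint of 14.5.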

Calculating Class Width
Class width is found by subtracting the lowest value in the data set from the highest value, and dividing the result by the number of classes (chosen by whoever is organizing the data, usually between 5 and 20). The result is then rounded to a convenient number (usually a whole number). The starting point (the lower limit of the first class) is set to be less than or equal to the lowest value.
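A minimal sketch of this procedure for whole-number data is shown below. It makes two assumptions that go beyond the notes: the approximate width is always rounded up to the next whole number (so the classes are guaranteed to cover the highest value), and the starting point defaults to the lowest value. The data are the ten values from the stem-and-leaf example in these notes, and `build_classes` is a hypothetical helper name.

```python
import math

def build_classes(data, num_classes, start=None):
    """Return (width, [(lower limit, upper limit), ...]) for whole-number data."""
    high, low = max(data), min(data)
    # Assumption: round the approximate width up to the next whole number.
    width = math.floor((high - low) / num_classes) + 1
    if start is None:
        start = low  # starting point <= lowest value
    classes = [
        (start + i * width, start + (i + 1) * width - 1)
        for i in range(num_classes)
    ]
    return width, classes

values = [14, 28, 23, 12, 11, 11, 15, 24, 26, 26]
width, classes = build_classes(values, num_classes=5)

# Frequency of each class: count the values between the class limits.
frequencies = [sum(lo <= v <= hi for v in values) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
```

Here (28 - 11)/5 = 3.4, which is rounded up to a width of 4, giving the classes 11-14, 15-18, 19-22, 23-26, and 27-30.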

Other kinds of graphs
Histogram: similar to a bar graph (which displays qualitative data), a histogram displays quantitative data, representing the frequency distribution of the different values, or of the different classes of values. It appears as a bar graph with no space between the bars. It can show frequency (e.g., 12), relative frequency (e.g., 0.6), or percentage (e.g., 60%).

Polygon graph: A graph in which dots are used to show the frequency of values or classes, and straight lines are used to connect the dots, forming a polygonal shape.

Frequency distribution curve: a graph in which a curved line shows the frequency of a large number of classes; essentially a polygon graph that has curved lines.

Shapes of Histograms
A histogram comes in three main shapes:

Symmetric: The left and right halves of the histogram are mirror images of each other about the midpoint. A common symmetric shape is the "bell curve".

Skewed: The histogram is lopsided, with most of its height on one side and a long, low "tail" on the other. It is "skewed left" or "skewed right" based on the direction in which the tail extends.

Uniform/rectangular: All bars of the histogram have the same (or nearly the same) height, forming a rectangle shape.

A cumulative frequency distribution gives, for each class, the number of values that fall below the upper boundary of that class. For example, if 3 people are in the 0-9 class, 7 people are in the 10-19 class, and 10 people are in the 20-29 class, then 3 people fall below the 9.5 boundary, 10 are below the 19.5 boundary, and 20 are below the 29.5 boundary.

Cumulative relative frequency is done in the same manner as relative frequency, dividing the cumulative frequency of a class by the number of values in the set.
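The example above (3, 7, and 10 people in the three classes) can be worked through with a short sketch. The variable names are illustrative.

```python
from itertools import accumulate

classes = ["0-9", "10-19", "20-29"]
frequencies = [3, 7, 10]

# Running total of the frequencies, class by class.
cumulative = list(accumulate(frequencies))

# Divide each cumulative frequency by the number of values in the set.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

for name, c, r in zip(classes, cumulative, cumulative_relative):
    print(f"{name}: cumulative = {c}, cumulative relative = {r:.2f}")
```

The cumulative frequencies are 3, 10, and 20, and the last cumulative relative frequency is always 1.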

Stem-and-leaf displays are charts used to quickly record values containing 2 or more digits.

Example: 14, 28, 23, 12, 11, 11, 15, 24, 26, 26

1 | 4 2 1 1 5

2 | 8 3 4 6 6

The leaves can then be arranged in order:

1 | 1 1 2 4 5

2 | 3 4 6 6 8

This allows one to see the frequency of different classes of values (10s, 20s, etc.) while still being able to observe the individual values (rather than grouping them all into "10-19" and "20-29").
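An ordered stem-and-leaf display for the ten example values can be built with a short sketch like the one below. It assumes whole numbers, with the last digit as the leaf and the remaining digits as the stem. `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last), leaves in order."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves ordered
        stems[v // 10].append(v % 10)
    return dict(stems)

display = stem_and_leaf([14, 28, 23, 12, 11, 11, 15, 24, 26, 26])
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Each of the ten values contributes one leaf, so the stem 1 has five leaves (11, 11, 12, 14, 15) and the stem 2 has five leaves (23, 24, 26, 26, 28).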

Dot plots and outliers
Dot plots are charts that have dots stacked on top of each other representing frequencies of classes or values. They are useful in identifying outliers.

Outliers are values that are exceptionally high or low compared to the other values, and are not representative of the usual results.
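A text-only dot plot can be sketched as below. Two things here are assumptions for illustration: the data are made up, and the plot is turned on its side (dots run left to right instead of being stacked vertically). `dot_plot` is a hypothetical helper name.

```python
from collections import Counter

def dot_plot(values):
    """Return one text line per whole-number value from min to max,
    with one 'o' per occurrence (a dot plot turned on its side)."""
    counts = Counter(values)
    return [
        f"{v:>3} | {'o' * counts[v]}"
        for v in range(min(values), max(values) + 1)
    ]

# Hypothetical data (illustrative only); 21 sits far from the rest.
sample = [11, 12, 12, 13, 13, 13, 14, 21]
plot = dot_plot(sample)
print("\n".join(plot))
```

In the printed plot, the values 11 through 14 cluster together, while 21 stands alone after a run of empty rows, which is how a possible outlier shows up visually.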


