Chapter 1: Introduction, beginning terminology, and summation notation
Statistics: the methods of collecting, reading, and using data in order to make decisions
Descriptive Statistics: Raw statistical data (ex: 50% of men and 25% of women suffer from too much stress)
Inferential Statistics: Statements or predictions made by using statistical data (ex: men are more likely to suffer from stress than women)
Population: a group of people or objects that are being studied
Sample: a portion of a population from which information is gathered
Populations (e.g. Chicago Cubs fans) are often too large to collect information from; in most cases, information is gathered from a sample of that population.
Census: Information gathered from an entire population; rarely done due to cost and time requirements
Survey: Information gathered from a sample of the population
Representative sample: a sample that matches the characteristics of its population as closely as possible
Random sample: a sample whose objects are collected from the population randomly, with each object having an equal chance of being chosen
Element/member: a specific object in a sample being analyzed (a car in the sample of several cars)
Variable: a characteristic that may differ between different elements (the color of paint on the car)
Constant: a characteristic that is the same for all elements in a sample (all cars in the sample are painted)
Observation/measurement: the value of a variable of an element (one car in the sample is painted blue)
Data set: a collection of observations of one or more variables (from a sample of 10 cars, 2 were grey, 3 were red, and 5 were blue)
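As a rough illustration of the terms above, the following Python sketch draws a random sample of 10 cars from a made-up population and tabulates the observations into a data set. The population of 1,000 cars, its colors, and the fixed seed are assumptions invented for the example; they do not come from any real data.

```python
import random
from collections import Counter

# Hypothetical population: 1,000 cars, each with a paint color (the variable).
random.seed(0)  # fixed seed so the sketch is repeatable
colors = ["grey", "red", "blue"]
population = [random.choice(colors) for _ in range(1000)]

# Random sample: 10 elements drawn without replacement,
# each car having an equal chance of being chosen.
sample = random.sample(population, 10)

# Data set: the observations (colors) tabulated by value.
data_set = Counter(sample)
print(data_set)
```

Each entry in sample is one observation; the counts in data_set play the same role as "2 were grey, 3 were red, and 5 were blue" in the definition above, though the actual counts depend on the draw.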
Types of Variables
Quantitative variable: a variable that is identified by a number (12 eggs, 23 years old)
- Discrete: a quantitative variable that is counted (number of eggs); always in whole numbers
- Continuous: a quantitative variable that is measured (age, length, weight); able to be measured in decimals (4 1/2 years old; 70.23 inches)
Qualitative/categorical variable: a variable that is identified by a certain quality (color, favorite food)
Types of Data
Cross-section data: Data gathered from different sources at the same time (customer satisfaction at Wal-mart, Home Depot, and Gamestop for 2009)
Time-series data: Data gathered from one source at different points in time (customer satisfaction at Wal-mart in 2007, 2008, 2009, and 2010)
Sources of Data
Internal: Information from within one's own organization (a company forecasts future sales by looking at its previous sales records)
External: Information collected by someone else (I access ancestry.com to look up genealogical information)
Surveys: Information solicited from a sample of a population (5,000 voters are surveyed to predict the outcome of the presidential election)
Experiment: Information is gathered through experimentation (a mouse is found to correctly choose the cup with cheese in it with 75% accuracy)
The letter x is used to refer to a variable in a specified data set. When x is followed by a number in subscript (e.g. x₁), it refers to a specific value in the set.
Example: Five people are chosen from a college class. Their ages are 24, 19, 24, 26, and 18.
If x = student age, x₁ = 24, x₂ = 19, x₃ = 24, x₄ = 26, x₅ = 18.
The summation of these values, x₁ + x₂ + x₃ + x₄ + x₅, is denoted as Σx (Σ is the Greek capital letter sigma, which stands for summation). It is read as "sigma x", "summation of x", or "the sum of all values of the variable x".
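The student-age example can be checked with a few lines of Python. This is only a sketch; the names ages and sum_x are chosen for illustration. Note that Python lists count from 0, so the first value (x with subscript 1) is ages[0].

```python
# Ages of the five students from the example above.
ages = [24, 19, 24, 26, 18]

# Python lists are indexed from 0, so x1 is ages[0] and x5 is ages[4].
x1 = ages[0]
x5 = ages[4]

# Sigma x: the sum of all values of the variable x.
sum_x = sum(ages)
print(sum_x)  # 111
```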
If a formula contains "Σx²", it is read as "the summation of the squares of the values of the variable x". So if the values for x were 2, 3, and 4, then the squares of x would be 4, 9, and 16, and Σx² would equal 29.
If a formula contains "(Σx)²", then the summation of x is found as it normally is, and the final sum is then squared. So if the values for x were 2, 3, and 4, then the summation of x would be 9, and the square of the summation would be 81.
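The difference between the sum of the squares and the square of the sum is easy to mix up, so here is a minimal Python sketch using the same values 2, 3, and 4. The variable names are illustrative only.

```python
x = [2, 3, 4]

# Sum of squares: square each value first, then add (4 + 9 + 16).
sum_of_squares = sum(v ** 2 for v in x)

# Square of the sum: add the values first (2 + 3 + 4 = 9), then square.
square_of_sum = sum(x) ** 2

print(sum_of_squares)  # 29
print(square_of_sum)   # 81
```

The two results differ because the order of operations differs: squaring happens before adding in the first case and after adding in the second.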