Slides: Distributions of Statistics from Data
DISTRIBUTIONS OF STATISTICS FROM DATA
ST101 – DR. ARIC LABARR
REVIEW
Population
Sample
Statistic
Parameter
Population – set of all objects/individuals of interest.
Sample – subset of the population from which information is actually obtained.
Statistic – a measure computed from a sample.
Parameter – a measure computed from a population.
POINT ESTIMATORS
Point Estimator (Statistic)
Population Parameter
Sample statistics are point estimates (single number estimates) of a population parameter.
Different population parameters have different corresponding sample statistics.
SAMPLES ARE ESTIMATES
Samples are estimates of the population.
Statistics are estimates of the parameters.
With any estimation comes a chance of making errors.
SAMPLES VS. POPULATIONS
Population: 1, 3, 5, 5, 7, 9, 4, 6, 10, 2
Sample 1: 1, 10, 6, 9
Sample 2: 1, 3, 2, 5
SAMPLING ERROR
Both estimates are wrong!
Sampling error occurs when there is a difference between a sample point estimate and the corresponding population parameter.
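The definition above can be checked against the small population and the two samples from the slides. This is a minimal sketch that assumes the statistic being compared is the mean (the slides name the sample mean as an example statistic); the variable names are only illustrative.

```python
from statistics import mean

# Population and samples exactly as listed on the slides.
population = [1, 3, 5, 5, 7, 9, 4, 6, 10, 2]
samples = {
    "Sample 1": [1, 10, 6, 9],
    "Sample 2": [1, 3, 2, 5],
}

mu = mean(population)  # population parameter (assumed here to be the mean)

# Sampling error = sample statistic minus the population parameter.
sampling_errors = {}
for name, values in samples.items():
    x_bar = mean(values)
    sampling_errors[name] = x_bar - mu
    print(f"{name}: mean = {x_bar:.2f}, sampling error = {x_bar - mu:+.2f}")
```

Neither sample mean lands on the population mean, and the two errors differ in both size and sign.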
In practice, the sample statistic is typically all we know; we rarely have the parameter needed to measure the sampling error.
If sample statistics (like the sample mean) had a predictable pattern, then the errors would have a typical pattern as well!
SUMMARY
Sample statistics are point estimates (single number estimates) of a population parameter.
Sampling error occurs when there is a difference between a sample point estimate and the corresponding population parameter.
If sample statistics (like the sample mean) had a predictable pattern, then the errors would have a typical pattern as well.
SAMPLING DISTRIBUTION
MANY, MANY SAMPLES
Population: Normal
Mean: 0
S.D.: 1
Sample 1: -1.4, 0.2, -1.7, 2.1, -2.0, 0.5, 1.6, -1.2, 0.6, 0.2
Sample 2: 0.8, -0.3, -0.6, -1.1, -1.3, 0.4, -0.9, -0.4, -1.0, -1.2
Sample 3: -0.2, 2.2, 0.7, 0.5, 1.2, -0.1, -0.6, -0.6, 0.7, -0.6
Sample 4: 2.0, -1.2, 1.6, 0.6, -0.8, 1.2, 0.8, 0.9, 0.5, -1.2
…
…
DISTRIBUTION OF SAMPLE MEANS
What is the distribution of the sample means?
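One way to explore the question is to simulate it. The sketch below assumes 10,000 repeated samples (the slides only say "many, many") of 10 observations each, drawn from a Normal population with mean 0 and S.D. 1, and then summarizes the resulting sample means. The seed and the constant names are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(101)  # arbitrary seed, only so the run is repeatable

N_SAMPLES = 10_000  # stand-in for "many, many samples" (assumed count)
N = 10              # observations per sample, matching the slides

# Draw each sample from a Normal(0, 1) population and keep only its mean.
sample_means = [
    mean([random.gauss(0, 1) for _ in range(N)])
    for _ in range(N_SAMPLES)
]

center = mean(sample_means)
spread = stdev(sample_means)

print(f"mean of sample means: {center:.3f} (population mean is 0)")
print(f"s.d. of sample means: {spread:.3f} (1/sqrt(10) is {1 / sqrt(N):.3f})")
```

The sample means should cluster around the population mean, with a spread noticeably smaller than the population S.D. of 1.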
MANY, MANY SAMPLES
Population: Uniform
Mean: 0
S.D.: 1
Sample 1: 0.7, -0.8, -0.2, 0.1, -0.6, 1.5, 1.6, -0.7, 0.7, 0.4
Sample 2: -1.0, -0.5, 0.1, -1.2, 0.1, 1.7, 1.5, 1.1, -1.7, -0.8
Sample 3: -0.9, -1.7, 0.2, 0.1, 1.3, -1.4, -1.2, 0.3, -0.1, 1.5
Sample 4: -0.6, -0.2, 0.8, 0.8, -0.7, -0.6, 1.6, -0.6, 0.6, -0.1
…
…
DISTRIBUTION OF SAMPLE MEANS
What is the distribution of the sample means?
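The same experiment can be sketched for the Uniform population. A Uniform distribution with mean 0 and S.D. 1 must span -√3 to √3 (about ±1.73), which is consistent with the slide samples staying inside ±1.7. The number of repeated samples, the seed, and the helper `share_within_one_sd` are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(101)  # arbitrary seed, only so the run is repeatable

N_SAMPLES = 10_000      # stand-in for "many, many samples" (assumed count)
N = 10                  # observations per sample, matching the slides
HALF_WIDTH = sqrt(3)    # Uniform(-sqrt(3), sqrt(3)) has mean 0 and S.D. 1


def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their mean."""
    m, s = mean(values), stdev(values)
    return sum(1 for v in values if abs(v - m) <= s) / len(values)


# Individual draws from the population, for comparison of shape.
population_draws = [
    random.uniform(-HALF_WIDTH, HALF_WIDTH) for _ in range(N_SAMPLES)
]

# Means of repeated samples of size N from the same population.
sample_means = [
    mean([random.uniform(-HALF_WIDTH, HALF_WIDTH) for _ in range(N)])
    for _ in range(N_SAMPLES)
]

center = mean(sample_means)
spread = stdev(sample_means)
share_population = share_within_one_sd(population_draws)
share_means = share_within_one_sd(sample_means)

print(f"mean of sample means: {center:.3f}")
print(f"s.d. of sample means: {spread:.3f} (1/sqrt(10) is {1 / sqrt(N):.3f})")
print(f"within 1 s.d., population draws: {share_population:.2f}")
print(f"within 1 s.d., sample means:     {share_means:.2f}")
```

For a Normal curve about 68% of values fall within one standard deviation of the mean, while for a flat Uniform curve the figure is about 58%. Comparing the two printed shares gives a rough sense of how bell-shaped the sample means are even though the population is not.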
CENTRAL LIMIT THEOREM
SAMPLING DISTRIBUTION BIKE DATA EXAMPLE
The average daily number of total users is 4,504 with a standard deviation of 1,937. What is the probability that a sample of 50 days has an average between 4,000 and 5,000 total users?
Based on our previous example, all of the possible sample means (from samples of size 50) would have the following distribution: approximately Normal, with mean 4,504 and standard deviation 1,937/√50 ≈ 274.
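The probability asked for in the bike example can be sketched with the Central Limit Theorem, assuming the sample mean is approximately Normal with mean 4,504 and standard deviation 1,937/√50. The variable names are illustrative, and `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

mu = 4504     # average daily total users (from the slide)
sigma = 1937  # standard deviation of daily total users (from the slide)
n = 50        # days in the sample

# Standard deviation of the sample mean (the standard error).
standard_error = sigma / sqrt(n)

# Approximate sampling distribution of the sample mean.
sampling_dist = NormalDist(mu, standard_error)

# P(4000 < sample mean < 5000) as a difference of two cumulative areas.
prob = sampling_dist.cdf(5000) - sampling_dist.cdf(4000)

print(f"standard error: {standard_error:.1f}")
print(f"P(4000 < sample mean < 5000): {prob:.3f}")
```

Under these assumptions the probability comes out at roughly 0.93, so a 50-day average outside the 4,000 to 5,000 range would be fairly unusual.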
SAMPLE SIZE AND SAMPLING DISTRIBUTION
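One way to see how sample size affects the sampling distribution of the mean is to compute its standard deviation, σ/√n, for several values of n. The sketch below reuses the bike data standard deviation of 1,937; the list of sample sizes is hypothetical and chosen only for illustration.

```python
from math import sqrt

sigma = 1937                        # population S.D. from the bike example
sample_sizes = [10, 50, 200, 1000]  # hypothetical sample sizes

# Standard deviation of the sample mean for each sample size.
standard_errors = {n: sigma / sqrt(n) for n in sample_sizes}

for n, se in standard_errors.items():
    print(f"n = {n:>4}: standard error = {se:7.1f}")
```

Quadrupling the sample size (50 to 200) halves the standard error, so larger samples give sample means that sit closer to the population mean.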
SUMMARY
PROPORTIONS
Sample proportions are similar to sample means.
Customer ID   Gender   Gender Numeric
001           M        0
002           F        1
003           F        1
004           M        0
005           M        0
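The customer table shows why proportions behave like means: once one category is coded 1 and the other 0, the mean of the coded column is the sample proportion. A minimal sketch using the five customers from the table (the tuple layout and names are illustrative):

```python
from statistics import mean

# (customer ID, gender, gender numeric) rows as shown in the table.
customers = [
    ("001", "M", 0),
    ("002", "F", 1),
    ("003", "F", 1),
    ("004", "M", 0),
    ("005", "M", 0),
]

# Mean of the 0/1 column.
gender_numeric = [code for _, _, code in customers]
p_hat = mean(gender_numeric)

# Direct count of the category coded as 1, for comparison.
share_female = sum(1 for _, g, _ in customers if g == "F") / len(customers)

print(f"mean of Gender Numeric: {p_hat}")
print(f"proportion coded F:     {share_female}")
```

Both calculations give the same value, which is why results about sample means carry over to sample proportions.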
SAMPLING DISTRIBUTION
How large is large enough?
At least 5 in each of the two categories!
For values of p near 0.5, sample sizes as small as 10 can support a Normal approximation.
With very small (approaching 0) or very large (approaching 1) values of p, much larger samples are needed.
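The "at least 5 in each of the two categories" guideline can be sketched as a quick check. This assumes the rule is applied to the expected counts n·p and n·(1 − p); the function name `normal_approx_ok` and the example values of n and p are hypothetical.

```python
def normal_approx_ok(n, p, minimum=5):
    """Return True if both expected category counts reach the minimum.

    Assumes the rule of thumb is applied to expected counts:
    n * p in one category and n * (1 - p) in the other.
    """
    return n * p >= minimum and n * (1 - p) >= minimum


# Illustrative cases: p near 0.5 works with small n, extreme p does not.
cases = [(10, 0.5), (10, 0.1), (50, 0.63), (50, 0.02), (1000, 0.02)]
for n, p in cases:
    print(f"n = {n:>4}, p = {p:.2f}: Normal approximation ok? "
          f"{normal_approx_ok(n, p)}")
```

With p = 0.5 a sample of 10 just meets the guideline, while p = 0.02 fails at n = 50 and needs a much larger sample.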
SAMPLING DISTRIBUTION BIKE DATA EXAMPLE
You think that people are more likely to rent a bike on a clear or cloudy day compared to misty / rain / snow. In your data, 63% of the days are clear or cloudy. What is the probability that you sample 50 days and fewer than half of them are clear or cloudy?
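The proportion question can be sketched with a Normal approximation, assuming the sample proportion is approximately Normal with mean p = 0.63 and standard deviation √(p(1 − p)/n). No continuity correction is applied, so an exact binomial calculation would give a slightly different answer. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p = 0.63  # share of clear or cloudy days in the data (from the slide)
n = 50    # days in the sample

# Expected counts in both categories, for the "at least 5" guideline.
expected_yes = n * p        # about 31.5 clear or cloudy days
expected_no = n * (1 - p)   # about 18.5 other days

# Standard deviation of the sample proportion (the standard error).
standard_error = sqrt(p * (1 - p) / n)

# P(sample proportion < 0.5) under the Normal approximation.
prob = NormalDist(p, standard_error).cdf(0.5)

print(f"expected counts: {expected_yes:.1f} and {expected_no:.1f}")
print(f"standard error: {standard_error:.4f}")
print(f"P(sample proportion < 0.5): {prob:.4f}")
```

Under these assumptions the probability is a little under 3%, so a 50-day sample where clear or cloudy days are in the minority would be unusual.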
SUMMARY