Slides: Interval Estimation with Data
INTERVAL ESTIMATION WITH DATA
ST101 – DR. ARIC LABARR
A point estimator cannot be expected to provide the exact value of the population parameter.
An interval estimate can be computed by adding a margin of error to, and subtracting it from, the point estimate:
Point Estimate ± Margin of Error
The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
MARGIN OF ERROR
This does not mean that your interval estimates will always contain the population parameter.
Confidence Intervals are interval estimates where we say we have a certain level of confidence in the interval.
For example, we are 95% confident that the population average daily number of total users of the bike rental company is between 4,000 and 5,000.
CONFIDENCE INTERVALS
If we were to take many samples (same size), each producing a different confidence interval, then in the long run 95% of those intervals would contain the true parameter.
95% of the time, our confidence intervals would contain the true parameter of interest.
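The repeated-sampling idea can be checked with a small simulation. This is a minimal sketch, not taken from the slides: it assumes a hypothetical Normal population with mean 4,500 and standard deviation 1,900 (made-up values), treats the standard deviation as known, draws many samples of size 30, builds a 95% z-interval from each, and reports the share of intervals that contain the true mean. The function name `coverage_rate` is illustrative.

```python
import random
import statistics
from statistics import NormalDist


def coverage_rate(num_samples: int, n: int, mu: float, sigma: float,
                  confidence: float = 0.95, seed: int = 1) -> float:
    """Share of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Each sample gives its own interval; count the ones that capture mu
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    # Hypothetical population values, chosen only for illustration
    rate = coverage_rate(num_samples=10_000, n=30, mu=4500, sigma=1900)
    print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share lands near 0.95 but not exactly on it, and any single interval either contains μ or it does not.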
CONFIDENCE INTERVALS EXAMPLE
NOT a 95% chance that the population parameter falls inside our one confidence interval.
Confidence Intervals are interval estimates where we say we have a certain level of confidence in the interval.
Confidence means that if we were to take many samples of the same size, each producing a different confidence interval, then 95% of those intervals would contain the true parameter.
Confidence is NOT the chance the population parameter falls inside our one confidence interval.
SUMMARY
EMPIRICAL RULE
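For a Normal distribution, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. A quick sketch that checks these shares with Python's standard-library `statistics.NormalDist` (the helper name `share_within` is illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Area of the standard Normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.4f}")
# within 1 SD of the mean: 0.6827
# within 2 SD of the mean: 0.9545
# within 3 SD of the mean: 0.9973
```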
Standard Normal table: area to the left of z (row gives z to one decimal place; column gives the second decimal place)
z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
...
-1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
-1.8   .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
-1.7   .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
-1.6   .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
-1.5   .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
...
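Each entry in the z-table is the cumulative area $\Phi(z)$ to the left of z. As a sketch, Python's standard-library `statistics.NormalDist` can reproduce a row (`cdf`) and also run the lookup in reverse (`inv_cdf`), which is how a critical value such as the one for 90% confidence is found. The helper names `table_row` and `critical_z` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()


def table_row(z_row: float) -> list[float]:
    """Areas left of z for z = z_row - .00, ..., z_row - .09 (negative rows only)."""
    return [round(std_normal.cdf(z_row - j / 100), 4) for j in range(10)]


def critical_z(confidence: float) -> float:
    """z such that the central `confidence` area lies between -z and +z."""
    return std_normal.inv_cdf(1 - (1 - confidence) / 2)


print(table_row(-1.6))             # the z = -1.6 row of the table
print(round(critical_z(0.90), 3))  # 1.645
print(round(critical_z(0.95), 3))  # 1.96
```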
CONFIDENCE INTERVAL BIKE DATA EXAMPLE
You think that people are more likely to rent a bike on a clear or cloudy day compared to misty / rain / snow. Your data is a sample of 731 days with 63% clear or cloudy. Build a 90% confidence interval for the true proportion of clear or cloudy days where your company operates.
For 90% confidence, 5% of the area lies in each tail. In the z-table, a left-tail area of .05 falls between z = -1.64 (.0505) and z = -1.65 (.0495), so the critical value is z ≈ 1.645.
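$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = 0.63 \pm 1.645\sqrt{(0.63)(0.37)/731} = 0.63 \pm 0.029$
OR
$(0.601,\ 0.659)$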
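The interval for a proportion is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. A minimal Python sketch of that formula with the example's numbers ($\hat{p} = 0.63$, $n = 731$, 90% confidence); the function name `proportion_ci` is illustrative:

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat: float, n: int, confidence: float) -> tuple[float, float]:
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)          # margin of error
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(p_hat=0.63, n=731, confidence=0.90)
print(f"({low:.3f}, {high:.3f})")  # (0.601, 0.659)
```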
SUMMARY
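PROBLEM!
The margin of error for a mean, $z_{\alpha/2}\,\sigma/\sqrt{n}$, needs the population standard deviation $\sigma$, which is unknown.
We still need to estimate $\sigma$ with the sample standard deviation $s$; doing so adds uncertainty, so the critical value comes from the t distribution with $n-1$ degrees of freedom rather than the standard Normal.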
For larger samples, the t distribution is approximately the standard Normal distribution.
[Figure: t distributions with d.f. = 10 and d.f. = 20 compared with the standard Normal distribution]
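To see the heavier tails numerically, the sketch below evaluates the t density (written out from its standard formula with `math.lgamma`, so no third-party library is assumed) next to the standard Normal density at a few points, for 10 and 20 degrees of freedom. The helper name `t_pdf` is illustrative.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), kept in log space
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


normal_pdf = NormalDist().pdf

print(" x    t(10)   t(20)   Normal")
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"{x:3.1f}  {t_pdf(x, 10):.4f}  {t_pdf(x, 20):.4f}  {normal_pdf(x):.4f}")
```

The t densities sit lower at the center and higher in the tails; as the degrees of freedom grow, their columns move toward the Normal one.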
t distribution critical values (row = degrees of freedom; column = confidence level)
d.f.   50%    60%    70%    80%    90%    95%    98%    99%    99.8%  99.9%
...
26     0.684  0.856  1.058  1.315  1.706  2.056  2.479  2.779  3.435  3.707
27     0.684  0.855  1.057  1.314  1.703  2.052  2.473  2.771  3.421  3.690
28     0.683  0.855  1.056  1.313  1.701  2.048  2.467  2.763  3.408  3.674
29     0.683  0.854  1.055  1.311  1.699  2.045  2.462  2.756  3.396  3.659
30     0.683  0.854  1.055  1.310  1.697  2.042  2.457  2.750  3.385  3.646
...
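The t-table entries are two-sided critical values: for confidence level C, the value t with area C between -t and +t. As a rough sketch under that reading, they can be reproduced with only the standard library by integrating the t density with Simpson's rule and searching for the cutoff by bisection. The helper names and step counts are arbitrary choices; in practice a statistics library (for example SciPy's `scipy.stats.t.ppf`) would do this directly.

```python
from math import exp, lgamma, log, pi


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x: float, df: int, steps: int = 400) -> float:
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral over [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value: area `confidence` lies between -t and +t."""
    target = 0.5 + confidence / 2  # cumulative area to the left of +t
    lo, hi = 0.0, 20.0
    for _ in range(40):  # bisection on the increasing function t_cdf
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


levels = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999]
print(30, [round(t_critical(c, 30), 3) for c in levels])
```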
ADDITIONAL ASSUMPTIONS
The average daily number of total users is 4,504 with a standard deviation of 1,937 in our sample of 731 days. Build a 95% confidence interval for the average daily number of total users.
CONFIDENCE INTERVAL BIKE DATA EXAMPLE
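$\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n} = 4504 \pm 1.963 \times 1937/\sqrt{731} = 4504 \pm 140.6$ (d.f. = 730)
OR
$(4363.4,\ 4644.6)$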
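A sketch of the mean interval $\bar{x} \pm (\text{critical value})\, s/\sqrt{n}$ in Python. By default it uses the Normal critical value (about 1.960), since for large samples the t distribution is approximately the standard Normal; passing the t critical value for 730 d.f. (about 1.963) instead shifts each endpoint by only about 0.2 users. The function name `mean_ci` and its `critical` parameter are illustrative.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional


def mean_ci(xbar: float, s: float, n: int, confidence: float,
            critical: Optional[float] = None) -> tuple[float, float]:
    """Confidence interval xbar +/- critical * s / sqrt(n)."""
    if critical is None:
        # Large-sample shortcut: use the standard Normal in place of t
        critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical * s / sqrt(n)
    return xbar - margin, xbar + margin


print(mean_ci(4504, 1937, 731, 0.95))                  # z-based, about (4363.6, 4644.4)
print(mean_ci(4504, 1937, 731, 0.95, critical=1.963))  # t-based, about (4363.4, 4644.6)
```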
SUMMARY
SAMPLE SIZE CALCULATION
What if we want to know the sample size we would need to collect to achieve a desired margin of error?
Instead of calculating a confidence interval (or margin of error) after a sample is taken, we can look at the problem in reverse.
For example, your boss allows a margin of error of E but wants you to take as small a sample as possible while keeping the margin of error no larger than E.
REVERSING THE PROBLEM
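Take the margin of error: $E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
Square both sides: $E^2 = z_{\alpha/2}^2\,\hat{p}(1-\hat{p})/n$
Solve for sample size: $n = z_{\alpha/2}^2\,\hat{p}(1-\hat{p})/E^2$, rounded up to the next whole number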
Recall the 90% confidence interval for the true proportion of clear or cloudy days (731 days, 63% clear or cloudy): 0.63 ± 0.029 OR (0.601, 0.659).
What if that margin of error is too big for what the company wants?
CONFIDENCE INTERVAL BIKE DATA EXAMPLE
You think that people are more likely to rent a bike on a clear or cloudy day compared to misty / rain / snow. You want to estimate the proportion of clear or cloudy days to within 2 percentage points (E = 0.02) with a 90% confidence interval. What sample size would we need for that?
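Plugging in numbers: a minimal sketch of $n = z_{\alpha/2}^2\,\hat{p}(1-\hat{p})/E^2$, rounded up. It assumes the 63% from the earlier 731-day sample as the planning value for $\hat{p}$; the second call uses 0.5, the value that maximizes $\hat{p}(1-\hat{p})$ and is therefore the conservative choice when no earlier estimate exists. The function name `sample_size_proportion` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p_plan: float, margin: float, confidence: float) -> int:
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_plan * (1 - p_plan) / margin ** 2)


print(sample_size_proportion(0.63, 0.02, 0.90))  # 1577
print(sample_size_proportion(0.50, 0.02, 0.90))  # 1691
```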
Take the margin of error: $E = t_{\alpha/2,\,n-1}\, s/\sqrt{n}$
Solve for sample size: $n = \left(t_{\alpha/2,\,n-1}\, s/E\right)^2$
We don't know $t_{\alpha/2,\,n-1}$ ahead of sampling because its degrees of freedom depend on the sample size!
Solve for sample size: $n = \left(z_{\alpha/2}\, s/E\right)^2$, rounded up
Typically use the Normal distribution approximation.
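A sketch of the Normal-approximation formula $n = (z_{\alpha/2}\, s/E)^2$, rounded up. The slides give no target margin for the mean, so $E = 100$ users here is purely hypothetical, and $s = 1{,}937$ is borrowed from the 731-day sample as a planning value. The function name `sample_size_mean` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(s_plan: float, margin: float, confidence: float) -> int:
    """Normal-approximation sample size so the margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * s_plan / margin) ** 2)


# Hypothetical target: estimate average daily users to within 100 with 95% confidence
print(sample_size_mean(s_plan=1937, margin=100, confidence=0.95))  # 1442
```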
SUMMARY