Video Transcript: Interval Estimation with Data - Part 3
Welcome. Let's finish up this section of the course by talking about interval estimation for the sample statistic x bar, our sample mean. As another reminder, an interval estimate is basically computed by taking the point estimate that you calculate from your sample and adding in that margin of error, just like we talked about a couple lectures ago. Now, the purpose of an interval estimate, again, is to provide some idea of wiggle room, some notion of how close we are with our guess. You'll also remember that the sampling distribution of x bar is going to be a key factor in this. Just like the sampling distribution of p hat was important to us calculating our confidence interval for p hat in the last lecture, the sampling distribution of x bar is going to play a key role in helping us calculate the confidence interval for x bar in this lecture. Now, what is the sampling distribution of x bar? If you remember, according to the central limit theorem, the sampling distribution of x bar is approximately normal as long as your sample size is big enough, and a sample size being big enough here meant we had at least 50 observations in our sample. That means that the sampling distribution of x bar looks like the following: you basically have this normal distribution centered around the true population mean, mu, with a standard deviation of x bar equal to the population standard deviation sigma divided by the square root of n, our sample size. So again, as a quick refresher, if we were to take many, many samples all of the same size, calculate the sample average x bar from each one of those samples, and plot them all on a distribution, you would see this normal distribution, as long as those samples were big enough, at least 50 observations. However, there is a problem here that we did not have when it came to our proportions, and that problem exists right here: we don't know the population standard deviation.
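If you'd like to see that refresher in action, here is a minimal Python sketch that is not part of the lecture itself. It assumes a made-up, clearly non-normal population (an exponential distribution with mu = 1 and sigma = 1), draws many samples of size 50, and checks that the sample means center on mu with a spread close to sigma divided by the square root of n. The population, the seed, and the variable names are all illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

n = 50                   # sample size, the "big enough" threshold from the lecture
num_samples = 5000       # how many samples we pretend to collect
mu, sigma = 1.0, 1.0     # mean and sd of an exponential population with rate 1 (assumed)

# Take many samples of the same size and record x bar for each one.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"center of the x bar values: {mean_of_means:.3f} (population mean is {mu})")
print(f"spread of the x bar values: {sd_of_means:.3f} "
      f"(sigma / sqrt(n) is {sigma / math.sqrt(n):.3f})")
```

Notice that the comparison only works because the sketch knows sigma. With real data we would not, which is exactly the problem just described.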
Now I know you're thinking, wait a minute, we kind of had a problem like this with proportions. With proportions, we didn't know what the population proportion p was so we just estimated it with p hat. Well, that's great, but we already had an estimate for p, it was p hat, because p hat was what we were already estimating. The problem with means is we don't have an estimate for sigma already. We have an estimate for mu. We have an estimate for the population mean. Our sample mean is an estimate for the population mean, just like our sample proportion is an estimate for the population proportion, but here we have something different. We don't already have an estimate for sigma, so we're going to have to calculate another estimate. So we have two estimates going on here. We're estimating the population mean, mu, with x bar, and now we're going to have to estimate the population standard deviation sigma, and we're going to estimate that with the sample standard deviation s. I know you're thinking, okay, well, so now we have to make two estimates. You're right, now we have to estimate both mu and sigma. Now I know what you think, that that doesn't sound like it'd be too big of a problem, but it actually is quite a big problem. Since we do not know the population standard deviation and need to
estimate it, we actually have to add extra error into our calculations. This makes sense. Estimating two things is going to have more error than just estimating one, right? If I asked you to guess whether one coin flip comes up heads or tails, or I gave you the option of trying to guess two coin flips, you would always say, I'll just take guessing one. Why? There's more chance of you being wrong having to guess twice than there is having to guess once. And so because now we have to guess twice, guessing the population mean with x bar and guessing the population standard deviation with s, we need to add in some more built-in wiggle room. Again, we didn't have this problem with proportions because there we're still only estimating one number. We're still only estimating the population proportion p. I'm just using that estimate in multiple places, but I still only need to estimate one thing. Here, I need to estimate two, and so because I need to estimate two, the normal distribution is no longer a good approximation for the sampling distribution of x bar. You see, the central limit theorem works really well if all you care about is x bar, but the second you start going, I need to estimate something else as well, we're going to need to use another distribution. Luckily, we have another distribution that we can use. It's called the student t distribution. Now, the t distribution is a family of distributions, much like the normal distribution is a family of distributions. The normal distribution can take many shapes; it can have many means and many different spreads. Well, the t distribution is also a family. Now, we won't get into all the details behind why it's called the student t distribution. That's a fun little Google experiment if you really care, but we'll just call it the t distribution from now on.
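To make the "guessing twice costs you" idea concrete, here is a small simulation sketch, offered as an illustration rather than anything from the lecture. It assumes a normal population with mean 0 and standard deviation 1 and a deliberately tiny sample size of 5, builds intervals using s in place of sigma, and counts how often they capture the true mean. The value 1.96 is the usual normal value for 95%. The value 2.776 is the standard t table value for 95% with 4 degrees of freedom, which is not quoted in the lecture.

```python
import math
import random

def coverage(critical_value, n, trials, rng, mu=0.0, sigma=1.0):
    """Fraction of intervals x bar +/- critical_value * s / sqrt(n) that contain mu."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation s: note the n - 1 in the denominator.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = critical_value * s / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rng = random.Random(42)  # fixed seed so the run is repeatable

# Normal-based value with an estimated s: the intervals come out too narrow.
z_cover = coverage(1.96, n=5, trials=20000, rng=rng)
# t-based value for 4 degrees of freedom (assumed from a standard t table).
t_cover = coverage(2.776, n=5, trials=20000, rng=rng)

print(f"coverage using 1.96 (normal): {z_cover:.3f}")
print(f"coverage using 2.776 (t, 4 df): {t_cover:.3f}")
```

Under these assumptions the normal-based intervals should capture the mean noticeably less often than the advertised 95%, while the t-based intervals should land close to it. That gap is the extra wiggle room the t distribution supplies.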
The t distribution is symmetric, just like the normal distribution. However, it has thicker tails. In other words, it has a little bit more wiggle room in the tails; it has a wider margin of error, if you will. Now, the t distribution is defined by a single number that we call degrees of freedom. The degrees of freedom tell us essentially how wide the t distribution is. Think about degrees of freedom as being similar to a normal distribution's standard deviation. A little bit more formally, degrees of freedom are the number of independent pieces of information that go into the computation of s, our sample standard deviation. More degrees of freedom leads to less dispersion, so the more degrees of freedom we have, the more information we have to calculate s, which is actually going to make our distribution more and more narrow. Again, I know it may be a little confusing, so let's think about this intuitively. Your degrees of freedom are calculated as the sample size, little n, minus one. Basically, the bigger your sample size, the more confidence (probably not the best word), the more belief you have in your number, so the bigger your sample size, the more narrow the t distribution is going to be. The smaller your sample size, the more wiggle room this t distribution is going to have to have, to account for the fact that you don't have a lot of data backing this up. As samples get larger and larger, the t distribution becomes approximately just the standard
normal distribution. So, when we have really large samples, it's basically just like using a normal distribution, but for small samples it's going to add even more wiggle room, and that's the idea, right? We're trying to say that we need to estimate two things, and because we need to estimate two things, we should add more wiggle room. Of course, if we have extremely large samples, thousands upon thousands of people, then, you know what, that extra wiggle room we need to add is really, really small. But if we don't have a lot of people in our sample, then we need to add a little extra wiggle room, because we're calculating two things as compared to one. That's all I'm saying. Here's a picture to try and visualize this for you. You have the standard normal distribution; that's the more narrow curve that you see here. Then you see two different t distributions, one with 20 degrees of freedom and one with 10 degrees of freedom. You'll notice the t distribution with 10 degrees of freedom is a little bit wider than the t distribution with 20 degrees of freedom, which is also wider than the standard normal distribution. Again, this is to try and help you visualize what we're doing. Because of the fact that we have to estimate two numbers, we need to add in some more wiggle room, and as you can see, the t distribution being wider, having more data in the tails and less data in the middle, is a good way of being able to add in that extra wiggle room while still keeping that bell-shaped symmetric curve that we like to see. All right, so we can use the same idea that we had when it came to the empirical rule, except we're going to use the t distribution instead of the normal distribution for the confidence interval. We still have the same idea, though.
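If you can't see the picture, the same comparison can be sketched numerically. The snippet below is an illustrative addition, not something from the lecture: it evaluates the textbook density formula for the t distribution at the center (0) and out in a tail (3), for 10 and 20 degrees of freedom, next to the standard normal. The function name t_pdf and the choice of the points 0 and 3 are assumptions made for this sketch.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom (textbook formula)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Height of each curve in the middle and far out in the right tail.
center = (t_pdf(0.0, 10), t_pdf(0.0, 20), normal.pdf(0.0))
tail = (t_pdf(3.0, 10), t_pdf(3.0, 20), normal.pdf(3.0))

print("height at 0 (t with 10 df, t with 20 df, normal):", [round(v, 4) for v in center])
print("height at 3 (t with 10 df, t with 20 df, normal):", [round(v, 4) for v in tail])
```

In the middle the ordering runs t with 10 degrees of freedom lowest, then 20, then the normal highest. Out in the tail it flips, which is the "less in the middle, more in the tails" shape described above.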
If we want the middle percentage of our data, what we're going to do is take the error and split that error into two pieces, one below the interval and one above the interval. Just like we did with proportions, we're going to take our estimate, then we're going to add and subtract some number, except instead of coming from a normal distribution, it's going to come from a t distribution, and then we're going to multiply that by the standard deviation of x bar. All right, so let's imagine again we want a 95% confidence interval. That means we're going to put 2.5% of our error below the confidence interval and 2.5% of our error above the confidence interval, again with the idea being we're going to be wrong 5% of the time, but we don't know whether we're going to be wrong high or wrong low, so we split that error into two pieces. But again, just like we did for the normal distribution, where we would go, well, what is the middle 95%? We need to do the same thing for the t distribution. What is the middle 95%, and what essentially is that value of t when our sample size, for example, is 30? Well, just like we had a normal table, we also have a t table. Yay, more tables to look at. I know, I know, they can be a little bit confusing. The nice part is we can use calculators that already have these built in. A lot of computers already have these built in, and we can also use these tables as well. But the t table is actually much more designed for confidence intervals than something like the normal table that we've been using up until
now. Let me direct your eyes to the t table here. I'm just showing you a few of the rows, but the far left-hand column is the degrees of freedom. So, again, you just find the degrees of freedom you're interested in. The column names in the very top row tell you what kind of confidence interval you would like. So, for example, we have a sample size of 30. That means we have degrees of freedom of 30 minus one, or 29, so I would look at row 29 and then I would look at column 95%, and that tells me that the value of t alpha over two, that piece of the confidence interval, is a little bigger than two: 2.045. Again, this isn't too surprising, right? This number is a little bit bigger than the normal distribution's value at 95%. The normal distribution at 95% was 1.96. This is 2.045. Again, we're adding in a little extra wiggle room because we had to estimate an extra number. That's the idea. So we would essentially have our estimate minus 2.045 times our standard error, and then our estimate plus 2.045 times our standard error, which is essentially our confidence interval. You have your estimate x bar plus or minus your margin of error. Again, more formally, the confidence interval for x bar with a confidence coefficient of one minus alpha, with alpha basically being the error rate, is just going to be your point estimate x bar, plus or minus the value from the t distribution (not from a normal distribution), times your standard error of x bar, s over the square root of n. Now remember, why do we call this a standard error? Well, the standard deviation of x bar is sigma over the square root of n. We don't know sigma, so we have to estimate it, and because we're estimating a standard deviation, it is now called a standard error. Awesome. Now, we do have some assumptions we have to think about here. Remember, this is all still relying on the central limit theorem, so let's start with large samples.
With a sample size greater than or equal to 50, I can calculate this confidence interval for the mean from any population. It doesn't matter what population distribution we have. Your data can look however it wants in terms of the population's distribution, as long as you have large samples. The central limit theorem holds, and you can do the calculation we just talked about. However, for small samples, samples with fewer than 50 observations, we need to assume the population follows a normal distribution to actually pull off that calculation. So, that's just another thing to think about. As long as you take a big sample, you shouldn't have to worry. All right, so let's go ahead and work through an example with our bike data. The average daily number of total users is 4504, with a standard deviation of 1937. Now, remember, we have a sample of 731 days, so let's build a 95% confidence interval for the average daily number of total users. All right, so again we need to use the t distribution instead of the normal distribution for the confidence intervals of x bar. So the question now becomes, what is the value of t for an n equal to 731? Okay, so for an n of 731, our degrees of freedom would be 730, and that corresponds to a t value of 1.965. Whoa, hold on a second. The normal distribution was 1.96. Yes, and as the sample size gets bigger and bigger, what do
we know? We know that the t distribution looks more and more like the normal distribution, so this number is going to get closer and closer to that normal distribution's 1.96. So for us it's 1.965. Awesome. So now we can just plug it into our equation. We have our sample mean, 4504, plus or minus 1.965 times 1937, our sample standard deviation, divided by the square root of our sample size, 731. In other words, what we can say is that our confidence interval for the average daily number of users is 4504 plus or minus about 141 people, more specifically 140.8. Or, if you wanted to say it a different way, you could say that our confidence interval for the average daily number of users is from 4363.2 people up to 4644.8 people. Personally, I like the left-hand side better. I like 4504 plus or minus 140.8. I don't know why, it just resonates with me a little bit better. Everyone's different, though. You might like that idea of point estimate plus or minus margin of error, or you might like actually looking at the interval itself. Either way, it's the same thing, so you'd be fine. All right, let's wrap it all up. The confidence interval for the sample mean x bar with a confidence coefficient of one minus alpha, that's an error rate of alpha, is the following: it's x bar plus or minus that value from the t distribution, t alpha over two, times the standard error of x bar, s over the square root of n. Wow, we've covered so much when it comes to this section of the course. Hopefully, you can start to see how all these sections are now starting to build on themselves.
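The bike example can be reproduced in a few lines of Python. This is a minimal sketch rather than course material: the function name mean_confidence_interval is made up for illustration, and the t value of 1.965 is simply the number quoted in the lecture for 730 degrees of freedom, passed in by hand instead of being looked up by the code.

```python
import math

def mean_confidence_interval(xbar, s, n, t_crit):
    """Return (low, high, margin) for x bar +/- t_crit * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)   # estimated standard deviation of x bar
    margin = t_crit * standard_error    # the margin of error, our wiggle room
    return xbar - margin, xbar + margin, margin

# Numbers from the lecture: mean 4504, standard deviation 1937, 731 days,
# and t = 1.965 for a 95% interval with 730 degrees of freedom.
low, high, margin = mean_confidence_interval(xbar=4504, s=1937, n=731, t_crit=1.965)

print(f"4504 plus or minus {margin:.1f}")
print(f"from {low:.1f} up to {high:.1f}")
```

Both printed forms describe the same interval. Swapping in a different t value, such as the 2.045 quoted earlier for a sample of 30, only changes the t_crit argument.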
Like I mentioned, we talked about probabilities, we talked about normal distributions, we talked about sampling distributions, all to lead up to these calculations of confidence intervals. So now, whenever we have a sample, it's not just, well, I think that the average number of users is 4504. No, I think the average number of users is 4504 plus or minus 141. I'm giving myself a little wiggle room. It adds credence to whatever you say when you're trying to share your results. People are going to believe you a lot more if you give them some idea of wiggle room as compared to just a single number, right? If I walked up to you and said the average height of Americans, according to my sample, is six feet tall, okay, well, that's good. Thank you. But if I said I think the average height of all Americans, according to my sample, is six feet tall, plus or minus three inches, you'd probably believe that second statement a little bit more, because you're like, oh, okay, that helps. It gives me some idea of context; it gives me a little bit of wiggle room on that estimate. That's the whole idea. When we report estimates, we should typically report wiggle room with them, so that people can believe in them more and can see what kind of error we think we're going to have with those estimates. So this is why we do these things in statistics. This is why we laid all that foundation, so we could start giving numbers like this, which are so much more meaningful when we report results from data, as compared to just a single number. But that is the end of this lecture. That is the end of this section, and I look forward to seeing you in the next one.