Welcome. Let's continue this section of the course on interval estimation by talking specifically about interval estimation built around our point estimate, the sample proportion p hat. Let's remind ourselves of something from our previous lecture: an interval estimate can be computed by adding and subtracting a margin of error, some wiggle room, around your point estimate. So you take your original point estimate, and then you add and subtract your margin of error. The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.

Here, the sampling distribution of p hat plays a key role in computing this margin of error. This is why we talked so much about sampling distributions in the previous section of the course. If we know how point estimates change from sample to sample, then we can get an idea of what this margin of error should be.

Now, the sampling distribution of p hat, if you remember, is approximately normal according to the central limit theorem when we have a large enough sample size. For us, a large enough sample size meant that the sample size n times the proportion p is greater than or equal to five, and the sample size n times one minus the proportion p is greater than or equal to five. Essentially, we expect at least five successes and at least five failures in our sample. So if we were to take many different samples, compute the sample proportion from each one of those samples, and plot those sample proportions in a distribution, you would see the normal distribution.
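
To make the "many different samples" idea concrete, here is a rough simulation sketch in Python. Everything specific in it is an assumption made for illustration: the population proportion of 0.30, the sample size of 200, the number of repeated samples, and the helper names `large_sample_ok` and `simulate_p_hats`. It checks the large-sample rule of thumb and then compares the simulated p hats with the center p and the spread given by the square root of p times one minus p over n.

```python
import math
import random
import statistics

def large_sample_ok(n, p, threshold=5):
    """Rule of thumb from the lecture: n*p >= 5 and n*(1 - p) >= 5."""
    return n * p >= threshold and n * (1 - p) >= threshold

def simulate_p_hats(n, p, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

# Hypothetical values, chosen only for illustration.
p, n = 0.30, 200

p_hats = simulate_p_hats(n, p, num_samples=1000)
mean_p_hat = statistics.mean(p_hats)    # should land close to p
sd_p_hat = statistics.stdev(p_hats)     # spread of the simulated p hats
theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(p(1-p)/n)

print("large enough sample:", large_sample_ok(n, p))
print("mean of p hats:", round(mean_p_hat, 4), "vs p =", p)
print("sd of p hats:", round(sd_p_hat, 4), "vs theory:", round(theory_sd, 4))
```

If you plotted `p_hats` as a histogram, you would expect to see the bell shape described above, centered near p.
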
That normal distribution would have a mean of p, the true population parameter, and if you remember, it also has a standard deviation, sigma, equal to the square root of p times one minus p over n. Well, how can we use this information to help us with this idea of margin of error?

Let's remember the empirical rule. The empirical rule said that, for a normal distribution, if you take the mean and subtract one standard deviation, then take the mean and add one standard deviation, everything in that range would be about 68% of your data. If you were to go two standard deviations below and two standard deviations above the mean, you'd cover approximately 95% of your data, and three standard deviations below to three standard deviations above the mean would be almost all of your data, specifically 99.7%.

Well, instead of using mu and sigma, let's put it in the context of the problem we have. We have p, and then we have sigma of p hat. This is the distribution of all the p hats. The p hats are centered at p, and they have a standard deviation of sigma of p hat, which, remember, is the square root of p times one minus p over n. So, let's look at an example where we're looking at two standard deviations below the mean and two standard deviations above it, so if we were to take p, subtract off two standard deviations, and then
take p and add two standard deviations, if we remember the empirical rule, this is about 95% of your data; specifically, it's 95.44% of your data.

Hold on a second. What did we learn about confidence intervals? A confidence interval is a point estimate plus or minus a margin of error. Well, we could have a point estimate like p hat, and then we could subtract and add a margin of error. Wait a minute, what are we doing down at the bottom of that normal distribution? We're taking some value p, and then we're adding and subtracting two times the standard deviation. If we do that, 95.44% of our data is in between these two values, which leaves about 2.28% below and 2.28% above. What we're basically doing is calculating something that looks like a confidence interval: we're taking some estimate and adding a number, and then we're taking that same estimate and subtracting that number.

Think about our last lecture, where I said I was 95% confident something would happen. Well, take a look at what we have here: 95% of the data in a normal distribution is within about two standard deviations of the mean. In fact, we can do this for any percentage. The empirical rule showed us a few cases, but we also talked about this when we talked about the standard normal distribution. I can tell you where the middle percentage is for any percentage on a normal distribution; 95% is easy because it's about two standard deviations.
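
The empirical-rule percentages can be checked directly rather than taken on faith. Here is a small sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `middle_area` is made up for this example. Note that direct computation gives about 95.45% for two standard deviations; the 95.44% figure is what you get by subtracting the four-decimal table entries, 0.9772 minus 0.0228.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def middle_area(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    middle = middle_area(k)
    each_tail = (1 - middle) / 2  # what is left over, split evenly between the two tails
    print(k, "sd:", round(middle * 100, 2), "% in the middle,",
          round(each_tail * 100, 2), "% in each tail")
```
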
The middle 68% would be about one standard deviation. But remember, because we have something like the standard normal table, I can look at anything: the middle 90%, the middle 80%, the middle 87%, whatever we'd like. And that's the idea of what we're doing with this confidence interval. We're basically saying I can take some number, my point estimate, add and subtract some number times the standard deviation of my point estimate, and build a confidence interval. It's just a matter of what you want that distribution to look like. What do you want those percentages to be? If I want the middle to be 95%, then each tail contains two and a half percent. If you wanted the middle to be something like 90%, then each tail would contain 5%.

So, as you can see, we can do this with any percentage, but let's work through this idea of a 95% confidence interval. If I want the middle 95% of my data, what I'm basically telling you is we can create a confidence interval by taking our point estimate and adding and subtracting some number, we'll call it z alpha over two, times the standard deviation of p hat.

Well, what is this z alpha over two? It is the point on a standard normal distribution that has a tail area of alpha over two beyond it. So again, if I wanted 95%, then alpha would be 5%, because 95% is the middle, so one minus 95% tells me what I have left over. That would be 5%, but I'm going to split that 5% into two pieces, 2.5% in the bottom and 2.5% in the top. So what I need to know is what point on a standard
normal distribution is going to be the point where I have 2.5% of my data in the tail. We know how to do this. We looked at this when it came to our sampling distributions, as well as our standard normal score calculations, over the last two sections of the course. Like I said, we're just building upon everything we've learned.

So, suppose we look in the table and say: I want a probability of 2.5%, or 0.025. What value on a standard normal distribution gives me a probability of 0.025? Well, let's take a look in the table. We want to find 0.025 in the middle of the table, not in the edges, not in the far left-hand column or the top row. If we found 0.025 in the middle, we would see that the z value, the spot on the normal distribution, is negative 1.96.

Wait a minute, hold on. That's really close to 2, isn't it? Well, remember, the empirical rule said that approximately 95% of the data is within about two standard deviations of the mean; for exactly two standard deviations it was 95.44%. So for exactly 95% it's close to two standard deviations: it's 1.96 standard deviations. So if we were to take our point estimate and subtract off 1.96 times the standard deviation of our point estimate, that would leave only 2.5% below that value.
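
If you'd rather not read the table, the same lookup can be sketched in Python. This is only an illustration, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `z_for_tail` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_tail(area_below):
    """Return the z value that has `area_below` of the distribution below it."""
    return standard_normal.inv_cdf(area_below)

lower_z = z_for_tail(0.025)      # cuts off the bottom 2.5%
upper_z = z_for_tail(1 - 0.025)  # cuts off the top 2.5%

print(round(lower_z, 2), round(upper_z, 2))  # -1.96 1.96
```

The inverse lookup plays the role of finding 0.025 in the middle of the table and reading the z value off the edges.
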
If we were to do the same thing but flip it to the other side, take our point estimate and add 1.96 times the standard deviation, we would have only 2.5% of our data above that value, which means we'd have 95% of our data in the middle, between those two values. And that's the idea of margin of error: I'm adding in some notion of wiggle room. I have p hat plus or minus this margin of error, where the margin of error is some point on a standard normal distribution times the standard deviation of your estimate, which for us, remember, was the square root of p times one minus p over n.

So if you wanted to calculate a confidence interval for p using p hat, this is the equation. The confidence interval with a confidence coefficient of one minus alpha, basically an error rate of alpha, is p hat plus or minus some number from a standard normal table, which we'll call z alpha over two, times the standard deviation of p hat, which is the square root of p times one minus p over n.

Now again, what do I mean by this confidence coefficient of one minus alpha? For example, if you want a 95% confidence interval, that means you're going to be wrong 5% of the time. Then you would look up the spot on the normal distribution with a tail area of alpha over two, 2.5%. Why? Because, remember, we're splitting that 5% error: half of it goes below our interval, and half of it goes above our interval. That's what we're looking at with this alpha over two.

Now, do you notice a problem with this equation? It has p in it. We don't know p; that's the whole point. We're trying to estimate p. P is the population proportion. We don't know it; however, we have a guess. Our guess is p hat. So that's exactly what we're going to use. We're going to use p hat plus or minus this point on a normal distribution, z, times the square root of p hat times one minus p hat over n.
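
Written out in symbols, the two versions of the interval just described look roughly like this. The notation is an assumption on my part, since the lecture states the formulas only in words: \hat{p} is the sample proportion, p the population proportion, n the sample size, and z_{\alpha/2} the standard normal point with a tail area of \alpha/2 beyond it.

```latex
% Interval written with the true (unknown) standard deviation of \hat{p}
\hat{p} \;\pm\; z_{\alpha/2}\,\sqrt{\frac{p\,(1-p)}{n}}

% Usable interval: the unknown p is replaced by the estimate \hat{p}
\hat{p} \;\pm\; z_{\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
```
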
Now, when we estimate the standard deviation of a statistic, in this case sigma of p hat, instead of calling it a standard deviation, we change its name a little bit, we
call it a standard error. We're basically saying: look, I know that the real standard deviation would involve the number p; however, I don't know the number p, so I'm going to have to guess at this standard deviation. Because I'm guessing at it, it's no longer the real standard deviation, it's a guess of the standard deviation, and we call that guess a standard error. So this number here is the standard error of p hat. Remember, the standard deviation of p hat would be the square root of p times one minus p over n, but because we don't know p, we have to estimate it with p hat. It's now a standard error.

Whew, that's a lot of terminology and a lot of concepts thrown at you. Let's go ahead and work through an example, and hopefully that can solidify things. You think that people are more likely to rent a bike on a clear or cloudy day compared to a misty, rainy, or snowy day. Your data is a sample of 731 days, and 63% of the days in your sample are clear or cloudy. So we're going to build a 90% confidence interval for the true proportion of clear or cloudy days where your company operates.

Okay, let's think about this first. I want to build a 90% confidence interval. In other words, if I were to take 100 samples and build an interval from each one, I'd expect about 90 of those intervals to contain the truth, and I'd expect about 10 of them not to. So, in theory, we're doing this process over and over and over again. In reality, we get one sample, and we're relying on the procedure producing a good interval more often than not.

So, we want a 90% confidence interval. If we want to be 90% confident, then that means we're going to have a 10% error. That 10% error doesn't always go below or always go above. It's split into two pieces: 5% is going to be below our interval, and 5% is going to be above our interval.
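
The "about 90 out of 100 intervals" reading of confidence can be tried out with a simulation. What follows is a sketch under stated assumptions, not part of the lecture: it pretends the true proportion is 0.63 (in reality we would not know it), uses the sample size of 731 from the example, repeats the sampling 2,000 times, and lets Python find the z value instead of a table. The function name `coverage_rate` is made up for this example.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(true_p, n, confidence, num_samples, seed=1):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z alpha over two
    hits = 0
    for _ in range(num_samples):
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
        margin = z * standard_error
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / num_samples

# Assumed "truth" of 0.63, used only so the simulation has something to check against.
rate = coverage_rate(true_p=0.63, n=731, confidence=0.90, num_samples=2000)
print("share of intervals containing the true p:", rate)
```

You would expect the printed share to land near 0.90, not exactly on it, because the simulation itself is a sample.
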
The question is, what is the point z alpha over two? What is the point on a standard normal distribution where 5% of the data is in the tail? That's the hard part, because we know that for a 95% confidence interval that number is close to two, it's 1.96, but what about a 90% confidence interval? Again, we can go to our normal table to look this up. If you were to look in the middle of your normal table and try to find 5%, you would see 0.0505 and 0.0495. So 5%, 0.05, is right in the middle of those two numbers; they're both the same distance away. We could use negative 1.64 or negative 1.65, and I'd be okay with either of those numbers, but in reality the value is about halfway between them, 1.645.

So, if we were to take our sample statistic and subtract 1.645 times the standard error of our statistic, that would be the lower bound on our confidence interval. Then we do the reverse, where we add 1.645 times the standard error, and that would be the upper bound.

So, let's do that. We're going to take our sample statistic, 63%: 63% of the days in our sample are clear or cloudy. Then we add and subtract 1.645 times the standard error, where that 1.645 comes from the fact that we want to be 90% confident. Now we've seen two numbers here: 95% confident is 1.96, and 90% confident is 1.645. The more
confident we are, the bigger this number is going to be, which is going to make our interval wider and wider, which makes sense, right? You're more confident in a wider interval. For example, I'm 100% confident that the average age of everyone listening to this lecture right now is between zero and 100. Guaranteed. Now, if you asked me to be a little less confident, maybe I'd say I'm only 90% confident that the average age of everyone listening to this lecture is between 18 and 35. Why am I only 90% confident? Because I shrank my interval. There's a chance I could be wrong when I have a really small interval. When I have a really large interval, I have a better chance of being right. So notice how we went from 95% to 90% and that number, that z, changed. It's going to make our interval a little narrower.

So we have 0.63 plus or minus 1.645 times the square root of 0.63 times one minus 0.63 divided by 731. If you were to do that calculation, you would have 0.63 plus or minus 1.645 times 0.018, or in other words, 0.63 plus or minus 0.03. One way of thinking about that is that we think 63% of the days are clear or cloudy, plus or minus 3%. Another way of thinking about it is that we think between 60% and 66% of our days are clear or cloudy, and we're 90% confident about that interval.

So hopefully that gives you an idea of a real-world example of this. In fact, if it happens to be election season at the time that you're listening to this video, this is what they do with the polls that you see on TV, where they tell you that candidate A has 32% of the vote, plus or minus 5%. This is the same kind of calculation, and so now you know how to do those calculations.
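
The bike-rental calculation above can be reproduced in a few lines of Python. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `proportion_ci` is made up for this example, and it uses the unrounded z value rather than 1.645 from the table, so the last decimals differ slightly from the hand calculation. It also builds the 95% interval from the same data to show the interval getting wider as the confidence level goes up.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Return (lower, upper, margin) for a large-sample interval around p_hat."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)             # z alpha over two
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # guess of the standard deviation
    margin = z * standard_error
    return p_hat - margin, p_hat + margin, margin

# Numbers from the lecture's example: 63% of 731 sampled days are clear or cloudy.
low90, high90, margin90 = proportion_ci(0.63, 731, 0.90)
low95, high95, margin95 = proportion_ci(0.63, 731, 0.95)

print("90%:", round(low90, 3), "to", round(high90, 3), "margin", round(margin90, 3))
print("95%:", round(low95, 3), "to", round(high95, 3), "margin", round(margin95, 3))
```

The 90% interval comes out at roughly 0.60 to 0.66, matching the plus or minus 3% worked out by hand, and the 95% margin is larger than the 90% margin.
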
Let's summarize. The confidence interval for p with a confidence coefficient of one minus alpha, basically an error rate of alpha, is the following: you take p hat, then you add and subtract the margin of error. That margin of error has two pieces: a number from the standard normal distribution, z alpha over two, times the standard error of p hat, our guess of the standard deviation, which for us is the square root of p hat times one minus p hat over n. Remember, when we estimate the standard deviation of a statistic, it's no longer called a standard deviation. It's called a standard error.

I know we've thrown a lot at you with this lecture, but now you can hopefully see all the foundation that we've been building. We talked about probabilities, we talked about the normal distribution, we talked about sampling distributions, specifically that of p hat, and they've all laid the groundwork for us to build this confidence interval, so that when we get a guess, we can take that guess and add a margin of error to it. That is the end of this lecture, and I look forward to seeing you in the next one.



Last modified: Monday, 22 June 2026, 08:33