Welcome to this section of the course. What we are going to do is build upon all the previous sections and talk about the idea of estimating not just a single point but an entire interval with our data. If you think about it, a point estimate is probably not going to be the exact value of the true thing that it's estimating, right? If you had to guess at something, your guess would hopefully be close, but it's probably not going to be exactly right. And because it's not going to be exactly right, there is going to be a little bit of wiggle room to our estimates, a margin of error, if you will. So, how can we give a good margin of error to our point estimates? That's exactly what the last section of the course laid the foundation for.

From the last section of the course, we now have sampling distributions. How do sampling distributions help us? Sampling distributions are distributions of point estimates. So, if I were to give a point estimate, like a sample mean, to estimate a population parameter, then, because I know how sample means behave, I can actually give some idea of a margin of error, some wiggle room, if you will, around my point estimate. That's what lays the foundation for what we call interval estimates. An interval estimate can be computed by adding the margin of error to the point estimate and subtracting it from the point estimate. The reason we have interval estimates is to provide information about how close the point estimate is to the value of the parameter.

You've probably seen things like this before. Let's imagine you've seen election coverage on TV. A lot of times they'll say candidate A has 32% of the vote, plus or minus 5%. At least, that would be the idea in a poll.
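To make the "point estimate plus or minus a margin of error" idea concrete, here is a minimal sketch in Python. The function name `interval_estimate` is made up for illustration, and the numbers are simply the poll figures from the example, in percentage points.

```python
def interval_estimate(point_estimate, margin_of_error):
    """Return (lower, upper): the point estimate minus and plus the margin."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Poll example from the lecture: 32% of the vote, plus or minus 5%.
low, high = interval_estimate(32.0, 5.0)
print(f"Candidate A: between {low}% and {high}% of the vote")
```

The point estimate always sits in the middle of the interval, and the margin of error decides how wide the interval is.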
So, all right, they're trying to poll people to see who's going to vote for candidate A, but since this is not the actual election yet, they know they're just polling people to get a guess of who's going to vote for candidate A. Then what they say is, well, I know I have just a sample, and this is just a guess, so that guess is going to have some wiggle room, and that's what the plus or minus 5% in our example is. So again, if I said candidate A is going to get 32% of the vote, plus or minus 5%, that 5% would be the margin of error. What we're trying to do is provide a little bit of information about how close we think the point estimate is to the value of the parameter.

Now, this does not mean that your interval estimates will always contain the truth. Again, these are just guesses, and we can't be 100% sure, because we don't know the truth. If we knew what the population parameter was in a real-world example, then why would we ever take a sample and estimate anything, right? So, let's use an election as an example again. If I truly do not know what proportion of people are going to vote for a certain candidate, then the whole idea is that I'm trying to provide an estimate, but that estimate comes with a chance of being wrong. Every time we guess at something, we could be wrong. That's the idea of what we're trying to do.

So, what do we call these things formally? Formally, we call them confidence intervals. Confidence intervals are interval estimates where we say we have a certain level of confidence in the interval itself. For example, let's say we are 95% confident that the population average daily number of total users of the bike rental company is between 4000 and 5000 people. If you want, you could think about it as: we are 95% confident that the population average daily total users of the bike rental company is 4500 people, plus or minus 500. That's another way of thinking about it being between 4000 and 5000.

But what does this 95% confident even mean? It seems like a loaded statement. 95% confident basically means this: if we were to take many, many samples, all of the same size (we want to keep it fair), each one of those samples is going to produce a different confidence interval. Well, why is that? Each sample, if you remember from our last series of lectures, provides a different point estimate, right? So, if I were to take a sample and compute something like a sample average from it, then there's no guarantee that the next sample I take is going to have the same sample average. If I were to look at heights of individuals and take a sample of 100 people, the average height of those 100 people is probably not going to be the average height of a different sample of 100 people, so each sample is going to produce a different point estimate. And if it produces a different point estimate, then it's going to produce a different range of values for what I think something like the average height would be, or, in our example here, the average total daily users for our bike rental company.
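The claim that each sample produces its own point estimate, and therefore its own interval, can be sketched with a small simulation. Everything numeric here is an assumption made for illustration: the population mean of 4500 and standard deviation of 1900 are invented, and the margin of error uses the textbook known-standard-deviation formula 1.96 * sd / sqrt(n), which this lecture has not derived yet.

```python
import math
import random
import statistics

# Hypothetical population values, chosen only for illustration.
TRUE_MEAN = 4500.0   # the "truth" we pretend not to know
TRUE_SD = 1900.0     # assumed known population standard deviation
SAMPLE_SIZE = 100


def sample_interval(rng, n=SAMPLE_SIZE):
    """Draw one sample and return (sample mean, lower bound, upper bound)."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    margin = 1.96 * TRUE_SD / math.sqrt(n)  # assumed 95% margin of error
    return x_bar, x_bar - margin, x_bar + margin


rng = random.Random(1)  # fixed seed so the run is repeatable
for i in range(1, 5):
    x_bar, low, high = sample_interval(rng)
    print(f"sample {i}: x_bar = {x_bar:.0f}, interval = ({low:.0f}, {high:.0f})")
```

Every sample gets the same wiggle room, but because the sample mean in the middle moves from sample to sample, the interval moves with it.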
Okay, so if each sample produces a different point estimate, and therefore each sample produces a different confidence interval, the idea of 95% confidence means that 95% of the confidence intervals we construct will contain the truth. In other words, 95% of the time our confidence intervals would contain the true parameter of interest, the number we don't know. Well, how do we know that if we don't know what the real answer is? Ah, this goes back to that idea of sampling distributions. We know what sample means, for example, look like in terms of their distribution, so we have an idea of how many sample means are going to be close to the truth and how many are going to be far away from the truth, and we're going to use that information to help us build these confidence intervals.

Let me try and show you an example of this visually. Let's imagine this is your population. Okay, so your population is normally distributed. The population average is that number there, mu, and it is something you do not know in the real world. I do not know what the true average is of the entire population, but I'm going to try and guess it. So, let's take a single sample, and the guess from that single sample we're going to call x bar. Specifically, this is our first sample, so let's call it x bar one. Notice how my guess, my point estimate x bar one, is a little bit higher; it's to the right of the truth, mu. Now, again, in the real world, I wouldn't know where mu was. All I can see is x bar one. However, the arrows on either side of x bar one represent the margin of error, the calculation we're going to learn how to do in this section of the course, that wiggle room that we're going to add to our point estimates to give us a range of confidence. Notice how the true population parameter mu is contained inside of that interval, if you think about the ends of the interval being the tips of the arrows. So the true value of mu is contained inside of this confidence interval.

Now let's imagine we take another sample. We'll call this other sample x bar two. You'll notice the average in this sample is a little bit lower than the average in the first sample, and it also happens to be lower than the true population parameter mu. But notice that x bar two has that same wiggle room, that same margin of error, and when we put that margin on this sample, the interval also contains the truth, the true population parameter mu. Again, think about this interval going from one tip of the arrow to the other tip of the arrow, with x bar two right in the middle. So we've taken two samples, and both of them happen to contain the truth. Let's take a look at this third sample here. Oh, wait, what do we see?
This third sample, x bar three, has an average that is really high above mu, way to the right of mu. It's so far to the right that even though we put some wiggle room around x bar three (x bar three is our point estimate, and our interval estimate is the arrows on either side of it), even if we go from tip to tip of the arrows around x bar three, the interval does not contain the true value of mu. So this interval estimate missed. And the thing is, we can keep doing this over and over again. Here's another example: x bar four. X bar four is another point estimate from another sample of the same size. We put the same margin of error around it, and we can see, okay, this one contains the truth. If we were to keep doing this repeatedly, the whole idea of 95% confidence is that 95% of the samples, 95% of the confidence intervals you see, are going to contain the truth, mu, but there are going to be some 5% that do not contain the truth, like x bar three that you see here on the screen.

Now, I know what you're thinking: a lot of times we only get one sample. This idea that 95% of the time our confidence intervals would contain the truth, that's great. If I could take 100 different samples, then I'd know that about 95 of them are right, but I only get one sample. How do I know if my one sample contains the truth? You don't, but you're putting your confidence in the procedure that you used to get that one sample and the interval you calculated for it. It's kind of like flipping a coin. Let's imagine you had an unfair coin: 90% of the time it lands on heads, 10% of the time it lands on tails. You know that if you flip the coin many, many times, it's going to land on heads more than it's going to land on tails. However, I'm only giving you one flip. Yes, with that one flip, it could land either on heads or on tails, but wouldn't you still bet on it landing on heads? Of course you would, because you know it lands on heads more often than it lands on tails, and it's the same idea here. Yes, you only get one sample. Are we sure that our one sample is actually one of the 95% of samples that contain the truth? No, we're not sure about that at all. However, we're relying on the fact that if we were to do this over and over again, 95% of the time it would be right, so if that's the case, we can trust that one sample.

Another way I like to think about this is going to the free throw line in basketball. Let's imagine you had two people that you could send to the free throw line, and you had a significant amount of money or value riding on them making the shot. One of those people is a 70% free throw shooter. The other is a 95% free throw shooter. Now, you only get one shot, and either one of them could make that shot, but who would you rather have take the free throw? Well, the person who's got a greater chance of making it, and that's the same idea here. You only get one sample, but that one sample, along with its corresponding interval, was calculated in a way that is right 95% of the time. So that's what you're really putting your confidence in. You're putting your confidence in the procedure itself, in the process that we're going to learn in this section of the course and in these corresponding lectures, to help us get a good interval.

Now, one quick caveat: just because we say we are 95% confident that the population average is going to fall inside of our interval, that is not the same thing as saying there's a 95% chance the population parameter falls inside our confidence interval. Well, hold on a second.
Doesn't that sound like the same thing you said earlier, that 95% of the time our intervals would contain the truth? Isn't that basically saying that there's a 95% probability that the parameter is going to fall inside of our interval? No, no, no. Careful. Notice the difference in how many intervals we're talking about with the first statement compared to how many intervals we're talking about with the second statement. The true statement is saying that if we were to produce many, many intervals, 95% of them would be right. The second statement is saying our one interval is going to be right 95% of the time, or that there's a 95% chance that the truth will fall into this one interval. No, that's not the case. Careful here. What we're saying is that the population parameter is not the thing that's moving. The interval is what is moving. So, again, we're not saying there's a 95% chance the population parameter would come into our confidence interval. We're saying that 95% of all of our intervals would actually contain the population parameter.

So, what are you putting your probability on? I'm saying we need to put the probability on the intervals, not on the population parameter. That thing we're estimating, that mu, that average daily number of total users, or that average height of Americans, whatever example you'd like, is not moving. That number is fixed. What's moving is our interval, because each sample we take is going to move our interval. That being the case, what we're saying is we're 95% confident in the interval, in the actual process that we're following. We're not saying there's a 95% chance the number we're estimating is going to just so happen to fall into our one single interval. I know it's a little bit confusing, and it sounds like a lot of mathematical mumbo jumbo, but it is important. It really is. What are you putting the randomness on? Are you saying that the number you're estimating is random and it's moving around, or are you saying that the intervals you get from the samples you take are random? The second one is what we're saying. The randomness is in our samples. We take random samples, those random samples are going to produce different sample means, and those different sample means are going to be closer to or further away from the truth. I'm saying that, the way we're going to calculate them, 95% of the time the sample mean is close enough to the truth that the margin of error covers it. That's the idea.

All right, let's summarize. Confidence intervals are interval estimates where we say we have a certain level of confidence in our interval. But what does confidence mean? 95% confidence implies that if we were to take many samples, all of the same size, each producing a different confidence interval, then 95% of these confidence intervals would contain the true parameter. Now, again, which one you have, you do not know. Maybe your one sample is part of the 95% that actually contain the truth. Maybe your one sample is part of the 5% that don't contain the truth. If you do this process over and over in your career, you're going to be right more often than you're wrong, but you will be wrong sometimes. I guess the only real nice advantage to it is we never know when we're right or when we're wrong, because we don't know the true value of mu.
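The "95% of the intervals contain the truth" statement can be checked with a short simulation. This is only a sketch under stated assumptions: the population is normal with an invented mean of 4500 and an invented, known standard deviation of 1900, and every interval uses the textbook margin 1.96 * sd / sqrt(n). The function name `coverage_rate` is hypothetical.

```python
import math
import random
import statistics


def coverage_rate(true_mean, true_sd, n, repetitions, seed=0):
    """Fraction of simulated 95% intervals that contain true_mean."""
    rng = random.Random(seed)
    margin = 1.96 * true_sd / math.sqrt(n)  # same margin for every sample
    hits = 0
    for _ in range(repetitions):
        # The parameter stays fixed; only the sample (and so the interval) moves.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / repetitions


rate = coverage_rate(true_mean=4500.0, true_sd=1900.0, n=50, repetitions=4000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.95, though not exactly on it. Notice that `true_mean` never changes inside the loop. The interval is what is random, which is why the 95% describes the procedure and not any single interval.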
We're just guessing. Now, one thing you have to be careful of with confidence: confidence is not the chance that the population parameter falls inside of our one confidence interval. I know we've talked a lot about this idea of confidence, and it is a harder concept to wrap your mind around. Definitely take some time, go back and rewatch this lecture, and ponder this idea of confidence a little bit. It takes some time and some thinking to fully grasp, and that's okay. That is the end of this lecture, and I look forward to seeing you in the next one.



Last modified: Monday, 22 June 2026, 08:31 AM