Welcome. Let's continue into our next section of the course, where we're going to talk about distributions of statistics that we get from data. We've been talking about a lot of distributions lately. Over the last two sections, we talked about distributions of discrete data and distributions of continuous data. Now we're talking about distributions of statistics, but first let's remember what a statistic is. Let's have a little bit of a review. Remember this slide: we talked about four different things, the population, the parameter, the sample, and the statistic. The population is the set of all objects or individuals that you're interested in finding information about. Of course, in a real-world scenario, we rarely get a chance to talk to the entire population. Instead, we have to sample. A sample is a subset of the population, and it's where we actually gather our information. The goal of any good sampling technique is to have a sample that represents the population well. If the sample is where we actually obtain information, then what we measure from that sample is what we call a statistic. Again, a statistic is some kind of measure that's computed from a sample, and we use that statistic to estimate a parameter. A parameter is a measure computed from a population, so you can think about the idea like this. Say I'm interested in knowing the average height of all Americans. All Americans would be the population, and their average height would be the parameter of that population. I can't actually talk to all Americans, so I'll take a sample of them. From that sample, I can calculate the sample's average height. That average height from my sample is going to be my best guess for the average height parameter of the population. So everything works together. Remember, population and parameter both start with p; sample and statistic both start with s.
That's the easiest way to remember it: statistics describe samples, parameters describe populations. But why do we care so much about this idea of a statistic and a parameter? Remember, a statistic is a guess of a parameter. More formally, we call statistics point estimators. They're point estimators because they give a single number as our estimate of a population value, which is what we're looking for. Different population parameters have different corresponding sample statistics. For example, the population parameter μ (mu, the Greek letter that looks like a u with a little tail on the front) is the population mean. The point estimator, the statistic for the population mean, is just the sample mean. If you have to guess at a population average, your sample's average is the best guess you have. Similarly, σ² (sigma squared) is the population's variance. We don't typically see it, so instead we measure the variance of our sample, which we call s². We've also talked about proportions. Say p is the proportion of people in the population who have brown hair. That p is a population parameter, but again we don't get a chance to measure it, so we take a sample, and the proportion in our sample is written p with a little caret over it, which we call p-hat (p̂).

Now remember, samples are only estimates; they aren't the entire population. We hope they do a good job of estimating the population, but they aren't the population. That's the whole point: we can't talk to the whole population, only to a sample, so the sample is just an estimate of it. Since statistics come from samples, statistics are just estimates of the parameters. They're not going to be exactly right, because they're educated guesses. As you can imagine, any kind of estimation comes with a chance of making errors, so let's talk about that.

Let's imagine you had a population that consisted of these 10 numbers: 1, 3, 5, 5, 7, 9, 4, 6, 10, and 2. The average of these 10 numbers is 5.2. That is a population average, so it's a parameter. Obviously, you probably wouldn't have a population of size 10, but it gives us a good example to look at. Now let's take a sample, imagining again that I can't talk to the entire population. I randomly select four of those 10 observations, and the random sample consists of the numbers 1, 10, 6, and 9. The average of this sample is 6.5. That 6.5 is a statistic, because it was calculated from a sample; the 5.2 in the upper right-hand corner is a parameter, because it was calculated from the population. But imagine you got a different sample instead of sample one. Let's say you take another sample and call it sample two. Again, you randomly select four of the 10 observations, and this time you get the numbers 1, 3, 2, and 5.
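To make the parameter/statistic pairs concrete, here is a minimal Python sketch using the 10-number population and sample one from the example. It uses only the standard `statistics` module: `pvariance` divides by N (a population variance), while `variance` divides by n − 1, the usual sample variance s². The lecture's proportion example is brown hair, which this data doesn't have, so the sketch assumes a hypothetical trait, "value greater than 5", purely to illustrate p and p̂.

```python
import statistics

# The 10-number population from the example (tiny, purely for illustration).
population = [1, 3, 5, 5, 7, 9, 4, 6, 10, 2]
# Sample one: four of the ten values, chosen at random in the example.
sample_one = [1, 10, 6, 9]


def has_trait(x):
    """Hypothetical trait (value > 5), used only to illustrate a proportion."""
    return x > 5


def proportion(values):
    """Fraction of values that have the hypothetical trait."""
    return sum(has_trait(x) for x in values) / len(values)


# Parameters: computed from the whole population (normally unknown).
mu = statistics.mean(population)             # population mean
sigma_sq = statistics.pvariance(population)  # population variance, divides by N
p = proportion(population)                   # population proportion

# Statistics (point estimators): computed from the sample alone.
x_bar = statistics.mean(sample_one)          # sample mean, estimates mu
s_sq = statistics.variance(sample_one)       # sample variance, divides by n - 1
p_hat = proportion(sample_one)               # sample proportion, estimates p

print(f"mean:       mu = {mu},  x_bar = {x_bar}")
print(f"variance:   sigma^2 = {sigma_sq:.2f},  s^2 = {s_sq:.2f}")
print(f"proportion: p = {p},  p_hat = {p_hat}")
```

Running it reports a population mean of 5.2 next to a sample mean of 6.5, the same two numbers as in the example; each statistic lands near, but not on, its parameter.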
The sample average from sample two, another statistic, is 2.75. Wait a minute: both of those samples produced statistics that were wrong. They were both trying to estimate the same number. Remember, the population average is 5.2, and each sample is trying to estimate it, so both estimates, 6.5 and 2.75, are wrong. That brings us to the idea of sampling error. Sampling error occurs when there is a difference between a sample point estimate and the corresponding population parameter. In all honesty, sampling error happens all the time; it's just a matter of how big it is. Let's look at our examples. For the first sample, the sampling error is our guess, 6.5, minus the truth, 5.2, which gives 1.3: I was off by 1.3. For sample two, our guess was 2.75 and the truth was 5.2, so the sampling error is 2.75 − 5.2 = −2.45.

However, let's think about this a little more realistically. In a realistic scenario, what do you actually know? This is typically all we know. Think about it: if you knew the whole population, and you knew the population mean μ, why would you ever take a sample? The reason you take a sample is that you don't know the population, which means you don't know the truth. You don't know that the population average, in this case, was 5.2. In fact, you very rarely get more than one sample. Typically you get only one, so all you see is a single sample with an average of 6.5, and that's your best guess. Because we don't know the true value of 5.2, we don't even know how wrong we are. We don't know whether 6.5 is a good guess or a bad one.

So we have all these possible guesses that we could get from all of our possible samples. Let's think about this a little more. If sample statistics like the sample mean had a predictable pattern, then the errors would have a predictable pattern as well, and if the errors have a predictable pattern, then even without knowing the true value we could potentially say something about how right or how wrong we are. That's what this section of the course is about: sampling error and the distributions of sample statistics. What if we were to look at the distribution of all possible sample means? What would that look like?

Let's summarize. Sample statistics are single-number estimates, called point estimates, of some population parameter. Sampling error occurs when there's a difference between a sample point estimate and the corresponding population parameter. Unfortunately, we rarely get a chance to measure this sampling error, because we don't know the true population parameter. However, if sample statistics like the sample mean in our example had a predictable pattern, then the errors would have a predictable pattern as well, and that's what we'll look at in our next lecture. For now, that is the end of this lecture, and I look forward to seeing you next time.
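The lecture closes on a question: what would the distribution of all possible sample means look like? For a population of only 10 values, it can be answered by brute force. The Python sketch below is an illustration, not a method from the course: it computes the sampling errors of the two example samples, then enumerates every possible sample of size 4 with `itertools.combinations` and prints a crude text histogram of their means. It assumes sampling without replacement, as in the example, and treats the two 5s as different individuals.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Same 10-number population as in the example.
population = [1, 3, 5, 5, 7, 9, 4, 6, 10, 2]
mu = mean(population)  # 5.2; in practice this parameter is unknown


def sampling_error(sample):
    """Sampling error: the sample mean (point estimate) minus the parameter mu."""
    return mean(sample) - mu


# Sampling errors for the two samples in the example.
err_one = sampling_error([1, 10, 6, 9])  # 6.5 - 5.2
err_two = sampling_error([1, 3, 2, 5])   # 2.75 - 5.2
print(f"sample one error: {err_one:+.2f}")
print(f"sample two error: {err_two:+.2f}")

# Every possible sample of size 4. combinations() works by position,
# so the two 5s count as different individuals: C(10, 4) = 210 samples.
all_means = [mean(s) for s in combinations(population, 4)]

print(f"number of samples: {len(all_means)}")
print(f"mean of all sample means: {mean(all_means):.2f}")
print(f"smallest / largest sample mean: {min(all_means)} / {max(all_means)}")

# Crude text histogram: bucket each sample mean by its whole-number part.
buckets = Counter(int(m) for m in all_means)
for value in sorted(buckets):
    print(f"{value:>2} | {'#' * buckets[value]}")
```

For this population, the 210 possible sample means range from 2.5 to 8, yet their average is exactly 5.2, the population mean. The histogram gives a first, rough look at the kind of pattern the next lecture builds on.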



Last modified: Monday, 22 June 2026, 8:21 AM