All right, let's finish our conversation about continuous data by picking up where we left off in our last lecture. We ended that lecture with the idea of the empirical rule. However, we realized at the very end that the empirical rule works really well for points that sit exactly one, two, or three standard deviations from the mean. If we have points that aren't at those exact spots, it's a lot harder to figure out. That is where standardized scores come in. A random variable having a normal distribution with a mean of zero and a standard deviation of one is said to have what we call a standard, or standardized, normal probability distribution. The reason this is important is that people have basically taken the empirical rule to the extreme. For a standard normal distribution, one with a mean of zero and a standard deviation of one, they have worked out the area under the curve not just at one, two, and three standard deviations away from the mean, but in steps of hundredths: 0.01 standard deviations from the mean, 0.02 standard deviations from the mean, and so on, all the way up to 3.49 standard deviations from the mean. They have put all of these calculations into a table that we call a probability table. We can use these previously done calculations (or, in all honesty, software like Excel, a variety of other software, or a calculator) to answer the same kinds of normal-distribution questions we had last time, but with a lot more fine-tuned detail. Now, the only downside is that these calculations have been done for a single normal distribution: again, a normal distribution with a mean of zero and a standard deviation of one.
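To see why the table is the empirical rule pushed further, here is a minimal sketch in Python. It assumes Python 3.8+ and uses the standard library's `statistics.NormalDist` as a stand-in for the table, Excel, or a calculator; the helper name `area_within` is hypothetical, chosen for illustration. The sketch recovers the empirical rule's percentages at 1, 2, and 3 standard deviations and then handles 1.5 standard deviations, a point the empirical rule alone can't.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)


def area_within(k: float) -> float:
    """Probability that a standard normal value lies within k SDs of the mean.

    Hypothetical helper: the area left of +k minus the area left of -k.
    """
    return Z.cdf(k) - Z.cdf(-k)


# The empirical rule's three checkpoints...
for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")

# ...and an in-between point the empirical rule alone cannot answer.
print(f"within 1.5 SD: {area_within(1.5):.4f}")
```

Run as is, it should print values near 0.6827, 0.9545, and 0.9973 (the familiar 68-95-99.7 percentages) and about 0.8664 for 1.5 standard deviations.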
I know what you might be thinking: what if my normal distribution doesn't have a mean of zero or a standard deviation of one? What in the world am I to do? The nice part about the mathematics of the normal distribution is that all normal distributions can be converted into the standard normal distribution, so that we can calculate these probabilities much more easily, and that's what we're going to be talking about in this lecture. First of all, let's talk about the idea of a standard normal table. On the course website, you'll find a full standard normal table; this is just a snippet of it. The standard normal table is, again, an extension of the empirical rule: it gives the area underneath the standard normal curve to the left of any point, with that point specified to two decimal places. Let me show you. You'll notice that there are rows and there are columns. Together they form the two-decimal-place number: the row gives the ones and tenths digits, and the column gives the hundredths digit. So let's take a look at the row 0.8 and the column 0.03. What that tells you is that the probability, on a standard normal distribution, of being to the left of the point 0.83 is 0.7967. Some people see things better with numbers, but some people see things better with pictures, so let's take a look at a picture. Again, we're going to look at the point 0.83, the point that is 0.83 standard deviations above the mean.
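A printed table is just the cumulative area evaluated on a grid of z values. The sketch below mimics the row-and-column lookup, assuming the common layout where the row carries the sign plus the ones and tenths digits and the column carries the hundredths digit (layouts vary, so check the course's table). `table_lookup` is a hypothetical helper, and `statistics.NormalDist` (Python 3.8+) is assumed as the source of the areas.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def table_lookup(row: float, column: float) -> float:
    """Mimic a printed standard normal table (hypothetical helper).

    The row holds the sign, ones and tenths digits (e.g. 0.8 or -1.2);
    the column holds the hundredths digit (e.g. 0.03). For negative rows the
    column digit extends the magnitude, so row -1.2, column 0.08 means z = -1.28.
    Returns the area to the left of z, rounded to four places like the table.
    """
    z = round(math.copysign(abs(row) + column, row), 2)
    return round(Z.cdf(z), 4)


print(table_lookup(0.8, 0.03))   # z = 0.83
print(table_lookup(0.5, 0.00))   # z = 0.50
print(table_lookup(-1.2, 0.08))  # z = -1.28
```

With this layout the first two lookups should give 0.7967 and 0.6915, the values read off the table in the lecture; the negative row shows how the lower half of a table is usually read.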

If we look at the point that is 0.83 standard deviations above the mean and shade in the entire normal curve below that point, the shaded area will be roughly 80% of your data, or more specifically, 0.7967. Hopefully that makes sense. We can do the same thing with any point, up to two decimal places, on a standard normal curve. For example, if we wanted to look at one half of a standard deviation above the mean, we would look at 0.5 for the row and 0.00 for the column, and we would see a probability of 0.6915. So again, we're looking at different numbers of standard deviations from the mean and how much data exists below each of these points. Now you may be thinking, what if I don't want the data below a point? What if I want the data above a point? Well, the nice part is that we can use the rules of probability from earlier; this is where the class starts building on itself. One of the rules of probability is that all possible outcomes have to add up to a probability of one. So if I know that 0.7967, or 79.67%, of my data is below the point 0.83, then the probability of being above that point, to the right of it, is one minus that number. In other words, if 79.67% of your data is below this point, then 20.33% of your data must be above it. All of your data makes up 100%, so for any point, if you know the probability of being below that point, you also know the probability of being above it: it's just one minus whatever you had before. Now, again, this works great if we have a standard normal distribution, but what if our data doesn't follow a standard normal distribution?
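The complement rule is a one-line subtraction. This minimal sketch, again assuming Python's `statistics.NormalDist` as the source of normal areas, reads the left-tail area at z = 0.83 (the number a table reports) and subtracts it from one to get the right-tail area.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

z = 0.83
below = Z.cdf(z)   # area to the left of z (what the table reports)
above = 1 - below  # complement rule: everything else lies to the right

print(f"P(Z < {z}) = {below:.4f}")
print(f"P(Z > {z}) = {above:.4f}")
```

It should print 0.7967 for the area below and 0.2033 for the area above, the 79.67% / 20.33% split from the lecture.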
If our data doesn't follow a standard normal distribution, we're going to have to convert it. Luckily, all normal distributions can be converted into the standard normal distribution to make these probabilities under the curve easier to calculate. Let's imagine you had a normal distribution like the one you see here. It's a little more spread out than a standard normal, and it's not centered at zero; it's centered at 10. What we can do is shift the distribution so it is centered at zero, and then either shrink or stretch it so it has a standard deviation of one. Once we do that, we can follow the premise that all normal distributions have the same relationship between standard deviations and means. So if I want to know the shaded area on the upper distribution, and I can find the matching point on a standard normal distribution, then the two shaded areas are the same. For example, if I know that about 20% of my data is above the point 0.83 on a standard normal distribution, then the matching point on my original normal distribution also has about 20% of its data above it. Now, how do we convert other normal distributions to the standard normal distribution? We use what we call a z score, or standardization score, which converts any single point on a normal distribution with a mean of mu and a standard deviation of sigma to the corresponding point on the standard normal distribution. In other words, pick any point x on any normal distribution, subtract that distribution's mean, and divide by its standard deviation, z = (x − μ) / σ, and that gives you the point on the standard normal distribution.

Let me show you through an example. Let's go back to our bike data example and assume the daily number of total users follows a normal distribution, with an average of 4504 total users per day and a standard deviation of 1937. Then we can ask ourselves this question: what's the probability that any random day has more than 6000 total users? Again, a very reasonable question. We know 6000 is rarer than 4504, because 4504 is right in the middle of our data, but we want to know just how rare it is. Unfortunately, there's no probability table for the normal distribution with a mean of 4504 and a standard deviation of 1937, but there is a table for the standard normal distribution, so let's convert our data. If we take 6000, subtract our mean of 4504, and divide by our standard deviation of 1937, we get (6000 − 4504) / 1937 = 1496 / 1937 ≈ 0.77. In other words, 6000 is 0.77 standard deviations above the mean of 4504: 0.77 of 1937 is roughly the difference between 6000 and 4504. That's all we're saying. So the point 6000 on that normal distribution is the same point as 0.77 on the standard normal distribution. So let's look it up in our standard normal table.
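The standardization and tail-probability steps for the bike example can be sketched in Python as follows. The mean 4504 and standard deviation 1937 come from the lecture; `z_score` and the constant names are hypothetical, and `statistics.NormalDist` (Python 3.8+) is assumed as the source of normal areas. The sketch computes z both rounded to two decimals, as a printed table requires, and unrounded, then cross-checks by asking the original distribution directly.

```python
from statistics import NormalDist

MEAN_USERS = 4504  # average daily total users (from the lecture)
SD_USERS = 1937    # standard deviation of daily total users (from the lecture)


def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma


z = z_score(6000, MEAN_USERS, SD_USERS)  # 1496 / 1937
z_table = round(z, 2)                    # two decimals, as a printed table needs

# Right tail on the standard normal, using the rounded z (the table approach).
p_above_table = 1 - NormalDist().cdf(z_table)

# Cross-check: query the original distribution directly (no rounding of z).
users = NormalDist(mu=MEAN_USERS, sigma=SD_USERS)
p_above_direct = 1 - users.cdf(6000)

print(f"z = {z:.4f}, table z = {z_table}")
print(f"P(X > 6000) with rounded z:   {p_above_table:.4f}")
print(f"P(X > 6000) with unrounded z: {p_above_direct:.4f}")
```

The rounded-z result should agree with the table's 0.2206 to about three decimal places, while the unrounded version lands near 0.2200; the small gap is the cost of rounding z to two decimals so it fits the table.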
Well, to figure out the area under the curve, the probability of being below this point on a standardized normal distribution, we would look at the row 0.7 and the column 0.07, which gives us the point 0.77, and we'll see that the probability is 0.7794. In other words, there is a 77.94% chance that we're going to have fewer than 6000 total users in a day. But that's not what the question was. The question was: what's the probability that any random day has more than 6000 total users? Well, if the table tells us there's a 77.94% chance of being below 6000, then we can just subtract that from one, and we know there's a 22.06% chance of being above 6000, or a probability of 0.2206. See how that works? I didn't need the empirical rule. In fact, I couldn't use the empirical rule, because 6000 isn't a whole number of standard deviations from the mean on my distribution with a mean of 4504 and a standard deviation of 1937, but I was still able to figure out the probability of being above it. Visually, what we're saying is that the point 6000 on the normal distribution with a mean of 4504 and a standard deviation of 1937 is the same point as 0.77 on a standard normal distribution with a mean of zero and a standard deviation of one, which means the shaded areas in the two tails are exactly the same: 0.2206. Now, I know that in the picture you see here, those shaded areas don't look exactly the same. You'll have to pardon me; these aren't drawn as exact normal distributions, but with two exact normal distributions, the areas would be exactly the same.

Let's work through another example to make sure we've got this. Again, assume that the daily number of total users follows a normal distribution with an average of 4504 and a standard deviation of 1937. Instead of asking for the probability of having more or fewer than a certain number of users in a day, you may ask the reverse question: what number of daily users marks the bottom 10% of days? In other words, how bad can a bad day get? If I know that on average I expect 4504 users, great, but I want to know what the worst 10% of days look like: how many total users did I have on the bottom 10% of days? So we work this problem in reverse. Instead of starting from a z value and looking up a probability, we start from a probability and look up the z value, so we start in the body of the table and work our way out to the row and column. Basically, we ask: what point on a standard normal distribution has only 10% of the data below it? If we look at the row -1.2 and the column 0.08, which gives the point -1.28, we see a value really close to 10%: 0.1003. That's as close as we can get with our table. In other words, on a standard normal distribution, 10% of your data is below -1.28. But what point is that on our distribution? We just have to work backwards: we know -1.28 is the point on the standard normal distribution we want, and we need to find x.
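The reverse lookup can be sketched the same way. Under the same assumptions (Python's `statistics.NormalDist`, the lecture's mean and standard deviation, hypothetical variable names), the code scans every two-decimal z value from -3.49 to 3.49, much as you would scan the body of a printed table, keeps the one whose left-tail area is closest to 10%, and then undoes the standardization with x = μ + zσ. `inv_cdf` gives the unrounded z for comparison.

```python
from statistics import NormalDist

MEAN_USERS = 4504  # average daily total users (from the lecture)
SD_USERS = 1937    # standard deviation of daily total users (from the lecture)
TARGET = 0.10      # we want the cutoff for the bottom 10% of days

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Every z value a two-decimal table lists, from -3.49 to 3.49.
table_z = [round(i / 100, 2) for i in range(-349, 350)]

# Work in reverse: find the tabled z whose left-tail area is closest to 10%.
z_star = min(table_z, key=lambda z: abs(Z.cdf(z) - TARGET))

# Exact inverse for comparison (no table rounding).
z_exact = Z.inv_cdf(TARGET)

# Undo the standardization: x = mu + z * sigma.
x_cutoff = MEAN_USERS + z_star * SD_USERS

print(z_star, round(Z.cdf(z_star), 4))
print(round(z_exact, 4))
print(round(x_cutoff, 2))
```

This should find z = -1.28 with area 0.1003 and a cutoff of 2,024.64 users; the unrounded z of about -1.2816 would move the cutoff only slightly, to roughly 2,022 users.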
So I know that x minus the mean, divided by the standard deviation, has to give me -1.28: (x − 4504) / 1937 = −1.28. With a little algebra, x = 4504 + (−1.28)(1937) = 4504 − 2479.36 = 2,024.64. In other words, on the worst 10% of days, we're going to have roughly 2,000 users or fewer using our bike rental service. Again, what we're saying is that the point 2,024.64 on our distribution is the same point as -1.28 on the standard normal distribution. So hopefully this has shown you that no matter what normal distribution you have, you can answer any kind of probability question on it, because all normal distributions share the same consistent shape.

So let's summarize. A random variable having a normal distribution with a mean of zero and a standard deviation of one is said to have a standard normal distribution. All normal distributions can be converted into this standard normal distribution, which is so helpful for us, because we know what probabilities exist on a standard normal distribution through standard normal probability tables. That means we can ask and answer any kind of probability question on any kind of normal distribution. You have your data, it follows a certain normal distribution, and you can say, "Hey, what's the probability above or below a certain point?" All we have to do is convert it to a standard normal distribution first, and then we can answer that exact question. That's the beauty of math, and it's why the normal distribution is so important to statistics, and really beyond statistics: it can answer so many questions for us. But that is the end of this lecture. That is the end of this section, and I look forward to seeing you in the next.



Last modified: Monday, 15 June 2026, 9:46 AM