Video Transcript: Distribution of Discrete Data - Part 2
Let's continue our discussion of distributions of discrete data by moving on to the expected value and the variance. The expected value, also known as the mean or average of a random variable, is a measure of its central location. Wait, hold on a second. Did I just say average or mean? Yes: this really is just like the mean we talked about previously. However, you can think of the expected value as a weighted mean, an average where not every point carries the same weight in the calculation. Let me show you. Let's look at the equation. The expected value of a random variable X is written as a capital E followed by X in parentheses, E(X). It is also often written with the Greek letter mu, μ, which looks a bit like a u with a little tail on the front. Now look at the actual calculation on the far right-hand side: there's that summation symbol again. I'm going to sum over all of the values of X from i equals 1 to n, so again I'm working through x1, x2, x3, x4, and so on, across all of the values X can take. So far, so good; so far it looks just like the regular mean. However, instead of dividing by n, I multiply each value by a probability. Remember, we denote probability with a capital P, so this is the probability that X, our random variable, takes on a specific value xi. That notation looks a little weird, but P(X = xi), capital X equals little xi, is just the probability that the random variable equals one particular value out of all of its possible values.
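Written out in symbols, the formula described above looks like this. The second line is the equal-probability special case the lecture turns to next, where every value has probability 1/n and the expected value collapses back to the ordinary mean (the x-bar notation for that mean is an assumption about how the earlier section wrote it):

```latex
E(X) = \mu = \sum_{i=1}^{n} x_i \, P(X = x_i)

% Special case: every value equally likely, P(X = x_i) = 1/n
E(X) = \sum_{i=1}^{n} x_i \cdot \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
```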
For example, in the TV example we worked on in our last lecture, we would write the probability that X equals 0, then the probability that X equals 1, and so on. But as I said, think of the expected value as a weighted mean or weighted average, where the probability serves as the weight. Now, what do we know about probabilities under the classical approach to probability? Every outcome has the same probability, so if we had n observations, each would have a probability of 1/n. And if every observation has probability 1/n, then wait a minute: multiplying each value by 1/n and summing is exactly our regular average calculation. So the expected value is still an average; it's just typically a weighted average, because not every observation has an equal probability of happening, and we don't want to treat every observation as if it counted equally. Let me show you an example. Let's go back to the discrete probability example from our last lecture. Let X be the number of TVs sold at a small department store in one day. X can only take the values 0, 1, 2, 3, 4, and 5. If all I told you was that X could be anything from 0 through 5, and I asked you for the average number of TVs sold in a day, you would probably just average 0, 1, 2, 3, 4, and 5: you would add those six numbers together and then you would divide by 6,
because you'd be assuming each number has an equal chance of happening. However, let's look at the actual data. Remember this table we calculated last time. The far left-hand column is the number of TVs sold, again 0 through 5. The second column is the number of days on which that happened, the frequency. The third column is the corresponding probability of selling that number of TVs, which we called the relative frequency. So let's read it off: there's a 25% chance that we sell zero TVs, a 23% chance we sell one TV, a 19% chance we sell two TVs, a 12% chance we sell three TVs, a 14% chance we sell four TVs, and a 7% chance we sell five TVs. One thing is clear: these outcomes are not equally likely. I don't have the same chance of selling zero TVs as I do of selling five TVs, so if I just averaged the numbers 0, 1, 2, 3, 4, and 5, I wouldn't be accounting for the fact that each number has a different chance of happening. That's what the expected value does: it takes into account not only each value of the variable (0, 1, 2, 3, 4, 5) but also the probability of that value. So let's calculate x times the probability that X equals x for the very first row; again, that's the term inside the sum on the right-hand side of the equation. I take the value 0, zero TVs sold, and multiply it by 0.25, the probability that happens, which gives me 0. How did I get this number? I multiplied the number in the first column, TVs sold, by the probability of that happening in the third column, 0.25. Now let's look at the second row. The first column of the second row is 1, one TV sold. What's the probability of selling one TV? Looking at the third column, it's 0.23.
If I multiply those two numbers together, I get 1 times 0.23, which is just 0.23. Now the third row: two TVs sold, with a probability of 0.19. Multiplying those two numbers together gives 0.38. I've also filled in the last three rows for you, for three, four, and five TVs sold. The last column, x times the probability that big X equals little x, holds each value of the variable multiplied by its probability; those are the terms on the right-hand side of the equation. If we add together all of the values in that last column, 0 + 0.23 + 0.38 + 0.36 + 0.56 + 0.35, we get the highlighted number at the very bottom: 1.88. What does that highlighted number mean? On average, we expect to sell 1.88 TVs per day. That's helpful. Now, why is it not just the average of the numbers 0, 1, 2, 3, 4, and 5? If you took the plain average of those six numbers, you would get 15 divided by 6, or 2.5, which would only make sense if every number of TVs sold were equally likely. We know that's not the case. There's a bigger chance of me selling only zero or one TV than
there is of me selling four or five TVs, so I shouldn't count four or five as heavily as I count zero or one. So, if I want to know how many TVs I expect to sell in a day, what is the typical number of TVs to sell per
day? Then, instead of just taking the plain average we learned about a few sections ago, we take the expected value. It's still an average, but it's weighted by how likely each value is to happen. Zero or one is more likely than four or five, so zero and one are weighted more heavily, and that pulls our number a bit lower than the plain average. So we expect to sell 1.88 TVs per day on average. Wonderful: we've taken the average and come up with a weighted version of it, a better way of summarizing what a typical value looks like. Compare this with our bike data set, where we took the average temperature: every day in the data counted equally, so just taking the average temperature was completely fine. Here, the different numbers of TVs sold do not have an equal chance of happening, so we can't just take the original average; we need the expected value. And if we've changed the way we calculate the average, we can also change the way we calculate the variance, in very much the same way. Instead of the original variance equation, we replace the 1 over (n minus 1) with the probability that X equals xi. So again, we're taking a weighted version of what we saw previously. So what is variance?
Variance is a measure of spread, a measure of variability, and it's defined by this equation. Var(X), the variance of X, is typically denoted by sigma squared, σ², where σ is the Greek letter sigma. It's the same calculation we saw previously: we take the sum, using the big summation symbol from i equals 1 to n, of each xi minus the mean, squared. Here the mean is the expected value, mu, so we take xi minus its expected value and square that number. Why? Remember, variance tries to capture an average of the squared distances from the mean, or here, the average of the squared distances from the expected value. So everything up through that squared term looks exactly the same as in the original variance equation. The only difference is that instead of dividing by n minus 1, we multiply each squared distance by the probability that X equals xi, so again we're taking a weighted version. And if we can do this for variance, we can also do it for standard deviation: as you'll remember from our previous section, standard deviation is just the square root of variance. All right, let's work it out one step at a time so we can see the calculation. I take the same original four columns I showed you for the expected value calculation: the number of TVs sold, the frequency, the probability that big X equals little x, and x times the probability that
big X equals little x. Remember, that fourth column is the first column multiplied by the third column, and if we add up all the values in the fourth column, we get our expected value of 1.88. Now, in this calculation, x minus mu (that u with a little tail on the front) is the same thing as x minus the expected value. So if we take x, which is 0 in the first row, and subtract the expected value, 1.88, then x minus the expected value is -1.88. Do you see how we got that? We take the first column and subtract 1.88 from every value: in the first row, 0 minus 1.88 is -1.88; in the second row, 1 minus 1.88 is -0.88; in the third row, 2 minus 1.88 gives the 0.12 you see in the fifth column, and so on. That fifth column shows each of the original values of X, 0, 1, 2, 3, 4, 5, minus the expected value of 1.88. The next blank column is x minus the expected value, squared. We take the previous column, for example -1.88, and square it. Remember, squaring just means multiplying a number by itself, so -1.88 times -1.88 gives us 3.53. We do that for every value in the x minus mu column: -0.88 times itself gives 0.77, 0.12 squared is 0.01, and so on. That's a lot of calculations, but now we have all of our squared distances from the expected value.
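If you'd like to check the columns built so far, here is a minimal Python sketch. It assumes only the probabilities read off the lecture's table; the variable names are illustrative, not anything from the course materials. It recomputes the expected value, then builds the fifth column (x minus mu) and the sixth column (x minus mu, squared).

```python
# TV-sales table from the lecture: x = TVs sold in a day, p = P(X = x).
xs = [0, 1, 2, 3, 4, 5]
ps = [0.25, 0.23, 0.19, 0.12, 0.14, 0.07]

# Expected value: sum of x * P(X = x), i.e. the fourth column added up.
mu = sum(x * p for x, p in zip(xs, ps))

# Fifth column: distance from the expected value, x - mu.
deviations = [x - mu for x in xs]
# Sixth column: squared distance, (x - mu)^2.
squared = [d * d for d in deviations]

for x, d, sq in zip(xs, deviations, squared):
    print(f"x={x}: x - mu = {d:+.2f}, (x - mu)^2 = {sq:.2f}")
```

Rounded to two decimals, the squared distances for three, four, and five TVs come out to 1.25, 4.49, and 9.73.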
Now all we have to do is multiply the sixth column by the third column, that is, multiply x minus mu squared by the probability of getting that value. In the first row, x minus mu squared is 3.53, and we multiply that by the probability of getting that value, 0.25, which gives 0.883. So again, how did we get that? Within the first row, we multiplied the 0.25 in the third column by the 3.53 in the sixth column, and that gave us 0.883 in the last column. Do the same thing all the way down: multiply each value in the third column by the value in the sixth column of the same row, and you get the values shown here. In the second row, a probability of 0.23 times 0.77 gives 0.177. In the third row, a probability of 0.19 times 0.01 gives 0.002, and so on for the last three rows. I invite you, after this lecture, to make sure you can reproduce the calculations for the rows for three, four, and five TVs. We've discussed the rows for 0, 1, and 2 in detail, but I want you to get a chance to practice, so make sure you understand how we got the values in those last three rows. Now, remember what we do when we calculate these expected values and variances: we sum up all of the values in that last column. So we take 0.883, add 0.177, add 0.002, and so on, all the way down to 0.681. We add all those
numbers in the last column together and get 2.522; in other words, the variance of daily sales is 2.522 TVs squared. There are those squared units again for variance. Remember, we don't like squared units much, so we take the square root of 2.522 and get a standard deviation of daily sales of 1.588 TVs. Awesome. Now let's step back and look at the whole problem. Let X be the number of TVs sold at a small department store in one day, where X can only take the values 0, 1, 2, 3, 4, and 5. If I asked you what a typical day looks like in terms of TVs sold, and how spread out that is, you would answer: I expect to sell 1.88 TVs a day, which is the expected value, with a spread of 1.588 TVs, which is the standard deviation. So now you can summarize center and spread even when not every value has an equal probability. See how much we've grown since we first looked at the average and standard deviation. When we first learned about averages, standard deviations, and variances, we treated every observation as equal. Then we learned about probability. Then we learned about distributions, and specifically that a distribution doesn't have to give every category an equal probability. And from that, we can get an even better sense of what the expected value, and the spread, of a distribution will be. So let's summarize. The expected value, or mean, of a random variable is a measure of its central tendency or location, and it looks just like the average we had before: we sum up all of the values of X, except that we weight each one by multiplying it by the probability that it occurs. So for each value, multiply it by the probability it happens, do the same for every value, and then sum it all up. Same idea for the variance. The variance of a random variable is a measure of its variability and spread.
Again, it's the same kind of calculation we had before: we're really looking at an average of the squared distances of each point from its expected value. But instead of giving every point equal weight, you take each squared distance, x minus mu squared, multiply it by the probability of that value, and sum up all of those products to get a better measure of spread. I know we went through a lot of calculations in this lecture. I definitely invite you to go back through them, especially the rows for three, four, and five TVs, to make sure you understand every step. That's the end of this lecture, and I look forward to seeing you next time.
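To check the whole worked example, including the practice rows for three, four, and five TVs, here is a short Python sketch. It isn't part of the original lecture; `expected_value` and `variance` are hypothetical helper names, and the only inputs are the probabilities from the lecture's table. One thing it makes visible: carrying full precision, instead of adding up the rounded values in the table, gives a variance of 2.5256 rather than 2.522, and a standard deviation of about 1.589 rather than 1.588. The difference is purely rounding.

```python
import math

# TV-sales table from the lecture: x = TVs sold in a day, p = P(X = x).
xs = [0, 1, 2, 3, 4, 5]
ps = [0.25, 0.23, 0.19, 0.12, 0.14, 0.07]

def expected_value(values, probs):
    """Weighted mean: sum of x * P(X = x)."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Weighted average of squared distances from the expected value."""
    mu = expected_value(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Sanity check: a probability distribution's probabilities sum to 1.
assert math.isclose(sum(ps), 1.0)

mu = expected_value(xs, ps)
var = variance(xs, ps)
sd = math.sqrt(var)  # standard deviation = square root of variance

# Plain (unweighted) average of the possible values, for comparison.
plain_mean = sum(xs) / len(xs)

print(f"E(X)   = {mu:.2f} TVs")
print(f"Var(X) = {var:.4f} TVs squared")
print(f"SD(X)  = {sd:.4f} TVs")
print(f"Plain average of 0..5 = {plain_mean}")
```

The comparison line shows the point made in the lecture: weighting by probability pulls the center down from 2.5 to 1.88, because zero or one TV sold is far more likely than four or five.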