Video Transcript: Randomness in Data - Part 3
Welcome. Let's finish off this last section on randomness and data by talking about a couple more things around probability. Specifically, today we're going to talk about the idea of conditional probabilities.
So, what are conditional probabilities? Well, the probability of an event given that another event has already occurred is referred to as the conditional probability. The conditional probability of A given that B has already occurred is denoted by P(A|B), which is read as the probability of A given B; the vertical line stands for "given." Well, how do we calculate this probability of A given B? In other words, assume B has already happened. Now that B has happened, what's the probability of A? Well, the way we would calculate that is, we would first calculate the intersection of A and B. Remember from our last lecture that the probability of A intersect B is the probability that both A and B happen at the same time. If we divide that number by the probability of B, that gives us the probability of A given that B has already happened.
From this conditional probability, we can get what we call the multiplication law. The multiplication law provides a way to compute the probability of the intersection of two events, as long as you know the conditional probabilities. So, for example, if you wanted to know the probability of A intersect B, that is just the probability of A given B times the probability of B. Fun math: it's also the probability of B given A times the probability of A. So, how do we get this? Let's look at that top equation, and let's go back to this equation here. So, here we have the probability of A given B equals the probability of A intersect B divided by the probability of B.
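The slide equations referred to here are not reproduced in the transcript. Based on the spoken description, they would presumably read as follows (a reconstruction, assuming P(A) and P(B) are both greater than zero so the divisions are defined):

```latex
\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} && \text{(definition of conditional probability)} \\
P(A \cap B) &= P(A \mid B)\, P(B) && \text{(multiplication law, ``top'' equation)} \\
P(A \cap B) &= P(B \mid A)\, P(A) && \text{(multiplication law, ``bottom'' equation)}
\end{align}
```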
If we were to take that probability of B on the lower right-hand side and multiply it over to the left-hand side, that would give us this top equation: the probability of A given B times the probability of B gives us the probability of A intersect B. Now, how did we get that bottom equation? Well, that bottom equation basically uses the premise that the probability of A intersect B is exactly the same as the probability of B intersect A. The intersection of two events is going to have the same probability no matter what order you list them in: A intersect B, B intersect A. So, if instead we go back to this equation and write the probability of B given A, so switch the A and the B on the left-hand side, P(B|A), then we would have on the right-hand side the probability of B intersect A divided by the probability of A. Well, the probability of B intersect A is the same as the probability of A intersect B, so again we can work through the same math and get the bottom equation: the probability of B given A times the probability of A is the probability of A intersect B. So as long as you know one of the conditional probabilities, A given B or B given A, along with the probability of the event being conditioned on, you can figure out the probability of the intersection.
Now there's one little extension here to the multiplication law, and it involves independent events. If the probability of an event A is not changed by the occurrence of event B, then these two events are what we call independent. In other words, it doesn't matter if B happens; the probability of A given B is just equal to the probability of A. And that makes intuitive sense, right? If I told you I don't care about B, B has no influence on A, B is completely independent of A, then me telling you whether or not B happens is not going to change the probability of A, so the probability of A given B is just, well, the probability of A to begin with. It's like the probability of you getting a good grade on a test is probably completely independent of the number of squirrels you see on the way into work. That being the case, I could tell you the number of squirrels you see, and it has no bearing whatsoever on the grade you're going to get on that test, unless you, I guess, really like seeing squirrels. So those would be independent events. The probability of A does not depend on B happening, so the probability of A stays the same. If the events are independent, the reverse is also true: the probability of B given A is just the probability of B. By the same logic, if A and B are independent of each other, me telling you A has already happened is not going to change your thoughts on the probability of B.
So, how does this play into the multiplication law? Well, the multiplication law again provides a way to compute the probability of the intersection. If, and, boy, this is a big if, A and B are independent events, then the probability of their intersection is just the probability of A multiplied by the probability of B. Again, let's review that real quick. So the multiplication law traditionally is: the probability of A intersect B is equal to the probability of A given B times the probability of B.
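Written out, the general form of the law and its special case for independent events would look roughly like this (a reconstruction from the spoken description, since the slide itself is not part of the transcript):

```latex
\begin{align}
P(A \cap B) &= P(A \mid B)\, P(B) && \text{(general multiplication law)} \\
P(A \mid B) &= P(A) && \text{(when $A$ and $B$ are independent)} \\
P(A \cap B) &= P(A)\, P(B) && \text{(independent events only)}
\end{align}
```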
However, if A and B are independent, that probability of A given B, that thing you see right in the middle of the equation on the top, is just the probability of A, which then changes that top equation to: the probability of A intersect B is just the probability of A times the probability of B. Again, this only works for independent events.
Let me give you an example with a tree diagram. Tree diagrams, again, can help calculate probabilities in a series of independent events. When we had our tree diagram with two flips of a coin, those two flips were independent of each other. So if I wanted to know the probability of getting heads two times, that is, heads on the first flip and heads on the second flip, that's an intersection, heads on both flips, and I can just multiply those probabilities together. So let's take a look. If we look here on the tree diagram, we can see we flip a coin. There's a 50% chance, or a probability of .5, of getting heads and a probability of .5 of getting tails. So let's imagine we get heads, and let's follow that branch of our tree diagram. Then we flip the coin again. Again, there's a probability of .5 of heads and a probability of .5 of tails. These two flips of the coin are completely independent of each other, so if you wanted to know the probability of flipping a coin twice and getting two heads, you would take the probability of getting the first heads, .5, multiply it by the probability of getting the second heads, .5, and you would get a probability of .25. In fact, if you go down this entire tree diagram, you'll notice that there's an equal chance of getting each of these different possible outcomes. If you flip a coin twice, the probability of getting two heads in a row is .25. The probability of getting a heads followed by a tails is .25. The probability of getting a tails followed by a heads is .25, and the probability of getting two tails is .25. This goes back to the idea we talked about previously, the notion of a law of averages, where you might say, oh, a heads followed by a tails, that would be more likely than two heads or two tails. That's not actually the case. Flipping two heads in a row has exactly the same probability as flipping a heads followed by a tails.
Wonderful. So let's take a look at conditional probabilities and the multiplication law using an example from our data set. We're going to use the exact same example that we worked on in the last lecture, same table here. Each row represents a different type of weather, with the far right-hand column representing all the people who used our bike service in that type of weather. Each column represents a different season: spring, summer, winter, fall, and the bottom row represents the total number of users in each season. So, let's ask ourselves a question: what's the probability a random customer uses the bike service on a misty day, given it is winter? All right, so first off, let's isolate the people who used our service in winter, right? I told you, given winter, so what's the probability of something happening given that it is winter? Given that it is winter, we only care about the people who used our service in the winter. I basically told you that's already happened, so we already know it's winter, okay. So that means instead of looking at the total of roughly 3.3 million people, we're looking at 471,348. So we have 471,348 people that used our service in winter.
Well, how many of them used it on a misty day in winter? Well, there were 155,573 users in winter when it was misty. So, given that it's winter, so again we're dividing now by a smaller group of people, how many users did we have? Well, we had 155,573, right? Winter and misty. But given that it was winter, instead of dividing by everybody, we're dividing by just the number of winter people. If you take that number, 155,573, and divide by 471,348, you'll get something close to 1/3, or .33, or a 33% chance. So, there's a 33% chance a random customer in the winter will use our bike service on a misty day.
Now, we didn't use the multiplication law here. We just looked at the table and calculated that conditional probability directly, right? We just said, well, I know that it's given winter, so let's only look at winter people, okay? But what if we wanted to use the multiplication law? We can. We can look at the probability of the intersection of A and B, winter and misty day, that is, these 155,573 people out of the total number of people, 3,292,679. Then we divide it by the probability of winter, which, remember, is 471,348 people in winter divided by the grand total again. If you play around with the math and actually divide these numbers, you'll get the exact same thing we got above, a probability of .33. So hopefully that helps you see a little bit about the idea of conditional probabilities. Given that it is winter, what's the probability someone uses our service on a misty day? For us, it was a 33% chance.
So, let's summarize. The probability of an event given that another event has occurred is typically called a conditional probability: the probability of A given that B already happened. Now, the multiplication law uses these conditional probabilities to provide a way to compute the probability of the intersection of two events. As long as you know one of the conditional probabilities between two events and their individual probabilities, you can calculate the probability of the intersection of those two events. Now, if the probability of an event A is not changed at all by the occurrence of event B, and vice versa, then the two events are what we refer to as independent. In other words, I could tell you B, and your thoughts on A do not change, it has the same probability, or I could tell you A, and your thoughts on B do not change, it's the same probability. Then that would mean those two events are independent.
Wow, we've covered a lot in this section on randomness. We talked a lot about probability, talked a lot about randomness and chance. We talked about the law of large numbers and the mythical law of averages. And then we started looking at different ways we can manipulate probabilities, everything from unions, where we add probabilities of events together, to intersections, where we're working with conditional probabilities and even multiplying probabilities together. All of this is going to provide a wonderful foundation for our next two sections of the course, where we start talking about distributions of data.
Distributions depend on probabilities, so without these probability fundamentals we're covering now, the next sections would be a lot harder, so I know it may seem a little out of place, but trust me, what we've talked about in these last three lectures is going to be very valuable for the next few sections. But that is the end of this lecture. That is the end of this section, and I look forward to seeing you next time.
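As a recap of the two worked examples in this lecture, here is a minimal Python sketch, not part of the original lecture. It uses only the three counts quoted from the bike-share table and assumes a fair coin for the two-flip example; the function and variable names are hypothetical and chosen just for illustration.

```python
from fractions import Fraction
from itertools import product

# Counts quoted in the lecture's bike-share table.
WINTER_AND_MISTY = 155_573   # users on misty days in winter
WINTER_TOTAL = 471_348       # all users in winter
GRAND_TOTAL = 3_292_679      # all users in the table


def conditional_direct(joint_count, given_count):
    """P(A | B) found by looking only at the group where B holds."""
    return Fraction(joint_count, given_count)


def conditional_from_law(joint_count, given_count, total):
    """P(A | B) = P(A and B) / P(B), both taken over the grand total."""
    p_joint = Fraction(joint_count, total)
    p_given = Fraction(given_count, total)
    return p_joint / p_given


def two_flip_probabilities(p_heads=Fraction(1, 2)):
    """Probability of each two-flip outcome; multiplying is valid
    because the flips are assumed independent."""
    p = {"H": p_heads, "T": 1 - p_heads}
    return {a + b: p[a] * p[b] for a, b in product("HT", repeat=2)}


direct = conditional_direct(WINTER_AND_MISTY, WINTER_TOTAL)
via_law = conditional_from_law(WINTER_AND_MISTY, WINTER_TOTAL, GRAND_TOTAL)
flips = two_flip_probabilities()

print(f"P(misty | winter), direct:  {float(direct):.2f}")
print(f"P(misty | winter), via law: {float(via_law):.2f}")
print({outcome: float(prob) for outcome, prob in flips.items()})
```

Using `Fraction` keeps the two routes to the conditional probability exactly equal, so the comparison does not depend on floating-point rounding.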