Welcome. Let's continue our conversation around randomness in data by exploring some basic probability rules. Last time we talked about the idea of probability and what it is, but there are also some rules for calculating and manipulating probabilities. For example, you do not always know all of the sample point probabilities in an event. However, there are some basic probability relationships that you can still use to calculate the probability that an event occurs. Here are the four most common ones: the complement of an event, the union of two events, the intersection of two events, and mutually exclusive events. So let's talk about each one of these in this lecture.

The complement of an event is defined to be the event consisting of all the sample points that are not in that event. For example, let's imagine you have the chart on the right-hand side. The big rectangle is the sample space, all the possible outcomes that could occur. Event A is the circle you see there on the left-hand side, okay? The complement of A is basically everything else that's, well, not A. If we have event A, event A's complement is all the other outcomes that aren't in A. The complement of A is typically denoted as A with a little c in the upper right-hand side, or sometimes A with a bar over the top. I don't like the idea of A with a bar over the top, because that's how we defined averages earlier. So, we'll go with A with a little c in the upper right-hand side. Awesome, so if you think about rolling a die, and event A is rolling a 1, then the complement of A would be rolling anything other than a 1: rolling a 2, 3, 4, 5, or 6.

All right, let's continue our thoughts around these rules. What about the union of two events?
Well, the union of an event A and an event B is the event containing all the sample points that are in A or B, or both. So, again, you have the sample space of all possible things that could happen. Event A is defined here by the left-hand circle. Event B is defined here by the right-hand circle, and the union of A and B is basically all of the outcomes that are in A or B, or both. We typically denote this union as A with a U in the middle, then B (A ∪ B), which reads as A union B, and we can see a drawing of that here on the right-hand side: A and B essentially become one combined shape.

However, we also have what we call the intersection of two events. The intersection of an event A and an event B is the event containing all of the sample points that are in both A and B, so it's not just A and it's not just B. They have to be contained in both A and B to be considered part of the intersection. That is the darkly shaded region between the two events A and B that you see here on the right-hand side, the outcomes that both of these events share.

So, when looking at a union of two events, let's again imagine rolling a die. Let's imagine event A is rolling a 1 or a 2, and event B is rolling a 2 or a 3. The union of those would be rolling a 1, 2, or 3, because it contains all the possible outcomes of both events. You can think of the union as an or statement: what are the chances of you getting A or B? Well, the chances of you getting A or B would be the chances of A, rolling a 1 or a 2, or B, rolling a 2 or a 3, or in other words, rolling a 1, a 2, or a 3. However, the intersection is where they are both happening. This is an and statement. If event A is rolling a 1 or a 2 on a die, and event B is rolling a 2 or a 3 on a die, then the intersection of those would be rolling a 2, because that is the only outcome they have in common. We denote this intersection with an upside-down U, so we say A with an upside-down U, then B (A ∩ B), which reads as A intersect B. Again, you can see it as the shaded region on the right-hand side.

With these three concepts, the complement, the union, and the intersection, we now have what we call the addition law. The addition law provides a way to compute the probability of the event A union B, or if you want to think about it another way, the probability of event A or B. When you hear the word "or," think of the term union. When you hear the word "and," think of the term intersection. So, how do we calculate the probability of A or B, in other words, the union of A and B? Well, that would be the probability of A plus the probability of B minus the overlap that A and B have, so minus the probability of A intersect B: P(A ∪ B) = P(A) + P(B) - P(A ∩ B). That makes sense, right? If I want to calculate the union of the events A and B, I'm going to get all of the outcomes in A, that is the circle of A. I'm going to get all of the outcomes in B, that's the circle of B. But if you notice, I've overcounted that little piece in the middle that they share. I've overcounted that intersection. So if I want to figure out this shaded area here, I want all the outcomes in A and all the outcomes in B, but since I counted the outcomes they share twice, I have to subtract off the intersection so that I'm counting those outcomes only once. Hopefully that makes a little bit of intuitive sense, but we'll see it again in an example shortly.

Now, what if A and B didn't have an intersection? What if A and B can't happen at the same time? Those are what we refer to as mutually exclusive events. Two events are mutually exclusive if they have no sample points in common. In other words, there is no intersection between them. This also means that the events cannot both occur.
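
The die example can be sketched in a few lines of Python using sets. This is only an illustration, assuming a fair six-sided die so that every outcome is equally likely; the names `sample_space` and `prob` are made up for this sketch and do not come from the lecture.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die (assumed fair).
sample_space = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event), len(sample_space))


A = {1, 2}  # event A: roll a 1 or a 2
B = {2, 3}  # event B: roll a 2 or a 3

# Complement of "roll a 1": everything in the sample space except 1.
complement_of_rolling_1 = sample_space - {1}

union = A | B         # outcomes in A or B, or both
intersection = A & B  # outcomes in both A and B

# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(intersection)

print(sorted(complement_of_rolling_1))  # [2, 3, 4, 5, 6]
print(sorted(union))                    # [1, 2, 3]
print(sorted(intersection))             # [2]
print(p_union)                          # 1/2
```

Counting the union directly, 3 outcomes out of 6, gives the same 1/2, which is the point of subtracting the overlap.
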
If one event occurs, the other cannot. So again, let's imagine you had something like the flip of a coin. Heads and tails would be mutually exclusive events. I can't get both a heads and a tails with one flip of a coin. If I were to roll a die, rolling a 1 and rolling a 6 would be mutually exclusive events. I can't roll a 1 and a 6 at the same time. However, in our earlier example of rolling a die, event A was rolling a 1 or a 2, and event B was rolling a 2 or a 3. Well, both of those events can actually occur at the same time. If I roll a 2, I've satisfied event A, a 1 or a 2, and I've also satisfied event B, a 2 or a 3. That being the case, those events do have an intersection. They are not mutually exclusive. But if we had two mutually exclusive events and you wanted to know the probability of A union B, or the probability of A or B, you would just add the two probabilities together. Again, take the probability of flipping a heads plus the probability of flipping a tails: .5 plus .5 is 1. There is no intersection between them, you can't get both a heads and a tails, so you do not have to worry about adjusting this addition.

Let's see this through an example; it may make it a little bit easier. So, let's look at an example from our data set here. What I have is the weather in the far left-hand column, represented by three categories: clear or cloudy, misty, and rain or snow. The other columns that you see across the top represent the different seasons: spring, summer, fall, and winter. Notice the last column on the right-hand side gives me the total number of users for each type of weather, so at the end of the first row I get the total number of clear or cloudy users, essentially how many users used our bike rental on a clear or cloudy day. The bottom row gives me the total for each season, so the bottom of that first column is going to tell me the total number of users in the spring. Of course, the cell in the bottom row and farthest right-hand column is going to tell me the total number of users altogether.

So, again, let's look at this chart and make sure we understand it. What we're saying is that there are 626,986 users who used our service in the spring on a clear or cloudy day. There are 799,443 users who used our service on a summer day that was clear or cloudy, and so on and so forth, all the way down to only 3,739 users who used our service on a rainy or snowy day in the winter. So this shows us all of our users across weather and across seasons.

So what is the probability that a random customer used the bike service in the fall and it was raining or snowing? Okay, well, we can again go back to our table, so let's take a look at the fall column and the rainy or snowy row.
Okay, well, of the little over 3 million total users, 19,616 used our service in the fall when it was rainy or snowy. But what's the probability? Well, again, let's look at how many times we actually saw the event happen. We saw this event happen 19,616 times out of the 3,292,679 user events that we saw, which gives a probability of about .006, or in other words, .6%. So .6% of the time, a customer used the bike service in the fall when it was rainy or snowy. If we had all of our users in a bucket and you were to close your eyes and randomly select a user out of the bucket, we're saying there's a .6% chance that the user you selected used the bike service in the fall when it was raining or snowing.

Awesome, let's change one word in this question. What is the probability that a random customer used the bike service in the fall or when it was raining or snowing? Big difference here. Before, we said it had to be both fall and rain or snow. That is an intersection. Here, it's fall or rain or snow. This is a union. Let's take a look. So, here we have all of our fall users in the highlighted fall column. There are 841,613 users in the fall out of the roughly 3.3 million total users we have. That 841,613 is the sum of all of the fall numbers: 519,487 plus 302,510 plus 19,616. Now let's look at rain or snow, that highlighted row. That whole row says that there are 37,869 users who used our service in the rain or the snow. That again is the sum of the whole row: 37,869 is the addition of 3,507, 11,007, 19,616, and 3,739.

So, if we wanted to look at how many users used our service in the fall or used our service in the rain or the snow, we could just add the 841,613 users in the fall plus the 37,869 users in the rain or the snow. But wait, hold on. Where did those things cross? We counted the 19,616 users twice. The 841,613 users in the fall included the 19,616 users that we see highlighted. The 37,869 users in the rain or the snow had those 19,616 users in there as well, so we've counted those users twice. So if we want to know how many users used our service in the fall or the rain or the snow, we should subtract off those 19,616 because we double counted them, and that's exactly what we do. We're going to take the 37,869 users, the total number of users in the rain or the snow, divided by the total number of users overall. Then we're going to add the fall users, the 841,613 fall users, divided again by the total number of users.
However, in adding those first two numbers, we accidentally counted 19,616 people twice, so we need to subtract off those 19,616 people, because we don't want to double count them. That leaves us with 859,866 users who used our service in the fall or when it was raining or snowing. If we divide that by the total number of users, 3,292,679, we get a probability of .261, or a little bit over 26%. There's a 26% chance that a randomly selected customer used the bike service in the fall or when it was rainy or snowy. So again, be very careful with the addition law, because we can accidentally count people twice. That is the same idea as the intersection we saw when we wanted to add together events A and B: we've counted that intersection twice. So the addition law has us subtract off that intersection whenever our events are not mutually exclusive. These events right here are not mutually exclusive, because people were able to use our service both in the fall and in the rain or snow, and so we accidentally counted those 19,616 people twice. After subtracting that off, we get the right number for our probability.

So let's summarize. The complement of an event A is defined to be all of the sample points that are not in A. The union of an event A with an event B, denoted A union B with that little U in the middle, is the event containing all sample points that are in A or B, or both. We can compute the probability of this union with the addition law. Just be careful that you don't accidentally count that intersection twice. The intersection of events A and B, denoted A intersect B with an upside-down U, is the event containing all the sample points that are in both A and B. Again, when you think of probabilities, for unions think of or: what's the probability of A or B? For intersections think of and: what's the probability of A and B? Now, lastly, two events are mutually exclusive if the events have no sample points in common.
In other words, if they don't intersect, then they can't both happen at the same time.

Wow, I know we've ramped up the math in this class really quickly with this lecture, so hopefully you understand a little bit about what we're talking about when it comes to probabilities, unions, and intersections, as well as how to use those together to get the addition law. So that is the end of this lecture, and I look forward to seeing you in the next.
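
If you want to check the bike-share numbers from this lecture yourself, here is a minimal Python sketch. It uses only the counts quoted in the lecture; the variable names are made up for illustration, and the full table is not reproduced because only some of its cells were read out.

```python
# Counts quoted in the lecture's bike-share table.
total_users = 3_292_679

# Fall column (clear or cloudy, misty, rain or snow) and its total.
fall_by_weather = [519_487, 302_510, 19_616]
fall_users = sum(fall_by_weather)

# Rain or snow row (one count per season) and its total.
rain_snow_by_season = [3_507, 11_007, 19_616, 3_739]
rain_snow_users = sum(rain_snow_by_season)

# The one cell that sits in both the fall column and the rain or snow row.
fall_and_rain_snow = 19_616

# Intersection: fall AND rain or snow.
p_and = fall_and_rain_snow / total_users

# Addition law: P(fall OR rain/snow) = P(fall) + P(rain/snow) - P(both).
p_or = fall_users / total_users + rain_snow_users / total_users - p_and

# Same idea in raw counts: add both totals, subtract the double-counted cell.
users_in_union = fall_users + rain_snow_users - fall_and_rain_snow

print(fall_users)        # 841613
print(rain_snow_users)   # 37869
print(users_in_union)    # 859866
print(f"{p_and:.3f}")    # 0.006
print(f"{p_or:.3f}")     # 0.261
```

Dropping the `- p_and` term would count the 19,616 fall rain or snow users twice, which is exactly the mistake the addition law guards against.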



Last modified: Monday, June 8, 2026, 8:31 AM