Welcome to the next section of the course. In this section, we have to lay the foundation for a lot of the stuff we're going to cover from this point forward, and that foundation deals with the ideas of probability and randomness. That's what we're going to be covering over the next few lectures. Again, some of the concepts may seem a little out of place right now, but you have to trust me that this lays the foundation for all the later sections we're going to be looking at. So, let's look at the idea of probability and the notion of risk when it comes to statistics.
First, we have to talk about the notion of randomness, really the idea of chance. So, what do we mean when we say something is random? An outcome is random if we know the particular outcomes that something could have, but are unsure which of those outcomes is about to happen. People throughout history have tried to measure patterns in randomness and answer the question: what would happen if I did this many, many times? That goes back to a lot of things you've probably thought about in the past or have had to do in previous math classes.
Let's imagine you tried flipping a coin. Each flip of the coin is completely random: you know what could happen, but you're unsure of the specific outcome. If your coin is fair, that is, evenly weighted, then many flips of the coin would result in heads approximately half of the time and tails approximately half of the time, or 50% heads and 50% tails.
Well, this leads us into the notion of what we call probability. The probability that an event happens is a numerical measure of the likelihood of that event's occurrence. It's a number between zero and one, where the closer you are to zero, the less likely the event is to happen, and the closer you are to one, the more likely the event is to happen. Right in the middle, at .5, just like our flip of the coin, the event is equally likely to happen as not happen. So, when it came to flipping a fair coin, we had an equal chance of heads and tails. If we were looking at the probability of me getting heads and the coin was unfairly weighted towards tails, then my probability of heads would be a little bit lower.
So, again, probabilities are numbers between zero and one. Now, a lot of times people will substitute percentages for probabilities. Be very careful: percentages are numbers between zero and 100, while probabilities are numbers between zero and one. A probability of .5 is the same thing as a percentage of 50%, but be mindful of that with some of the calculations we have to do. We're doing the calculations assuming we have probabilities, numbers between zero and one.
Awesome, so we have this idea that the probability that an event happens is the numerical measure of how likely that event is to happen. If we're going to be talking about randomness, though, we need to have some idea of everything that could happen. That is what we call the sample space. The sample space is the collection of all, and I mean all, possible outcomes in a random process. For something as simple as a flip of a coin, we would have two possible outcomes, heads or tails. The roll of a die would have six possible outcomes, the numbers one through six. When you look at a sample space, if you sum up the probabilities of all of the outcomes in the sample space, the total has to be one. Again, if you have a probability of .5 of getting heads and a probability of .5 of getting tails, then since those are the only two possible outcomes, those probabilities have to sum to one. You're not going to have some small probability of some third outcome happening, because heads and tails are the only two possible outcomes we're allowing.
Now, an event is a collection of one or more outcomes from a random process. Again, an event cannot be predicted with certainty. We know the set of possible outcomes, but we're not sure which of them is going to happen, so each event has a chance of happening and a chance of not happening. The probability of an event A happening is denoted by P(A), with a capital P. For example: when flipping a fair coin, what is the probability of landing on heads? When rolling a fair die, what is the probability of rolling a six? Those are examples of events, and those events have probabilities. The probability of a fair coin landing on heads is .5. The probability of rolling a six on a fair six-sided die is 1/6, or about .167.
Now, if a random process actually consists of a sequence of multiple steps, let's say k steps, each with a certain number of outcomes, then the total number of outcomes is the product of those numbers. For example, let's imagine step one has n1 outcomes, step two has n2 outcomes, step three has n3 outcomes, and so on and so forth. The total number of outcomes would be n1 × n2 × ... × nk, the product of all of those little n's. Now, I know you might be thinking, well, what do you mean by this? Let me give you an example. It may help.
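The definitions above (sample space, P(A), and the multiplication rule for multi-step processes) can be sketched in a few lines of Python. This is only an illustration under the assumption of a fair six-sided die; the names `die`, `prob`, and `count_outcomes` are made up for this sketch and do not come from the lecture.

```python
from fractions import Fraction
from math import prod

# Sample space for one roll of a die, ASSUMED fair for this illustration:
# each outcome maps to its probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities across the whole sample space must sum to one.
total = sum(die.values())


def prob(event, space):
    """P(event): add up the probabilities of the outcomes that make up the event."""
    return sum(space[outcome] for outcome in event)


def count_outcomes(outcomes_per_step):
    """Multiplication rule: n1 * n2 * ... * nk outcomes for a k-step process."""
    return prod(outcomes_per_step)


p_six = prob({6}, die)          # event "roll a six"
p_even = prob({2, 4, 6}, die)   # event made of several outcomes

print(total)                    # 1
print(p_six, float(p_six))      # 1/6 and roughly 0.167
print(p_even)                   # 1/2
print(count_outcomes([2, 2]))   # two coin flips: 4 outcomes
```

Using `Fraction` keeps 1/6 exact, so the check that the sample space sums to one is not affected by floating-point rounding.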
Let's look at what we call a two-step random process. In all honesty, multi-step random processes are really easily visualized with what we call tree diagrams, like you see here. So, let's imagine your two-step random process is flipping a coin two times. Step one, the first flip, has two possible outcomes. Step two, the second flip, also has two possible outcomes. If I have two outcomes from the first step and two outcomes from the second step, then I have four, that is two times two, total possible outcomes from flipping a coin two times. Those four outcomes are listed here. On the first flip, we can get either heads or tails. Let's imagine we get heads. Then we flip the coin a second time, and again we can get heads or tails. That means we could get a heads followed by a heads, a heads followed by a tails, a tails followed by a heads, or a tails followed by a tails. Notice we have four possible outcomes. Again, the first step had two possible outcomes and the second step had two possible outcomes, so the two flips of the coin lead to four possible outcomes in all.
I invite you to try this exact same thing with flipping a coin once and rolling a six-sided die once. Step one would be flipping a coin. Step two would be rolling a six-sided die. Step one has two outcomes. Step two has six outcomes. So we would expect to have 12 total outcomes from flipping a coin and rolling a die. I invite you to try it on your own and figure out what the tree diagram would look like. What would those 12 outcomes be?
So, how do we assign probabilities to these outcomes? Well, the probability of an event occurring, remember, has to be between zero and one, and we know that the sum of the probabilities across all the outcomes in the sample space must equal one. But how do we assign the actual probabilities? There are three typical methods for assigning probabilities: the classical method, the relative frequency method, and the subjective method. So, let's go through each one of them.
The classical method of assigning probabilities assumes that all outcomes are equally likely. This goes back to our coin flip. With a flip of a coin, there are two possible outcomes, so the probability is going to be the same across those two outcomes. If an experiment has n possible outcomes, then each outcome gets a probability of one over n. Again, for our coin example, there were two outcomes, so each outcome gets a probability of one half, or .5. When it comes to rolling a die, we have six possible outcomes: 1, 2, 3, 4, 5, or 6. Again, we're going to assume the classical method.
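For the coin-and-die exercise mentioned above, one possible way to list the branches of the tree diagram and apply the classical method is sketched below. It assumes a fair coin and a fair die; the variable names and the example event (heads together with an even number) are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]          # step one: two outcomes
die = [1, 2, 3, 4, 5, 6]   # step two: six outcomes

# Each branch of the tree diagram is one (coin, die) pair.
outcomes = list(product(coin, die))

# Classical method: with n equally likely outcomes, each gets probability 1/n.
p_each = Fraction(1, len(outcomes))
classical = {outcome: p_each for outcome in outcomes}

# Hypothetical event for illustration: heads together with an even number.
event = [o for o in outcomes if o[0] == "H" and o[1] % 2 == 0]
p_event = sum(classical[o] for o in event)

print(len(outcomes))   # 12, matching 2 * 6
print(p_each)          # 1/12
print(p_event)         # 1/4, from the 3 matching branches out of 12
```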
We're going to assume that each outcome is equally likely, so each of those outcomes has a one in six, or 1/6, chance of actually happening. This is probably the most common way that people assign probabilities.
Another way, though, is what we call the relative frequency method. The relative frequency method assigns probabilities based on experimentation or historical data. For example, you don't believe that the die I gave you is actually fairly weighted, so you ask me to roll it 100 times. Once I roll it 100 times, let's imagine we get the following: 10 times I got a value of 1, 25 times a value of 2, 42 times a value of 3, 7 times a value of 4, 10 times a value of 5, and 6 times a value of 6. With those 100 rolls of the die, the experimental probabilities in the third column would be what we think the probabilities of the actual rolls in the first column are. Notice how a three has a probability of .42. These experimental probabilities in the right-hand column are definitely not equal to each other, so you may not believe that I actually have a fair die anymore. Again, if you assume the die is fair, each value would have a probability of 1/6, or about .167. Here, that's not the case with our 100 rolls.
Now, let's imagine that circumstances might change rapidly for the event you're trying to build probabilities for, so you don't want to just use equal probabilities, and you don't want to just use historical data. Well, you can use a combination of historical data along with experience and intuition about how likely an event is to occur, and that is called the subjective method. The best probability methods out there are typically a combination of the subjective, classical, and relative frequency methods. An example of someone using a subjective method would be a meteorologist. They can look back at the historical data and say, you know, historically this would imply that there's a 20% chance, or a probability of .2, of it raining today. But based on my experience, on what I've seen being a meteorologist in this area, or based on my intuition, I think the probability is actually a little bit higher. I'm going to say the probability is .4 instead. That would be an example of the subjective method.
So, let's summarize real quick. An outcome is random if we know the particular outcomes that something could have, but are unsure which of those outcomes is about to happen. The probability that an event happens is a numerical measure of the likelihood of that event's occurrence, and we typically get probabilities through one of the three methods we talked about: the classical method, which assigns an equal probability to every outcome; the relative frequency method, which uses historical data and experimentation to figure out the probability of each outcome; and the subjective method, which brings in experience and intuition to assign those probabilities. An event is a collection of one or more outcomes from a process whose result cannot be predicted with certainty, and it has a probability that we're trying to calculate. Now, let's finish off this lecture by talking about the law of large numbers, something you may have heard about in the past.
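Going back to the 100 die rolls described earlier, the relative frequency method amounts to dividing each count by the total number of rolls. The sketch below uses the counts quoted in the lecture; the variable names, and the comparison against the classical value 1/6, are illustrative choices rather than part of the original slide.

```python
# Counts from the 100 rolls quoted in the lecture: face -> times observed.
counts = {1: 10, 2: 25, 3: 42, 4: 7, 5: 10, 6: 6}

n_rolls = sum(counts.values())

# Relative frequency method: estimated probability = count / total rolls.
relative_freq = {face: count / n_rolls for face, count in counts.items()}

# Classical method for comparison, ASSUMING a fair die.
classical = 1 / 6

# Which face strays furthest from what a fair die would predict?
most_suspicious = max(relative_freq, key=lambda f: abs(relative_freq[f] - classical))

for face, p in relative_freq.items():
    print(f"face {face}: relative frequency {p:.2f}, classical {classical:.3f}")
print("largest gap from 1/6:", most_suspicious)
```

In this data the face 3, at .42, is the one furthest from 1/6, which is the reason given in the lecture for doubting that the die is fair.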
So, let's talk about what that is, because this can be rather confusing to a lot of people when it comes to randomness. Again, when flipping a coin, each flip is completely random; you're not quite sure if you're going to get heads or tails. But just because we're not sure what's going to happen in the short run with a single flip doesn't mean the long run is unpredictable. In the long run, the flip of a coin is relatively predictable: I expect to get heads about half of the time and tails about half of the time, even though for each individual flip I'm not sure if it's going to be heads or tails.
So, let's imagine I tossed a coin 500 times and recorded the proportion of heads as I went along. Early on, the proportion of times I got heads, as you can see, changed drastically. It started at zero after my first flip, because I started with tails, and then it progressively changed. You can see that the more flips I did, the closer and closer the proportion of heads got to .5. But it took 500 flips of this coin for it to really start getting close to .5, so you have to be very, very careful.
The law of large numbers states that as the number of trials increases in the long run, and I mean the long run, the proportion of times a certain event occurs gets closer and closer to a single value. So again, if we flip a coin hundreds upon hundreds of times, the proportion of times it lands on heads is going to get closer and closer to 50%. Again, though, this is in the long run. It takes hundreds of flips for this to actually happen, so you have to be very, very careful. Now, as the proportion gets closer and closer to that single value, we would consider that value the probability of the event. This is an example of using experimentation, the relative frequency method, to answer the question of what the probability should be.
However, there is a myth around this, a myth of short-run predictability. Chance behavior is unpredictable in the short run, but very predictable in the long run. Unfortunately, this is counterintuitive to most people. Let me give you an example. Which of the following outcomes of flipping a fair coin four times is more likely to happen: me getting a head, followed by two tails, followed by a head, or me getting four straight heads? Which of those outcomes is more probable? You might have a tendency to say the one on the left is more probable, but in actuality they are the same.
Let's take a look at our tree diagram. Again, I'm flipping a coin four times. Let's imagine I flip a coin and get heads, then tails, then tails, then heads. That's one of the many possible outcomes that could occur. Specifically, there are 16 possible outcomes; I'm only showing you the top eight. So, flipping a coin and getting heads, then tails, then tails, then heads is one specific outcome. Now, I'm not saying getting two heads and two tails in any order, I'm saying getting that exact sequence: first heads, second tails, third tails, fourth heads. Well, that sequence occurs exactly once among the 16 equally likely outcomes, the same as four straight heads: heads, heads, heads, heads. So each has a probability of 1/16. Again, that may be counterintuitive to a lot of people.
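The four-flip comparison above can be checked by listing all 16 sequences directly. This is a small sketch assuming a fair coin, so that every sequence is equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All sequences of four flips, e.g. "HTTH"; a fair coin is ASSUMED,
# so each sequence is equally likely (classical method).
sequences = ["".join(flips) for flips in product("HT", repeat=4)]
p_each = Fraction(1, len(sequences))

# Each exact sequence appears once in the list.
p_htth = p_each * sequences.count("HTTH")
p_hhhh = p_each * sequences.count("HHHH")

# Contrast: "two heads and two tails in any order" is a larger event.
two_heads = [s for s in sequences if s.count("H") == 2]
p_two_heads = Fraction(len(two_heads), len(sequences))

print(len(sequences))    # 16
print(p_htth, p_hhhh)    # 1/16 1/16
print(p_two_heads)       # 3/8
```

The contrast in the last line is likely where the intuition goes wrong: a mix of two heads and two tails in any order is more probable than four straight heads, but any one exact ordering is not.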
The same kind of myth applies in sports; we see it all the time. This is the myth of the hot hand. Again, chance behavior is unpredictable in the short run, but predictable in the long run. Let's imagine a basketball player makes four straight shots, as compared to making one, missing two, then making one again. It's the same idea as the coins. Let's imagine that each time they took a shot, they had the same probability of making it, and, as with the fair coin, that probability is .5. Well, making four straight shots, as compared to making one, missing twice, and making one, would have the exact same chance of actually occurring. But in sports we don't think this way. We think, oh my goodness, that player is making all their shots, they must have a hot hand, quickly feed them the ball more. Then inevitably, as you do that, things revert back to their natural long-term predictability, and eventually the hot hand goes “cold”.
This all revolves around the idea of the mythical law of averages. The law of large numbers is a real thing. The law of averages actually isn't. You've probably heard people refer to the law of averages, but there is no actual law of averages; it's a myth. Again, there's no law of averages, because chance behavior is unpredictable in the short run, but very predictable in the long run. You've probably heard a situation like this before. A couple has four kids, and they're all boys, so you're probably thinking to yourself that with their fifth kid, they must be due to have a girl. However, that's not the case. There's still a 50/50 chance that the fifth kid will be a girl. Again, chance behavior is unpredictable in the short run. It's only predictable in the long run. So, let's imagine there actually is a 50/50 chance of a boy or a girl in any pregnancy. Over hundreds of pregnancies, we would expect to see about 50% boys and 50% girls, but over four or five pregnancies, you could have a lot of variability. You could have four boys, even five boys, and four boys in a row is still not going to make the fifth child more likely to be a girl.
So, again, there is no such thing as the law of averages. Unfortunately, this is counterintuitive to most people. Things will even out in the long run, but the chances of outcomes do not change. Since those chances do not change, it's not like you're more likely to get a girl, or the player is more likely to make a shot, or the coin is more likely to land on tails now that I've flipped four straight heads. That's not the case. Each individual outcome still has the same chances. It's just that if you observe those chances over hundreds or thousands of trials, the proportions will eventually even out.
So, let's summarize. Chance behavior is unpredictable in the short run, but it is predictable in the long run. The law of large numbers is a true law. It states that as the number of independent trials increases in the long run, the proportion of times a certain event occurs gets closer and closer to the actual probability of that event. The law of averages, though, does not exist. Things even out in the long run, but the chances of outcomes do not change. So, in the short run, things are still unpredictable. So, that is the end of this lecture. I look forward to seeing you in the next.
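Both points in the summary above, that the proportion settles down in the long run and that a streak does not change the next outcome, can be explored with a simulation. The following is a minimal sketch assuming a fair coin simulated with Python's `random` module; the function names, the seed, and the number of flips are arbitrary choices for illustration, and the exact numbers printed depend on the seed.

```python
import random


def running_proportions(flips):
    """Proportion of heads after each flip (flips are 1 for heads, 0 for tails)."""
    heads = 0
    proportions = []
    for i, flip in enumerate(flips, start=1):
        heads += flip
        proportions.append(heads / i)
    return proportions


def heads_after_streak(flips, streak=4):
    """Proportion of heads among flips that come right after `streak` straight heads."""
    followers = [
        flips[i]
        for i in range(streak, len(flips))
        if all(flips[i - streak:i])
    ]
    return sum(followers) / len(followers)


rng = random.Random(0)  # fixed seed only so the run is repeatable (arbitrary choice)
flips = [1 if rng.random() < 0.5 else 0 for _ in range(100_000)]  # ASSUMED fair coin

proportions = running_proportions(flips)
for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>7} flips: proportion of heads = {proportions[n - 1]:.3f}")

# "Law of averages" check: after four straight heads, is tails any more likely?
print(f"heads right after four straight heads: {heads_after_streak(flips):.3f}")
```

With independent flips, the early proportions can wander noticeably while the later ones sit close to .5, and the proportion of heads right after a streak of four heads should also be close to .5 rather than lower.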



Last modified: Monday, 8 June 2026, 08:29