Welcome. In this next section of the course, we're going to talk about distributions of continuous data. Let's first remind ourselves of a few things. We've talked previously about what a random variable is: a numerical description of the outcome of an experiment. The basic notion is that we don't know exactly what is going to happen, but we do know the set of things that could happen. A random variable can be either discrete or continuous. In the previous section of the course we covered the discrete case. Remember, a discrete random variable may assume either a finite number of values, for example, the number of TVs sold at a small department store (0, 1, 2, 3, 4, or 5), or an infinite sequence of values, for example, the number of people who could walk into the department store (0, 1, 2, 3, and so on).

Now we're going to move on to the idea of a continuous random variable. A continuous random variable may assume any numerical value in an interval or a collection of intervals. We touched on this briefly before, when we considered a quantity like the distance to the store, where we can always find a value in between two other values. Compare that with a discrete random variable such as the number of people who walk into the store: 0, 1, 2, 3. There is no logical value between, say, two and three. There's no such thing as half a person, so 2.5 people walking into the store wouldn't make any sense. With a continuous random variable like distance, however, I can always find a distance that lies between two other distances. If one person lives two miles from the store and another lives three miles away, it makes perfect sense that someone could live two and a half miles away.
And if someone can live two miles away or two and a half miles away, I can make the same argument that someone could live two and a quarter miles away, and we can keep doing this over and over again. Someone could live between 2 and 2.0001 miles from the store: there is always a smaller gap, and there is always a sensible value in between any two others. That's the idea of a continuous random variable.

Now, recall that with discrete random variables we looked at their distributions in order to calculate probabilities. What's the probability that this small department store sells two TVs today? What's the probability that I roll a die and get a two? What's the probability of flipping a coin four times and getting heads every time? For continuous random variables, unfortunately, it doesn't make sense to ask about the probability that the variable takes on one particular value. There's no continuous counterpart to questions like "what's the probability of selling exactly two televisions?" or "what's the probability of rolling exactly three twos in 10 rolls of a die?" Instead, with continuous random variables, we talk about the probability that the random variable takes a value inside some interval.

Let me show you what I mean visually. Suppose we want the probability that a random variable takes a value between two numbers; let's call them x1 and x2. If we graph the distribution, you can see the bell-shaped curve here on the screen. If I want to know the probability that a random variable following this curve falls between x1 and x2, that's the shaded area on the graph: the probability is the value of that shaded area. More formally, we call this the area under the curve, or the area under the graph, between those two points. And that bell-shaped curve itself is what we call the probability density function.

I know I've thrown a lot at you here, so let me try to help you connect the dots. With discrete random variables, we could ask about the probability of selling two TVs today, or four TVs today, or two, three, or four TVs today; all of those questions were possible. A continuous random variable, however, can take infinitely many values, so the probability of any one exact value is zero. Instead, I give you a range of values. For example, what's the probability that the height of a person in this class is between five foot six and six feet?
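To make the area-under-the-curve idea concrete, here is a minimal Python sketch, assuming a normal (bell-shaped) distribution. The helper names normal_cdf and prob_between are hypothetical, and the standard normal parameters (mean 0, standard deviation 1) are illustrative assumptions rather than values from the lecture. The area between x1 and x2 is computed as the difference of two cumulative probabilities, and the loop shows that the area over a shrinking interval around a single point heads toward zero.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(x1: float, x2: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the normal density between x1 and x2, i.e. P(x1 <= X <= x2)."""
    return normal_cdf(x2, mu, sigma) - normal_cdf(x1, mu, sigma)

# Hypothetical bell curve: the standard normal (mean 0, standard deviation 1).
print(round(prob_between(-1.0, 1.0), 4))  # 0.6827

# Narrowing the interval around the single point 0 drives the area toward zero,
# which is why continuous probabilities are stated for intervals, not exact values.
for width in (1.0, 0.1, 0.001, 1e-6):
    print(width, prob_between(-width / 2, width / 2))
```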
There are so many different possible heights between five foot six and six feet that I can't meaningfully calculate the probability of being exactly one specific height, but I can calculate the probability of falling within a range, that is, between two numbers. That's the general idea of what we're doing with continuous variables now. If it doesn't completely make sense yet, don't worry: this whole section of the course is about continuous random variables and their distributions. We'll see many real-world examples that will hopefully help connect this idea for you, but this is the foundation of everything we'll talk about over the next three lectures.

Here are some popular continuous distributions; you may have heard of them before. The first is the uniform distribution, which we'll discuss in this lecture in a few moments. The basic idea is that every value is equally likely. It's a bit like the flip of a coin in the discrete case, except that here there are infinitely many values, and they all have an equal shot. Another common distribution you may have heard of is the exponential distribution, whose density curve decreases exponentially. And last, but not least, one of the most common distributions we work with in statistics is the normal distribution. More formally, people sometimes call it the Gaussian distribution; I prefer to call it the normal distribution, so we'll go with that. Even if you've never known its name, you've probably seen it before: it's that bell-shaped curve.

So, in summary, a continuous random variable may assume any numerical value in an interval or a collection of intervals. It is not meaningful to talk about the probability of the random variable assuming one particular value, so instead we talk about the probabilities of intervals.

Let's now talk about the first distribution we mentioned, the uniform distribution. It's a great one to start with, and its basic examples help solidify this idea of working with ranges of values instead of exact values. A random variable follows a uniform distribution whenever the probability of falling in an interval is proportional to the interval's length. Whoa, okay, that first bullet point has a lot of big words in it, and you may not be sure you understand all of them, so let's say it another way. Every value in the range is equally likely: the density is flat, so any two intervals of the same length have the same probability. That's the basic idea.

The probability density function for the uniform distribution is given by this equation. The graph of this density is just a flat box, a rectangle, and the equation gives the height of that box: f(x) = 1/(b - a). Well, wait, what are a and b?
a and b are the endpoints of the range of x: a is the smallest possible value and b is the largest. For example, for a uniform distribution between 0 and 1, a would be 0, b would be 1, and the density would be 1/(1 - 0) = 1. That formula describes the main part of the distribution; for every value of x outside the range from a to b, the density is zero. Anything outside this range has zero chance of happening.

Let me give you a real-world example. Let's assume that sales calls coming into a company are uniformly distributed by the years of experience of the sales staff, so that everyone has the same chance of getting a call. Think of a sales staff made up of a variety of people who each take sales calls, with no favoritism. Every single person has an equal chance of getting a sales call, which means that years of experience don't matter. For our example, we'll say the staff has anywhere from 2 to 12 years of experience, and every one of them has an equal chance of getting a sales call. In terms of the equation, x, the number of years of experience, takes values between 2 and 12, and every value in that range is equally likely. The years of experience have no bearing on your likelihood of getting a sales call. So our density would be one over b minus a, that is, 1/(12 - 2), which gives a value of 1/10.

Let me show you what this looks like visually. The height of this rectangle is 1/10, and it runs from 2 to 12. Wait a minute: the length, then, is 10, right? 12 minus 2 is 10, so it's not surprising that the height of the rectangle is 1/10: I'm spreading the total probability of 1 evenly across a range of length 10, and 10 times 1/10 is 1. That's what I mean by the first bullet point: a random variable follows a uniform distribution whenever the probability is proportional to the interval's length. In other words, every value is equally likely. The length of this interval is 10, from 2 to 12, and every value has the same density, 1/10. Hopefully that helps connect the dots a little bit.

But how can we use this? If you knew that sales calls coming into a company are uniformly distributed by the years of experience of the sales staff, you could answer a question like this: what is the probability that a call is answered by an employee with 10 to 12 years of experience? That's just this shaded area of the rectangle, and that's what we mean when we talk about an interval. I can't give you a meaningful probability that someone with exactly 10 years of experience, or exactly 11 years, will answer the sales call, because someone could have 10.5 years of experience, or 10.8, and there are infinitely many such values; the probability of any one exact value is zero. But I can tell you the probability that someone with between 10 and 12 years of experience answers that sales call.
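Here is a minimal Python sketch of the uniform density and the rectangle-area calculation for the sales-call example. The function names uniform_pdf and uniform_prob are hypothetical helpers, not part of any library. Clipping the requested interval to [a, b] is a design choice, so any part of an interval that falls outside the range contributes nothing, matching the zero density there.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of a uniform distribution on [a, b]: 1/(b - a) inside, 0 outside."""
    if a <= x <= b:
        return 1.0 / (b - a)
    return 0.0

def uniform_prob(x1: float, x2: float, a: float, b: float) -> float:
    """P(x1 <= X <= x2): width of the overlap with [a, b] times the height 1/(b - a)."""
    lo, hi = max(x1, a), min(x2, b)  # clip the interval to the support [a, b]
    if hi <= lo:
        return 0.0
    return (hi - lo) / (b - a)

# Sales-call example: experience is uniform between 2 and 12 years.
a, b = 2, 12
print(uniform_pdf(5, a, b))        # 0.1
print(uniform_pdf(13, a, b))       # 0.0 (outside the range)
print(uniform_prob(10, 12, a, b))  # 0.2
```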
That probability is the highlighted area of the graph. Going back to what we saw a few slides ago, we said that the area under the graph between two values is the probability, and that's all we're doing here. What is the area under that graph? It's the probability that the person you reach has between 10 and 12 years of experience, given that calls are uniformly distributed across all salespeople. We're just calculating the area of a rectangle again, and you calculate the area of a rectangle by multiplying its length by its height. Here the length runs from 10 to 12, so it's 12 - 10 = 2. I multiply that by the height of 1/10 and get 0.2. In other words, there's a 20% chance that the person who answers your call at this sales center has between 10 and 12 years of experience. That makes intuitive sense, right? If experience can range anywhere from 2 to 12 years, that's an interval of 10, and 10 to 12 years is an interval of 2. Two out of 10 is the same as 20%. Hopefully that example shows how to work with an interval for a continuous variable.

The nice part about distributions is that here, too, we can calculate expected values and variances. The expected value of a uniform distribution is the sum of the two endpoints of the range, a plus b, divided by two: E(X) = (a + b)/2. In other words, it's the average of the smallest value and the largest value. What about the variance, the spread of this distribution? Take the largest value b minus the smallest value a, square that number, and divide by 12: Var(X) = (b - a)^2/12. Don't worry about how that equation is derived; that's for a more advanced class. For now, just know that this measures the spread.

Now let's go back to the same example. Assume that sales calls coming into a company are uniformly distributed by the years of experience of the sales staff, so that everyone has the same chance of getting a call. Instead of asking for the probability of reaching someone with 10 to 12 years of experience, we could ask: what is the expected years of experience of the person answering a new sales call? That's a reasonable question. The expected value is the smallest value plus the largest value, divided by two, the average of those two numbers, which for us is (2 + 12)/2 = 7. That's right in the middle, between 2 and 12, which makes intuitive sense: if everyone has an equal chance of being selected, the least experienced person has two years of experience.
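The two formulas stated above, E(X) = (a + b)/2 and Var(X) = (b - a)^2/12, can be checked with a few lines of Python. This is a sketch with hypothetical helper names (uniform_mean, uniform_variance), applied to the sales-call range of 2 to 12 years; it also takes the square root of the variance to get the standard deviation.

```python
import math

def uniform_mean(a: float, b: float) -> float:
    """Expected value of a uniform distribution on [a, b]: the midpoint (a + b) / 2."""
    return (a + b) / 2

def uniform_variance(a: float, b: float) -> float:
    """Variance of a uniform distribution on [a, b]: (b - a)**2 / 12."""
    return (b - a) ** 2 / 12

a, b = 2, 12  # years of experience in the sales-call example
mean = uniform_mean(a, b)
var = uniform_variance(a, b)
sd = math.sqrt(var)  # standard deviation, in years

print(mean)           # 7.0
print(round(var, 2))  # 8.33
print(round(sd, 2))   # 2.89
```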
At the other end, the most experienced person has 12 years of experience. So, essentially, I expect someone with about seven years of experience, right in the middle, to answer the sales call. Not every time, but on average. It's like rolling a die: what is the average roll of a die? I won't get a one every time or a six every time; I'll get every value equally often, and on average I'll roll something in the middle (3.5 for a standard die). It's the same here: I can get a 2 just as often as a 12, but on average I'd expect a 7.

We can do the same thing with variance. The variance, the spread of this distribution, uses the largest value, 12, minus the smallest value, 2, which gives 10. 10 squared is 100, and 100 divided by 12 is about 8.33. So the variance of this distribution is about 8.33 (in squared years). We've talked about standard deviation in the past; if you'd like, you can take the square root of 8.33 to get the standard deviation of this distribution. I invite you to do that yourself.

All right, we've covered a lot in this lecture, so let's summarize. A random variable follows a uniform distribution whenever the probability is proportional to the interval's length. In our example, the interval had length 10, since experience ranged from 2 to 12 years, so the density at each of those values was 1/10. The probability density function for the uniform distribution is the equation you see here, f(x) = 1/(b - a). It captures the idea that for a range from a to b, the density at every value in that range is just one over the difference between b and a. In our example, a was 2 and b was 12, a difference of 10, so the density was 1/10. I know it's a lot, but through these examples we're trying to solidify how to think about randomness and probability, not just for discrete distributions, but for continuous distributions as well. That is the end of this lecture. I look forward to seeing you in the next one.
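As an optional check on the results in this lecture, here is a small Monte Carlo sketch in Python. It assumes the same setup (experience uniform between 2 and 12 years) and draws simulated callers with the standard library's random module. The helper name simulate_experience, the sample size, and the seed are arbitrary, hypothetical choices. The simulated share of 10-to-12-year callers, the sample mean, and the sample variance should land near 0.2, 7, and 8.33, though the exact digits depend on the random draws.

```python
import random
import statistics

def simulate_experience(n: int, a: float = 2.0, b: float = 12.0, seed: int = 0) -> list[float]:
    """Draw n simulated 'years of experience' values uniformly from [a, b] (hypothetical helper)."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    return [rng.uniform(a, b) for _ in range(n)]

samples = simulate_experience(100_000)

# Share of simulated calls answered by someone with 10 to 12 years of experience.
share_10_to_12 = sum(10 <= x <= 12 for x in samples) / len(samples)

print(round(share_10_to_12, 3))                 # should be near 0.2
print(round(statistics.fmean(samples), 2))      # should be near 7
print(round(statistics.pvariance(samples), 2))  # should be near 8.33

# Exact matches on a single value essentially never occur with continuous draws.
print(sum(x == 10.0 for x in samples))
```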



Last modified: Monday, 15 June 2026, 09:42