Let's continue our discussion of exploring data. In this lecture, we're going to talk about the idea of center: how can we describe a typical value? So, let's jump in.
When exploring data, a good summary of a variable might be a typical value that the variable takes. For example, take a qualitative (categorical) variable such as weather. What would you recommend as the most typical value? If someone asked you what the typical day looked like in terms of weather, you would most likely pick the most common category. That is what you would probably mean by typical. For a quantitative variable, on the other hand, we like to focus on the center value of the variable, where center can be defined in a couple of different ways. This is what we're going to talk about in this lecture.
Let's start with typical for a qualitative variable. The first thing we're going to talk about is the mode. The mode of a variable is its most common value. This is what we typically report for a categorical or qualitative variable, more so than for a quantitative variable. I don't really care about the exact temperature that occurred most often, for example. But take a qualitative variable like the one you see on the right-hand side in our donut chart, which shows weather in three possible categories: clear or cloudy, misty, and rainy or snowy. If someone asked what the typical weather day is in your data set, you would look at these three categories and see that clear or cloudy happens more often than any other. It is the one that happens the most often; it is the mode. So, if someone asked about the typical weather day, you'd probably tell them it was clear or cloudy.
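To make the idea of the mode concrete, here is a minimal Python sketch that counts categories and reports the most common one. The list of weather labels is made up for illustration (the lecture only shows the donut chart, not the counts), and the helper name `mode` is just a convenient label.

```python
from collections import Counter

# Hypothetical weather labels, one per day. The real counts are not given
# in the lecture, so these values are illustrative only.
weather = [
    "clear or cloudy", "misty", "clear or cloudy", "rainy or snowy",
    "clear or cloudy", "misty", "clear or cloudy",
]

def mode(values):
    """Return the most common value (on a tie, the one seen first)."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]

typical_weather = mode(weather)
print(typical_weather)  # clear or cloudy
```

With these made-up labels, clear or cloudy appears four times out of seven, so it is reported as the typical weather day.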
Again, this makes intuitive sense for qualitative variables. If someone asked you what a typical weather day looks like, it wouldn't make sense to pick rainy or snowy, because that doesn't happen very often in your data set. So the mode is what we commonly use with a categorical or qualitative variable. But what about a quantitative variable? If the mode doesn't work very well for quantitative variables, what does?
The first thing we're going to look at is the mean. Another name for the mean is the average. So, what do we mean when we say the average or the mean? The average or mean of a variable is the sum of all of the values of that variable divided by the total number of values you have. For example, say we have a variable x. You can think of x as temperature, height, weight, whatever you like. We sum up all of the values of x in our data set. If x is temperature, x_1 is the first temperature in your data set, x_2 is the second, x_3 is the third, and so on, all the way up to x_n, where n is the number of observations in your data or your sample. So we sum up all the values, x_1 plus x_2 plus x_3 and so on, all the way out to x_n. After we sum up all these values, we divide by the sample size n, and that gives us the average or the mean.
A shorter way of writing this is what you see here on the screen. That big symbol in front of the x is the summation symbol. Notice that we have a couple of things written beside it: i equals 1 on the bottom and n on the top, followed by x with a little i beside it. What does that mean? The summation symbol means that we're going to add together a bunch of values. What values are we going to add? The x's; you see the x there on the right. Which values of x? The values from 1 all the way up to n; that is what the i equals 1 on the bottom and the n on the top refer to. I sum over i taking the values 1, 2, 3, all the way up to n. Where is that i? It is the little subscript right beside x. So if you replace x_i with x_1, you get the first value you saw in the previous equation. Then we add the next value of i, x_2, which gives the next value in the equation, then x_3 and x_4, all the way out to x_n. So x_1 plus x_2 plus x_3 all the way out to x_n is the same thing we have written here; it's just shorter to write it this way. We sum up all the values of x, we divide by the sample size, and that is how we calculate the average.
In our data set, we just sum up all the temperature values. We have 731 days in our data set, hence the 731 you see at the bottom. Then we sum up all the values on the top. 46.7 degrees was the temperature on the first day.
48.4 degrees was the temperature on the second day. So we compute 46.7 plus 48.4 plus 34.2 and so on until we have added all 731 days in our data set, then we divide by 731, and we find that the average temperature in our data set is 59.51 degrees Fahrenheit. So in our data, if someone asked you about the typical weather day in terms of temperature, according to the average or mean it would be 59.51 degrees Fahrenheit. Hopefully that makes intuitive sense. We want to summarize what temperature is typical, and when asked for typical, people often report the average. If I asked you what you think the typical height of Americans is, you'd probably guess the average height of Americans; if I asked about the typical income of you and your friends, you would probably give me the average. It's the same idea here: we're looking at typical through the average.
However, as I mentioned last time, the average is not the only way to measure typical for a quantitative variable. Another way is the median. The median is literally the value in the middle when your data items are arranged from smallest to largest; we call this ascending order.
Let's see a couple of examples of the median. Imagine we have an odd number of observations. Over here on the right-hand side, you have seven observations, and seven is an odd number: 1, 3, 5, 7. We want the observation in the middle, but these observations aren't in order. If you picked the observation in the middle as they stand, it wouldn't be the right value. We first have to line them up from smallest to largest, so that's what we've done here. The smallest value is 13, and the largest value is 30. Now I want you to try and find it. What value is in the middle of these seven observations? There it is, right there in the middle: three observations on the left, three observations on the right. 26 is the median of these numbers.
That makes sense for an odd number of observations, where you always have the same number of observations on the left and on the right. But what do we do with an even number of observations: two, four, six, eight? Here we have eight observations. That top row contains eight different values, but first things first: we need to arrange them in order, so that's what we're doing on the bottom, again from smallest to largest. 13 is still the smallest and 30 is still the largest. So what do we look at for the median? There is no single number in the middle. There are three numbers to the left of 26 and three numbers to the right of 27, so 26 and 27 are the two numbers in the middle. When we have two numbers in the middle, we take the average, the mean, of those two numbers. So, again, remember what we just talked about.
When it came to the average, or the mean, we summed the values and divided by how many there were. Here, we sum these two numbers up, 26 plus 27, and divide by two, the count of numbers we're averaging. So the median of this data set is the average of the middle two numbers, which is 26.5. And don't forget: when you're calculating the median, you have to make sure the values are in order from smallest to largest. If you look at our typical weather day again, according to the median it would be 59.76 degrees Fahrenheit. If you took the data set that's on the website, you could rearrange all 731 days of temperature from the smallest temperature to the largest; if you did that and found the number right in the middle, it would be 59.76 degrees.
Now, why do some people report averages or means while others report medians? Let's see what happens when there is something extreme in our data. Sometimes a data set has an extreme value, a value that doesn't look like the rest of the values; we call these outliers. In that case, the median is definitely preferred as a measure of center. You see, the mean is affected by extreme values. The median, well, generally is not. Because of that, the median is often a better idea of typical, of the center of something, than the average.
Let me show you an example. Here we have the same seven numbers we were looking at earlier. We saw that these seven numbers have a median of 26. If you take the average of these seven numbers, adding them all together, 13 plus 22 plus 24 plus 26 plus 27 plus 29 plus 30, and dividing by seven, you get an average of about 24.43. The median and the average are relatively close to each other, so they don't seem to be reporting very different information. But let's change one of the values, shall we? Take that last number on the bottom row, 30. What if it were 300 instead? We still have seven numbers. If we look at the median of those seven numbers, 26 is still the number in the middle. We have the same order, 13, 22, 24, 26, 27, 29, but instead of 30 the last value is now 300. The median does not change because the order did not change. However, take a look at the average. Wow, the mean is now 63. Extreme observations bother the average. Imagine taking the average height of the people you know, and then you meet an NBA player who is over seven feet tall. That average height just went up, not because the people you know got taller, but because you now know one extremely tall person. The median would barely have moved, but the average would have.
So when your data doesn't have a lot of extreme values, the average isn't too bad as a measure of center. When it does, however, the median is a much better measure of center. Hopefully that helps you see this a little bit.
So let's summarize this lecture. When exploring data, a good summary of a variable is what we would call a typical value or a center value of that variable.
For categorical variables, the mode is the most common measure of typical. It's the value of the variable that happens most often, the most frequently reported category. For quantitative variables, however, we have two different ways of looking at typical or center: the mean, also known as the average, and the median. The mean is the sum of all the values divided by however many values you summed up, and the median is the value in the middle when your data is arranged from smallest to largest. And remember, means are bothered by extreme observations; medians really aren't. So that gives us an idea of how to look at the center of different variables. Next time we'll look at the spread of variables, but for right now that is the end of this lecture, and I look forward to seeing you next time.
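As a recap, the mean and median calculations from this lecture can be sketched in a few lines of Python. This is only an illustration: the function names are my own, the seven values are the ones added up in the lecture, and the eight-value list is an assumption, since the lecture only tells us that its smallest value is 13, its largest is 30, and its two middle values are 26 and 27.

```python
def mean(values):
    """Sum of all the values divided by how many values there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value after sorting; mean of the two middle values if n is even."""
    ordered = sorted(values)  # the data must be in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# The seven values from the lecture, and the same values with 30 replaced by 300.
seven = [13, 22, 24, 26, 27, 29, 30]
with_outlier = [13, 22, 24, 26, 27, 29, 300]

# Assumed eight values (only 13, 26, 27 and 30 are stated in the lecture).
eight = [13, 22, 24, 26, 27, 28, 29, 30]

print(round(mean(seven), 2), median(seven))      # 24.43 26
print(mean(with_outlier), median(with_outlier))  # 63.0 26
print(median(eight))                             # 26.5
```

Replacing 30 with 300 moves the mean from about 24.43 to 63, while the median stays at 26, which is the sense in which the mean is bothered by extreme observations and the median is not.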



Last modified: Wednesday, 27 May 2026, 09:54