Welcome. In this first section of the course, we're going to be talking about what is data, and really, when we say "what is data," it could also be "what are data." Data can be treated as both a plural and a singular word, so sometimes you'll hear people say data as in more than one, or data as a singular thing. Either way, the idea around data is that it is factual information used as a basis for reasoning, discussion, or calculation. But let's break down that definition just a little bit more.

What do we mean by information? Well, by information we mean the idea of measuring something. Think about values that describe something, values that may describe an object or a person or a place or a thing. Some examples: if we were trying to describe a person, we could use their height, their weight, their age, their race, or their spending habits. If we were trying to describe an object or a thing like a car, we could describe it based on its mileage, how good its gas mileage is, its color, or its size. Again, a variety of different characteristics or measurements. Last but not least, we can think about the example of a website. With a website, we can measure things like number of clicks or page views, or how much revenue we get on a specific ad. Through all of these examples, hopefully you can see the goal here is to gather information, where information is some kind of measurement used to describe something.

Okay, so we have this information, but notice the second half of the definition: "used as a basis for reasoning, discussion, or calculation." What do we mean by that? We mean the idea of inference. The idea of inference is that we're using information to come to some kind of grander conclusion. Think about it.
We want to use the information that we've collected to draw some important conclusions, but not necessarily about only the pieces of information that we have. We want to be able to make better decisions in the context of our entire problem. So, with that being the case, we're going to use this information to answer important questions like who, what, where, when, why, and how. Who is buying my product? What product do they prefer? Where do the people who buy my product live? When do people typically buy my product? Why is it that people are buying my product? How are they buying my product, whether it be in person or online? So, as you can see, we can use the information, those measurements about those different objects or people or places, to try and draw better conclusions and handle whatever context or problem you're trying to solve.

An example data set that we're going to work with throughout the entirety of this course is the data set that you see here, and it's actually provided on your course website. The data set comes from a bike rental organization in Washington, DC. Now, this information is a little outdated. It was collected back in 2011 and 2012, but it still serves a purpose for us. It measures a variety of different things about not only how many people used the bike service that we offer, but also the time of year and the specific days, including temperature and humidity and a variety of other factors. If you were to take a look at that data set, what you would see is a table, and that is how we typically view or look at data. We have rows and we have columns.

For the rows, we typically call these things observations. Well, what are observations? Observations are typically those individuals or those objects that we're collecting information about. So again, going back to the idea of data, data is a series of measurements. Okay, well, what are we collecting measurements on? Individuals or objects or places or things. Those are what we typically store as the rows in our data set, or our data table. We call these rows, again, observations. But what about the columns? The columns we call variables. A variable is just a characteristic that describes those observations. Think about it as a piece of information, a piece of data. So, again, when it comes to what we're looking at, we can describe those days and those users of our bike rental service by a variety of different characteristics. These are what we typically have as columns in our data table.

Now let's break down those variables a little bit more. Let's focus in on those columns. There are two main types of variables, or two main types of columns, that we look at in a data table. The first is a qualitative set of variables. The second is a quantitative set of variables. So, what do I mean by qualitative? A qualitative variable is a piece of data whose measurement scale is inherently categorical. Let me show you an example, again going back to that same data table I showed you earlier: things like date, weekday, season, and type of weather. These are categorical pieces of information. They're not numeric in structure. Think of season (winter, spring, summer, fall) or weather (misty, clear, snowy). These are categorical variables. Now, when it comes to these qualitative or categorical variables, we can break those down further into two specific groups. The first is what we call a nominal categorical variable. A nominal variable has categories with no logical ordering.
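As a brief aside before going further into nominal variables, the rows-and-columns structure described above can be sketched in a few lines of Python. This is only an illustration, not the course's own tooling: the column names are hypothetical, the two temperatures are the ones quoted for the first two days later in the lecture, and the weekday and season values are filled in from the calendar as an assumption.

```python
# Each dictionary is one observation (a row); each key is a variable (a column).
# Column names are hypothetical. Temperatures are the values quoted in the
# lecture; weekday and season are derived from the calendar (an assumption).
bike_days = [
    {"date": "2011-01-01", "weekday": "Saturday", "season": "winter", "temp_f": 46.7},
    {"date": "2011-01-02", "weekday": "Sunday", "season": "winter", "temp_f": 48.4},
]


def column(table, variable):
    """Return every observation's value for one variable (one column)."""
    return [row[variable] for row in table]


n_observations = len(bike_days)      # number of rows
variables = list(bike_days[0])       # column names, taken from the first row

print(n_observations, "observations,", len(variables), "variables")
print(column(bike_days, "temp_f"))
```

Reading across one dictionary gives everything known about a single day, while `column` reads down the table to collect one variable for every day.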
In other words, you can list the categories in really any order. It doesn't really matter. Think, for example, of color of car. It doesn't matter if you list the colors as green, yellow, blue, red; or blue, red, yellow, green; or yellow, red, blue, green. It doesn't really matter what order you're listing things in. There's no logical ordering to those categories. Therefore, that would be a nominal piece of information.

An ordinal qualitative variable, or an ordinal categorical variable, on the other hand, has categories with a natural logical ordering to them, things that have an order in how you would say them. For example: low, medium, high, or high, medium, low. Those are the logical orderings of those three categories. You wouldn't see someone list them as medium, low, high, because that wouldn't make intuitive sense. Let me show you some further examples with that same data table. Take something like weekday: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday. This is a categorical variable, but it is an ordinal categorical or an ordinal qualitative variable. Why? Because the seven categories of weekday have a natural logical ordering. Again, Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday. This is the natural ordering of these categories. It's the same for season: winter, spring, summer, fall. There's a natural logical ordering. You wouldn't see someone list them as fall, spring, summer, winter. And so something like season or weekday could be considered an ordinal qualitative variable: ordinal because of the logical ordering of the categories, qualitative because the values are categorical in nature, and variable because it describes individual observations.

Now, something like the weather type variable may be a little bit harder to discern as nominal or ordinal. In the weather types that we have in our data set, we have clear or partly cloudy, we have misty, and we have rainy or snowy. You might be able to make an argument that these are nominal: it doesn't really matter what order you list these categories in. However, some people might be able to make an argument for them being ordinal categories, going from nicer weather to more precipitation. Again, that one's a little bit up for debate, but something like weekday or season is obviously an ordinal qualitative variable.

All right, we focused a lot on the ideas of qualitative variables. Let's go to the other main type of variable, a quantitative variable. A quantitative variable is a column of information that is numeric and defines some value or some quantity. So you can think of qualitative pieces of information as categorical in nature, whereas quantitative pieces of information are numeric in nature. Again, let's go to our data table and see some examples: temperature, humidity, number of casual users, number of registered users. All of these are quantitative pieces of information.
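The variable types covered so far can be sketched in Python as a small lookup table plus a helper that respects an ordinal variable's natural order. This is a minimal sketch under stated assumptions: the column names and the `sort_ordinal` helper are hypothetical, and weather is marked nominal only as one of the two defensible readings.

```python
# How the lecture classifies some bike rental columns (names are hypothetical).
VARIABLE_TYPES = {
    "weekday": "ordinal",
    "season": "ordinal",
    "weather": "nominal",  # debatable: could also be argued to be ordinal
    "temperature": "quantitative",
    "humidity": "quantitative",
    "casual_users": "quantitative",
    "registered_users": "quantitative",
}

# An ordinal variable carries a natural order that we have to state explicitly.
WEEKDAY_ORDER = ["Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday"]


def sort_ordinal(values, order):
    """Sort categories by their position in the natural ordering."""
    rank = {category: position for position, category in enumerate(order)}
    return sorted(values, key=rank.__getitem__)


observed_days = ["Friday", "Sunday", "Wednesday", "Monday"]
print(sort_ordinal(observed_days, WEEKDAY_ORDER))
# ['Sunday', 'Monday', 'Wednesday', 'Friday']

# A nominal variable has no such order. Alphabetical sorting below is only a
# display convenience and says nothing about the categories themselves.
car_colors = ["green", "yellow", "blue", "red"]
print(sorted(car_colors))
```

Sorting weekdays alphabetically would put Friday first, which is why an ordinal variable needs its order spelled out rather than inferred from the text of its labels.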
Again, taking a look at temperature, we can see that on that first day, January 1, 2011, the high temperature was 46.7 degrees Fahrenheit. That is a numeric piece of information. It describes some kind of value or quantity, and so that being the case, this is a quantitative variable. Same idea for number of registered users. People who use our bike service could be registered users or more casual users. Registered users sign up to make sure that they have access to a bike. Casual users just sort of come and go and see whether a bike is available when they need it. They don't register ahead of time. So we can see how many registered users used our biking service on that day: 654 registered users. This is a quantity. It's measuring some quantity about the observation we're interested in, so it's a quantitative variable. Same for number of casual users. We can also see that on that day 331 casual users used our bike renting service.

Now, one quick piece of information: not all variables that are numeric are quantitative. Some examples of this would be something like date or social security number or zip code. Zip code may be written with numbers, but it's really a qualitative piece of information. It's a geographical location. I know it may be a little hard to discern. We have these things that are numeric, but they're not quantitative. So, what's an easy way for us to tell if something we see is numeric but not actually a quantitative variable? One easy way is to take a look at a variable and ask this question: can I do basic arithmetic on this variable and have it be meaningful?

What do I mean by that? For example, if you were to take the average, and I know we haven't talked about averages yet, or if you were to add up the values inside of a variable, would it make sense? If you were to look at the average height of people in your family, well, that makes sense. You have multiple people, each with their own height. We can take the average of those heights, so height is a quantitative piece of information. We can't take the average, for example, of a zip code. What's the average of two zip codes? It doesn't make intuitive sense. Same idea for a social security number or a driver's license number, or any kind of personal identification number. You can't really average two people's social security numbers and have the result make intuitive sense.

So, let's go back to our data table. There we can see some examples again of quantitative variables, and again these things would make sense to us. For example, we can go again to the temperature variable. The first day in our data set was 46.7 degrees Fahrenheit. The second day was 48.4 degrees Fahrenheit. Well, again, are these quantitative pieces of information? Can we take the average temperature between those two days? Yes, we can. I can take the average of 46.7 degrees Fahrenheit and 48.4 degrees Fahrenheit. Because I can do that, because I can perform basic arithmetic and have it be meaningful, this would be a quantitative piece of information.
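The arithmetic test can be tried directly. The sketch below, in Python, averages the two temperatures quoted above and then shows one way to keep numeric-looking labels from being treated as quantities. The zip codes are illustrative values chosen for the example, not taken from the data set, and storing them as text is a suggested convention rather than something the lecture prescribes.

```python
from statistics import mean

# Quantitative: averaging the first two days' temperatures is meaningful.
temps_f = [46.7, 48.4]
avg_temp = mean(temps_f)
print(f"Average temperature: {avg_temp:.2f} F")  # Average temperature: 47.55 F

# Numeric-looking but qualitative: zip codes are labels for places.
# Illustrative values only; storing them as text blocks accidental arithmetic.
zip_codes = ["20001", "20002"]
try:
    sum(zip_codes)
    zip_sum_allowed = True
except TypeError:
    # Python refuses to add text to a number, which matches the intuition
    # that "the sum of two zip codes" has no meaning.
    zip_sum_allowed = False

print("Can zip codes be summed?", zip_sum_allowed)  # Can zip codes be summed? False
```

If the zip codes were stored as integers instead, Python would happily compute an average, but the result would not describe any real place. The arithmetic test is about whether the answer is meaningful, not whether the computation runs.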
Again, we can do that same thing when it comes to humidity, the number of casual users, or even the number of registered users. In fact, we can even add up the number of casual users on all Sundays to see the total number of casual users we see on Sundays for our bike rental service. Again, that would imply that casual users is a quantitative piece of information. I can sum them all up and still have the result make sense.

Notice on the far left-hand side of the screen, though, you have something like date. Date, although written with numbers, is not a quantitative piece of information. You can't take the average of January 1 and January 5, for example, and have that make meaningful sense. So that is why, if you remember, earlier we listed date as a qualitative piece of information. Again, you can think about it as a piece of categorical information. Now, date is an ordinal qualitative variable. It has an order to it. There's a certain order in which you would list dates, but they are not inherently numeric and quantitative. Hopefully, that helps distinguish the ideas of a qualitative variable and a quantitative variable.

So let's summarize this lecture. First, data is factual information used as a basis for reasoning, discussion, or calculation. The whole idea is we're going to use that information to make inferences or grander conclusions about things that we're interested in. Of course, as we're collecting this information, this data, we will structure it in what we will call a data table, or a data set. Typically, data tables are structured so that your rows are specific observations, those things you're collecting information about, and your columns are the pieces of information you're collecting for those rows. These columns, again, we typically call variables. So think about it as the columns, the variables, describing the rows, the observations, inside of your data table.

Also, in this lecture, we talked about two different types of variables. The first one is qualitative. Think about these as categorical pieces of information. Remember, when it came to these categorical pieces of information, we had two different types. Nominal categorical pieces of information can be listed in any order, something like color of car, whereas ordinal categorical pieces of information have an inherent order to them, such as day of week. The other type of variable we talked about was a quantitative variable, a piece of information that's measured in some numerical way. So, that is the end of this lecture, and I look forward to the next one with you.



Last modified: Tuesday, 19 May 2026, 08:56 AM