Welcome to the next lecture in our section on what is data. In this lecture, we're going to be talking about exploring relationships with data. Going back to the previous lecture, remember that our goal is to draw inferences and come up with some grander conclusions. Essentially, what we're saying is that data by itself is just that: information, all alone. For us to take advantage of data, we need to draw insights from it to help us make better decisions. Rarely do we collect data just to collect it. We collect it for a grander purpose. Now, maybe we don't know what that purpose is initially, but we want to be able to use it to make better decisions.

Well, how can we make better decisions? How can we draw insights from our data? That's really what this whole course is about: helping you learn to draw insights. Insights come from exploring your data. Later on in the course, we're going to have an entire section on exploring data, but we're going to at least preview that here.

Again, we're going to go back to the same data table, the same bike rental data that we had in our last lecture and that we're going to use throughout the entirety of this course. We can see here on the left-hand side we have some rows. These rows are summarizing different days, and really we're summarizing what those days looked like, as well as how many users we had on each one of those days. The columns that you see here, remember, are variables. These variables describe these days, everything from the categorical or qualitative variables on the left-hand side, like weekday, season, or weather type, to the quantitative variables on the right-hand side, like temperature, humidity, and number of casual or registered users.

So let's imagine that we had some piece of information. Let's imagine we knew that the historical average for bike rentals is 4000 per day.
A new employee sees low bike rental numbers over the first few days of the new year. Does that necessarily mean trouble? Again, if we go back to our data table and look just at these five days of the new year, and if I told you that the historical average total users in a day is around 4000, well, if we look at the last two columns, casual users and registered users, we're a little bit over 1000, maybe in between 1000 and 2000 total users. This new employee sees that and thinks it may be a problem. But again, let's explore our data.

For example, this is the distribution of all total users by day. On the x axis, the horizontal axis down at the very bottom, you can see different ranges of number of total users, everything from below 500 total users in a day all the way up to 8500 to 9000 users in a day. On the vertical axis, on the left-hand side, you see the zero, 10, 20, 30. What that is doing is telling you how many days had the corresponding number of users. So we can see by looking at this that we have a lot of days where we have more than 4000 total users in a day. Maybe it's just that those first few days didn't show everything.

So what we've done is we've looked at the distribution of daily bike rentals. Essentially, we looked at all the different values for bike rentals that we've seen historically, and then we tried to look at what we would refer to as an average, basically the center of that collection, that distribution. The nice part is that average and distribution are going to be things that we're going to be focusing on in this course. We're going to learn about distributions, we're going to learn about averages, so you can make these same kinds of insights. This is just a preview.

Well, maybe bike rentals drop in the winter. That would make sense. We were looking at days in the early part of winter, in January. If you remember that data table, there were some rather cold days. I don't know about you, but I don't prefer to ride a bike outside when it's really that cold, so maybe that's what's driving the lower numbers that the new employee is seeing. Again, this would be a piece of information that we've collected. Now, let's try and draw insight from it.

What you can see here is the average total users, broken up now by season. All I want you to do is focus on the tallest bar and the shortest bar. The tallest bar you see is the summer bar, while the shortest bar is the winter bar. In other words, we typically have more users in the summer and fewer users in the winter. Again, here we're trying to draw insight. We can look at what we refer to as a bar chart of the data to see some kind of possible association between the number of people who use our bike rental service and the season itself. Again, the nice part is that a bar chart is one of the things we're going to be learning about in this course. This is again just a preview. This whole lecture is really a preview of a way of being able to draw insights from data.

All right, so this is very intriguing. We looked at the total number of users just across all days, and it looked like there was a nice wide spread. Some days we had fewer than 500 total users, some days we had over 8500 total users, so we have a lot of different numbers of users that we could actually have on any given day. Then we looked at those users across different seasons, and we saw what appeared to be some kind of relationship between the season of year and the total number of users, where in the summer months we have a lot of people using our bike rental service.
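As a rough sketch of the two summaries described so far, a distribution of daily totals in 500-user ranges and an average broken up by season, something like the following Python could be used. The rows in `days` are invented for illustration and are not taken from the course's bike rental table; the names `bin_label`, `distribution`, and `average_by_season` are hypothetical helpers, not part of any course material.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up (season, total users) pairs for illustration only;
# these are NOT rows from the course's bike rental table.
days = [
    ("winter", 985), ("winter", 1349), ("winter", 1562),
    ("spring", 4100), ("spring", 5200),
    ("summer", 6100), ("summer", 7400), ("summer", 8700),
    ("fall", 4600),
]

BIN_WIDTH = 500  # assumed to match the 500-user ranges on the chart's x axis


def bin_label(total, width=BIN_WIDTH):
    """Return the range a daily total falls into, e.g. 1349 -> '1000-1499'."""
    low = (total // width) * width
    return f"{low}-{low + width - 1}"


def distribution(totals, width=BIN_WIDTH):
    """Count how many days fall into each range of total users."""
    return Counter(bin_label(t, width) for t in totals)


def average_by_season(rows):
    """Average the daily totals separately for each season."""
    grouped = defaultdict(list)
    for season, total in rows:
        grouped[season].append(total)
    return {season: mean(values) for season, values in grouped.items()}


totals = [total for _, total in days]
counts = distribution(totals)             # heights of the distribution's bars
overall_average = mean(totals)            # the "center" of the distribution
season_averages = average_by_season(days) # heights of the bar chart's bars

print(counts)
print(overall_average)
print(season_averages)
```

The counts play the role of the vertical axis in the distribution chart, and the per-season averages play the role of the bar heights in the bar chart.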
However, in the winter months, that's when we typically see a dip. Okay. Well, now again, let's continue to ask questions. Asking questions around data is a great way of discovering more insights. The first question was: does it look like our data is following this historical average of 4000 people per day using our service? Then the next question was: I wonder if season makes a difference. Okay, well, let's continue that line of questioning and ask: is the drop in the winter months that we saw in our last chart the same for registered users and for casual users?

Well, let's just ponder this to start. Remember, a registered user is someone who registers ahead of time to make sure that a bike is available for them. They pay essentially a service fee for this. You can think about these people really as workers, probably using the bike rental service to go to their job or maybe do their day-to-day chores, such as going to the grocery store or going to the gym. You can think about casual users, on the other hand, as being people who are just using the bike rental service as they need it. Maybe I want to take a stroll through a park, and so I'm going to rent a bike and go biking through the park, but I'm probably not going to be using it on a very consistent basis if I'm a casual user. So now again, the question makes sense: is the drop in winter the same for registered and casual users? Registered users probably need to use the bike rental service whether it's cold outside or not, whereas casual users, if they're just using it when they want to, may choose not to use it in the winter months.

So, let's take a look at another chart. This is the exact same chart that I showed you previously. The taller the bar, again, the higher the number of total users on average, but now I've broken those same four bars that you saw earlier into two different pieces. The darker shade on the upper part of the bar is the average of the registered users, whereas the lighter shade on the lower part of the bar is the average of the casual users, so we can compare the breakdown of registered versus casual users in each season. What I'd like you to do is focus on the left-hand two bars, the spring and the summer. It looks like casual users make up a bigger piece of the overall average, which would make sense: in the nice spring and summer months, when it's warm outside and people want to get out and move around a little bit, we can have more people who are casually using our bike rental service. However, let's focus in on the right two bars, the fall and the winter. We can see that the light blue casual users in those right two bars seem to be a smaller piece than what we saw in the spring and the summer.
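The breakdown of each seasonal bar into a registered piece and a casual piece can be sketched in Python as below. This is only an illustration under stated assumptions: the rows are invented, not the course's data, and `stacked_averages` and the `casual_share` field are hypothetical names introduced here to make "a bigger or smaller piece" concrete.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (season, casual users, registered users) rows,
# invented for this sketch and not taken from the course's table.
rows = [
    ("spring", 1200, 3400), ("spring", 1000, 3600),
    ("summer", 1500, 4300), ("summer", 1300, 4500),
    ("fall", 700, 4000), ("fall", 500, 4200),
    ("winter", 150, 1900), ("winter", 250, 2100),
]


def stacked_averages(rows):
    """For each season, average casual and registered users separately
    and compute the casual share of the combined average."""
    grouped = defaultdict(lambda: {"casual": [], "registered": []})
    for season, casual, registered in rows:
        grouped[season]["casual"].append(casual)
        grouped[season]["registered"].append(registered)

    result = {}
    for season, parts in grouped.items():
        casual_avg = mean(parts["casual"])
        registered_avg = mean(parts["registered"])
        result[season] = {
            "casual": casual_avg,          # lower piece of the stacked bar
            "registered": registered_avg,  # upper piece of the stacked bar
            "casual_share": casual_avg / (casual_avg + registered_avg),
        }
    return result


breakdown = stacked_averages(rows)
for season, parts in breakdown.items():
    print(season, parts)
```

Comparing `casual_share` across seasons is one way to check numerically what the stacked bars suggest visually: whether the casual piece shrinks relative to the whole bar in the colder seasons.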
Again, this would make intuitive sense. As the weather starts to turn cold, as it starts to become a little bit less appealing to go bike around for fun, fewer and fewer casual users seem to be using our service. Isn't it amazing the kinds of insights we can find just by exploring our data? What I showed you previously is what we refer to as a stacked bar chart, which breaks down the original bar chart into different groups, so we can see how the groups break down across different bars. Again, a stacked bar chart is something we're going to learn about in a later lecture. This is still just a preview to show you how easy it is to explore data and draw insights from it.

Wow, all right. So this has been very intriguing. We've seen a lot of different things revealed by our data so far. Well, why do you think customers use bike rentals less in the winter? I mean, it probably has to do with the weather, right? And so, again, remember we are using information from a bike rental service in Washington, DC, and if you've never been to Washington, DC before, it has a tendency of being colder in the winter and warmer in the summer. This isn't something like Hawaii, which is warm all year round. So, maybe people use bike rentals less in the winter because it has lower temperatures. So, again, let's explore our data.

What I'm showing you here is what's referred to as a scatterplot. On the bottom axis, the horizontal axis, the zero, 10, 20, 30, 40, 50 that you're seeing at the very bottom represents temperature. The low temperatures are on the left-hand side. The high temperatures are on the right-hand side. The vertical axis, the up-and-down axis on the left-hand side of this chart, is the total daily users, so we can see everything from zero users all the way up to almost 9000 users. So, let's again take a look. We can see a little bit of a trend here, right? As temperature goes up, it looks like there are more total daily users, whereas as temperature goes down, there are fewer users. Look at the coldest day: find the dot that's furthest to the bottom left. Notice how that's almost 20 degrees outside for a high, and really we only had about 1000 users that day. However, if we go into something like the 70s or the 80s for temperature, we see a lot more total daily users, somewhere around 3000 to 8000 users in a day. Again, it looks like as temperature goes up, our total daily users tend to increase. Like I said, that is what we refer to as a scatterplot, specifically a scatterplot between temperature and user count, to try and see some kind of possible relationship, where that relationship might be able to give us some insights.

Well, what about registered users or casual users? Again, let's break those down similarly to what we did earlier. Let's take a look at those separately. What you see here is registered users, same plot. Just now, instead of looking at total users, we're looking at registered ones. We can see the same idea: as temperature has a tendency of going up, so does the total daily number of registered users. Again, on the colder days, in the 20 and 30 degrees, we typically have between 500 and maybe 2500 registered users. However, on the warmer days, around 80 degrees, we typically see between 2000 and 7000 users.
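The lecture reads the upward trend off the scatterplot by eye. One common way to put a number on such a trend is the Pearson correlation coefficient, which is a technique swapped in here for illustration and not something the lecture itself computes. The sketch below uses invented temperature and user values, not the course's data, and `pearson` is a hypothetical helper written out by hand.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: near +1 when ys rise with xs,
    near -1 when ys fall as xs rise, near 0 when there is no linear trend."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented (temperature, total daily users) pairs, one per day;
# these values are assumptions for the sketch, not the real table.
temperatures = [20, 35, 50, 65, 80]
total_users = [1000, 2500, 4200, 5800, 7000]

r = pearson(temperatures, total_users)
print(round(r, 3))
```

The same function could be pointed at a registered-users column or a casual-users column instead of the total, which mirrors looking at the separate scatterplots for each group.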
So, again, it seems that as temperature goes up, the number of total daily registered users also has a tendency of going up. Let's see if this pattern still holds for casual users. Oh, okay. Now we're seeing something a little bit different. Again, we see that temperature has a relationship, it appears, with the total daily casual users. As temperature goes up, so do the total daily casual users. However, look at those low temperatures. Where we have really low temperatures, we really do not see a lot of casual users. In fact, below 40 degrees we see not a lot of users, anything from zero to 500 total daily casual users, whereas if we're up in the 70 to 80 degrees, we can see anywhere from 500 up to 3500 casual users in a single day. So, although we look back and see that for registered users, yes, temperature has a relationship, it seems to have a more impactful relationship when it comes to casual users. Again, these are some insights we can draw by exploring our data.

So, let's wrap this up in summary. Remember, data by itself is really just information, not overly helpful. However, exploring our data, looking at our data, can reveal potential insights and valuable uses for that information. Now, I showed you a lot of different plots and a lot of different things that we're going to be exploring in further detail later on in the course, things like distributions, bar charts, stacked bar charts, and scatterplots. We also talked about averages. All these things will be explored in much further detail later on in the course, but hopefully this gives you a preview, and hopefully, in all honesty, gets you a little excited about some of the things that we can do to explore our data. I'll be honest, visuals help explore data so well. By taking a look at your data, we can draw these insights.

Let's remember some of the insights that we had by just exploring and looking at our data in this lecture. It seemed like we have a wide variety of possibilities for the number of total users on any given day, anything from below 500 all the way up to almost 9000 total users. However, season is an important factor in how many total users we see on a given day. We saw that winter had a tendency of being lower than things like spring or summer. When we delved into this further, we saw that temperature plays an important role. As temperature increases, we can see that the number of total users tends to increase. However, when we broke this down and looked at the difference between registered and casual users, temperature had a much bigger impact on casual users, especially at the lower temperatures. Isn't it amazing what we can do with data?

So, that is the end of this lecture, and I look forward to seeing you in our next one.



Last modified: Tuesday, 19 May 2026, 08:57