So let's finish off this section on gathering data with our last lecture. The first thing we're going to talk about in this lecture is the idea of experiments. Some of you may have collected data before through the use of an experiment, but what is an experiment? Well, to understand what an experiment is, we first have to understand what an observational study is. Data collection usually gets classified as either an observational study or an experimental study. Let's look at an observational study first, because that's really what all of the examples we've been using up until now have been. In an observational study, the researcher, or the person collecting the data, does not interfere or intervene in the process of collecting data. Basically, it requires selecting a sample. In an experimental study, also known as an experiment, the researcher, or the person gathering the data, specifically manipulates the conditions in which the study is carried out. This is by design. It requires selecting a sample and then designing and conducting an experiment around that sample.

Let's take a look at some examples. Imagine you wanted to know the average height of the adult population in the United States, because you're designing a new clothing line for adults. Well, this would be an example of an observational study. We're just observing what has happened: height in our population of interest. The bike data set would be another example of an observational study. All we're doing is looking back in time and seeing how many people have used our service in the past and what the weather happened to be at that time. An experiment is structured a little bit differently, though. But before we can understand an experiment and look at an example, let's talk a little bit about the terminology people use when referring to an experiment.
In an experiment, the researcher, or the person gathering the data, randomly assigns what they call treatments to experimental units. Okay, what do we mean? Well, an experimental unit is just an observation. Think about the person of interest or the object of interest. For the rest, let's start from the bottom and work our way up. A treatment is a specific experimental condition, something that we can potentially control. Well, what can we control in experiments? We can control certain variables. These variables are known as factors. A factor is basically a variable used to predict, and it takes on a finite number of values. Again, think about it as a categorical variable, or a qualitative variable. In experiments, we call them factors. The levels of a factor are basically the categories inside of our qualitative variable. So for factor, think qualitative variable; for level, think category in that qualitative variable. So again, when we apply a treatment, this is a specific experimental condition. Usually it's a level of a factor, if there's only one factor, or a certain combination of levels from several factors.

Let me give you an example to help walk through some of these things. Let's imagine a mechanical engineer wanted to determine which variables influence the gas mileage of a certain year and model of car. Well, gas mileage is the variable that we're interested in. You can think of cars as the experimental units. The factors that we're going to study will be tire pressure, which has two levels, low and standard, as well as octane rating of fuel, which here will have three levels: regular, mid-grade, and premium. Again, these factors are just qualitative variables, right? Tire pressure is just a qualitative variable. Octane rating of fuel is just a qualitative variable. But in experiments we call them factors. Now, we're also going to measure other things, but we're going to try and control for these things. For example, we're going to try and control for weather conditions, or route, or tire type.

So this would be an example of what we're looking at. A treatment would be a combination of things like low tire pressure with regular octane, and then standard tire pressure with regular octane, just to be able to compare low versus standard tire pressure at one level of octane rating. We would do that for all the levels of octane rating as well. With two levels of tire pressure and three levels of octane rating, that gives six treatments in total.

The key thing that makes this study an experimental study is the active role the researcher plays in manipulating the environment. Again, take a look back at this example. We're going to take very specific, predetermined conditions and test them. This is not something we're going back and looking at afterwards. That active role makes it really hard sometimes to actually have a true experiment. In fact, a lot of times in certain healthcare scenarios, in economic scenarios, we can't actually do real experiments. Now, yes, you may have heard of drug studies before, and those are experiments. But let's imagine you wanted to measure the effects of smoking on children. You think smoking is bad, and you want to see the impact that it may have on children. Well, it would be rather unethical to actually have an experiment where we basically said this group of children is going to receive secondhand smoke, this other group of children is not.
Let's see which one's bothered more. That would be an unethical thing to do. Same idea for a second example. Let's imagine that we're trying to measure the effect of family income during childhood on college performance. In an experiment, we would basically put certain families in poverty, give other families a lot of money, let the children grow up, and see how they perform in college. Again, that's not an ethical thing to do. These would have to be observational studies instead. We can look back and see children who have been around secondhand smoke before and see if we can measure some effects of what's going on, but we're not intentionally putting children in the future inside of a situation that may harm them. Again, the big difference between an experimental study and an observational one is that in an observational one, we look back on data that has already occurred. In experimental studies, we design them ahead of time to collect data in the future.

There are three key components to a well-designed experiment. So, let's talk about these components. The first deals with randomization, where we're basically taking treatments and randomly assigning them to experimental units. We've dealt with randomness before, but now we're talking about it in terms of an experiment. We want to make sure that we're not deliberately assigning treatments to very specific people. We want to randomly assign them to make sure we're getting an unbiased approach.

The second component that is key is what we call replication. Replication is when we have multiple subjects assigned the same treatment. If you only tested, for example, a new drug on one person, and it worked really well for that one person, how do you know you didn't just get lucky? The idea of an experiment is to make sure we can repeat it, so we have replication. Subjects who have the same treatment are called replicates. The more replication you have, the more confidence you can have in your study conclusions. That is why you see big, well-designed experiments typically carried out on many individuals.

The third component is what we call control. Control is where some study conditions are held constant. This helps us reduce variability. Controlling certain variables that can impact what we're interested in, which we sometimes call nuisance variables, will allow us to make better inferences about what's actually going on. It basically makes sure we can see things more easily. We can actually see differences because of our treatments, not because of other things. Again, we can go back to that car example. The whole idea of holding constant the weather conditions, the route, or the tire type is that those are potential nuisance factors that get in the way of measuring gas mileage. So these would be things that we are trying to control. When we're taking different cars and measuring their gas mileage under different conditions, like tire pressure or octane rating, we're going to make sure those different cars drive the same route. We're going to make sure they have the same tire type, and we're going to make sure we do it under the same weather conditions.
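To make the three components concrete, here is a minimal Python sketch of how the car experiment could be laid out. The factor levels come from the example above; everything else (the 18 car IDs, the specific control values, the function names `build_treatments` and `assign_treatments`, and the choice of three replicates per treatment) is an assumption made only for illustration, not a prescribed design.

```python
import itertools
import random

# Factors and their levels, taken from the car example.
factors = {
    "tire_pressure": ["low", "standard"],
    "octane": ["regular", "mid-grade", "premium"],
}

# Control: conditions held constant for every run.
# These particular values are hypothetical placeholders.
controls = {"route": "test loop A", "tire_type": "all-season", "weather": "dry"}


def build_treatments(factors):
    """Return every combination of factor levels; each combination is one treatment."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]


def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to units with equal replication.

    Assumes the number of units is a multiple of the number of treatments.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    replicates = len(units) // len(treatments)
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # randomization: chance decides which car gets which treatment
    # Replication: each treatment receives `replicates` consecutive cars from the shuffled list.
    return {unit: treatments[i // replicates] for i, unit in enumerate(shuffled)}


treatments = build_treatments(factors)           # 2 levels x 3 levels = 6 treatments
cars = [f"car_{i:02d}" for i in range(1, 19)]     # 18 hypothetical experimental units
plan = assign_treatments(cars, treatments, seed=42)

for car in sorted(plan):
    print(car, plan[car], controls)
```

Shuffling the cars before handing out treatments is one simple way to randomize. Fixing the seed only makes the plan reproducible; it does not make the assignment any less random from the cars' point of view.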
That way, those things won't impact our study of what tire pressure and octane rating can do to gas mileage.

So let's summarize. An observational study, which, I'll be honest with you, is a lot of what data analysis is these days, is one where researchers or data gatherers do not interfere or intervene in the process of collecting data. We just observe data, and we try to understand associations and relationships after the fact. An experimental study, however, is where the researcher specifically manipulates the conditions in which the study is carried out. What we are trying to do with experiments is isolate the effects of treatments, so we can more confidently say that something is happening because of a treatment. Now, there are three key components to a well-designed experiment: first is randomization, second is replication, and last we have control. Excellent.

So we've talked about the idea of experiments, but that does bring up another subject that we should talk about when it comes to gathering data, and that is the idea of data ethics. The gathering of data leads to questions around the ethical collection and use of that data. We talked about an example previously. We want to understand the effects of smoking on children. It would not be ethical to subject children to these kinds of situations just so we can answer some experimental question. As Christians, though, we're held to an even higher standard around ethical considerations, so we must always keep these things in mind as we're living up to that higher standard. I personally believe that God not only makes us stewards of money, but he also makes us stewards of a variety of things in our lives: our talents, our jobs, and here, the data that we collect. We need to collect it in an ethical manner.

So, in observational studies or experiments, we must keep the interest of the subject we're collecting data from at the forefront. Again, in that example of the children and the effects of smoking, if we keep the interest of the children at the forefront, then we can say, you know, this is not a good idea. In fact, in 1964 the Helsinki Declaration of the World Medical Association made this statement: the interests of the subject must always prevail over the interests of society and science. So, when we start collecting our data, there are safeguards that we can have, for example, institutional review boards, informed consent, and confidentiality. Let's talk about each one of these.

What is an institutional review board? Well, there have to be people who have the best interest of the subjects of the data collection in mind. Sometimes the people running the experiments may get blinded by wanting results. So, an institutional review board comes in, and they are the people who have the best interest of the subjects in mind, to keep the experimenters, or the people gathering data in an observational study, focused on what's best for the subjects. For example, medical studies actually require institutional review boards to evaluate every single study before it is conducted, so that the subjects are not put in harm's way.
There are many horror stories from before institutional review boards were put into place, where medical studies were done on people that honestly should not have been done. So again, it is crucial to have others step in to keep the best interest of the subjects in mind. Unfortunately, these boards are not required for a lot of business studies. However, the people collecting the data, potentially you, should take the subject into account before any data collection is performed. You can be the voice of reason at whatever company you're doing this for.

So, let's talk about informed consent. What do we mean by informed? Well, to be informed, a subject should be told what data is needed from them and what potential outcomes come from that data being given to the people collecting it. It's not just that we should tell people, hey, I need to collect this data about you. I also need to tell them the potential outcomes of them giving me that data. So, we must ensure that all information is shared. Now, this may be hard for those gathering the data, since they believe in their work and its usefulness. However, you have to think about all the potential risks of having that data and make sure those risks are revealed to the subject. So that's the idea of being informed.

Then comes the idea of consent. After being informed, subjects must agree to the collection of data. Usually, this is done in writing. Of course, if this is the case, we also have to consider who can actually give consent. For example, can a small child give consent? Usually not. Usually, we have a parent or a guardian who has to give consent for the child. The same idea applies to mentally ill subjects as well. So, again, these things are done a lot of times in medical studies, but in business studies they may not be done. So, I invite you to be the person in charge of this. If you're in charge of collecting data, if you're in charge of analyzing data, it falls to you to make sure that we are good stewards and good examples of what people should be doing with this data. Now, I'll be honest, some people are afraid that consent is harder to come by if you reveal all the possible bad outcomes, no matter how unlikely they are. But is that a bad thing? Is it bad to say, you know, there's a very, very slim chance that this data could be used for something that you don't want it to be used for? We need to make sure that consent is given only after the subject has been fully informed.

Last but not least, we have confidentiality. Once data is collected, privacy is extremely important. Confidentiality is where the subjects in the data have their identifying information masked, so you can report overall statistics about the data that is gathered, but not who it belongs to, unless you're reporting results to others who own the data. So you can say, well, the average age of this sample of people is this, without revealing everyone's individual age. Of course, in the modern age of technology, we see many stories of confidential data being leaked due to computer hacking. Again, that's the downside of data collection: there's risk. So we must take all the steps that we can to be good stewards of the data that has been given to us. This data has been entrusted to us. We must keep it confidential and private.
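The idea of reporting overall statistics without revealing who the data belongs to can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the records are made up, the helper names `mask_identifiers` and `summarize` are hypothetical, and treating `name` as the only identifying field is a simplification. In real data, other fields can also identify a person, so dropping names alone should not be assumed to be enough.

```python
import statistics

# Made-up records for illustration only; these are not real people.
records = [
    {"name": "Subject A", "age": 34, "height_cm": 170},
    {"name": "Subject B", "age": 41, "height_cm": 182},
    {"name": "Subject C", "age": 29, "height_cm": 165},
    {"name": "Subject D", "age": 52, "height_cm": 175},
]


def mask_identifiers(records, id_fields=("name",)):
    """Return copies of the records with identifying fields replaced by a study ID."""
    masked = []
    for i, rec in enumerate(records, start=1):
        clean = {k: v for k, v in rec.items() if k not in id_fields}
        clean["study_id"] = f"S{i:03d}"  # arbitrary label, not derived from the name
        masked.append(clean)
    return masked


def summarize(records, field):
    """Report only overall statistics for one field, never individual values."""
    values = [r[field] for r in records]
    return {"n": len(values), "mean": statistics.mean(values)}


masked = mask_identifiers(records)
print(summarize(masked, "age"))
```

The original `records` list is left untouched, which reflects the point that under confidentiality the people holding the data still know who it was collected on; only what gets shared or reported is masked.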
Now, being anonymous is a little bit different from confidentiality. Anonymity is when identifying information about the subjects is never actually known in the data collection. Anonymity is even more private than confidentiality. In confidentiality, we know who the information is collected on. In anonymity, we don't. An example of that would be not actually writing down the names of the people whose heights or ages you're collecting. Now you just have a collection of people, but you can't really tie it back to anybody.

So, let's walk through an example of a website. You want to know which website design will work better to get people to click on your products. You randomly show one of the two designs to people who visit your website to measure which design performs better. Are there any concerns around institutional review or informed consent? How about confidentiality? Did anyone double check this website design study that you wanted? Did you have anyone else review it? If you're showing different websites to different people, have you actually informed the people who are in this study that they're in a study? Did they give consent? Are you tracking what these people are doing on your website? If so, are you taking confidentiality into account? These are important questions.

What about a wearable medical device? You wear a watch that tracks your heart rate and sends that information off to a company. That company uses the information to determine trends and characteristics of people at risk for heart disease. Again, did you agree to that? Maybe you did. Maybe you did unknowingly. Was there any institutional review of this study? Again, I say, did you agree to it? Maybe it was in all that small print at the bottom when you bought that wearable device, and maybe buying it was you giving consent. We have to be very careful of this. And again, what about confidentiality? Are these companies making sure that we can't be linked to our health data, which could potentially be stolen?

So in summary, the gathering of data leads to questions around the ethical collection and use of that data. As Christians, we're held to an even higher standard around these ethical considerations, because we are stewards of this data. In observational studies and experiments, we must keep the interest of the subject we're collecting data from at the forefront. That's why we have institutional review boards, informed consent, and confidentiality.

All right, so let's wrap everything up with some intuition around collecting data. So, what have we done? For the main concepts in this section, we talked about samples and populations, we talked about randomness, good and bad sampling methods, as well as ethical concerns around data. So, again, for intuition, start with the population of interest. Who are you really interested in gathering data around? Well, the biggest problem with setting a population is not providing enough detail. So, make sure you provide plenty of detail around what you're actually interested in. It'll actually save you time later on. Of course, we can't talk to that entire population, so we have to take a representative sample, and that brings up a good question. Does your sample represent your population? Good sampling methods that involve randomness help you get a sample that represents the population.
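As a rough illustration of a sampling method that involves randomness, here is a minimal Python sketch of a simple random sample. The population is synthetic: the heights are generated by the code itself with an assumed mean of 170 cm and spread of 10 cm, and the function name `simple_random_sample` is hypothetical. None of the numbers are real measurements.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement, each member equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Synthetic population of adult heights in centimeters (assumed values, not real data).
rng = random.Random(0)
population = [round(rng.gauss(170, 10), 1) for _ in range(10_000)]

sample = simple_random_sample(population, 200, seed=1)

print("population mean:", round(statistics.mean(population), 1))
print("sample mean:    ", round(statistics.mean(sample), 1))
```

The population mean can be printed here only because the population was generated in code. In a real study the population value is unknown, which is exactly why the sample has to be drawn in a way that gives every member the same chance of being picked.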
However, it's still good practice to explore your data to make sure it looks like the population in a common sense way. For example, it's possible, purely by chance, to select only NBA players for your height study. Upon quick investigation, though, you realize your sample probably isn't right, so you take another sample.

Of course, if we're going to talk about sampling, we should talk about good sampling. Does your sampling favor certain outcomes over others? This is called bias. It's always good to think about your sampling method to make sure you haven't built in any biases. Again, randomness helps with this, so make sure your sampling method has randomness to protect you against bias.

And if we're going to talk about good sampling, we should also talk about ethical considerations. Can anyone be harmed or burdened by the collection and use of your data? It's an important question. Think about the possible harm the collection of your data could cause. You must be open and honest with the people you're collecting data on. Remember, God holds us to a higher standard than the world. Let's represent him well.

So overall, it's extremely hard to protect yourself and consider all these things by yourself. Ask for help. I always like to ask others who I know, especially if they have different perspectives and experiences than I do, to make sure I'm not missing anything when I start these studies. So intuition and careful thought can protect you a lot of times when it comes to data gathering. And don't be afraid to lean on other people, especially those who are different from you, to help make sure you're considering all the things you need to. Well, that wraps up this section on gathering data. I look forward to seeing you in our next section.



Last modified: Tuesday, May 26, 2026, 9:00 AM