Video Transcript: Data and Databases
Hello and welcome back to Introduction to Information Systems. We have started off with the first three weeks talking about some of the just an overview of what makes up an information system, and then talking about some of those components that make up an IS, so that would be our hardware, and then we talked about software. Now we're ready for the third and final component of the technology umbrella, and that's going to be data and databases. So we're going to get into that in just a minute, but I would like to start us off in prayer, to get us started and our Dear Heavenly Father, please watch over every student today as they are opening their minds and their hearts to new information. And I ask that you bless them, allow them to take this information and use it for your good and to advance your kingdom, so that we can, especially with this topic of data, that that we're able to to glean that important information to move forward with our families and our organizations and our ministries, and allow you know all of us to to be Lights for you. So in your heavenly name, we pray. Amen. Going to go ahead and get started with our data and databases. And you might be saying to yourself that this sounds like pretty dry information. The word data is just, I think you might be a data person, if you pronounce it that way. We can still be friends. I, myself, am a data girl, but the word data itself, I think, is pretty intimidating. It just when you're getting into all the processes and that need to happen in order to use data successfully. Hopefully we're going to break that down today so that we can understand if you have some processes in place, and you kind of know the structure around all this information and these tiny little pieces of data, we're going to be able to harness those for for good. So here we are gonna pop my picture up here in this There we go. Hopefully, by the end of this module this week, we're going to be talking about some of those differences between what's the difference between data and information? Are those used interchangeably? I I am definitely at fault for using those terms interchangeably sometimes when I shouldn't, what's the difference between information and knowledge? Do you use those interchangeably? So our our goal here is that if you are ever in your job or in your family, in a position to be making really big decisions that you could go into a boardroom with your boss, with the CEO of an organization, and you could speak knowledgeably about some of these information systems, topics related to data, and be able to really have that vernacular down so that you can be part of those big decisions of moving your organization forward, so kind of knowing those nuanced differences between what makes data, information, knowledge is really going to help you move forward, hopefully in your career path. Our hope is that you can define the term database and talk about how we would create one. What's the difference between a spreadsheet and a database? For example, we're going to be talking about the role of database management systems and the characteristics of what makes a DBMS and what makes a data warehouse. And then finally, we're going to talk about data mining, which, and the big keyword of the last 10 years,
probably which, is big data. What does that mean? And how do you get it and how do you use it? We're going to be talking about all those things to hopefully arm you with good information. Okay, to recap, we know that there are five components of any information system, and those first three hardware, software, and, of course, data, are part of the technology umbrella. So we're wrapping up our technology umbrella today with data, and that's we're going to define that as any quantitative or qualitative, raw bits and pieces of information without any context. So that's our third piece. And then, of course, our next two modules are going to be about the people and the processes that are involved, also in information systems. So when we look at data, we'll go ahead and look at the qualitative versus the quantitative here, and that's going to be the quantitative data is any numeric, any numbers, measurements, counting math. Calculations, things that can be quantified or measured. Our qualitative data is going to be more descriptive, and as we are talking about what you know, what's most appropriate, they both have their place. There's not one type of data that's more useful than another. They each have, you know, their their place in their usefulness. So as we talk about the qualitative versus quantitative, just keep in mind that if it can be measured, typically, it's that quantitative piece. If it's more descriptive, it's the qualitative piece. And then, as we talk about our metadata a little later on in this video, these two, these two terms, are going to come back and be very important for us. All right, so what are we going to do with all of this data that we get? So if we look at the bottom here, we can see that when we do get a tiny piece of information without any context. So if I say to you the numbers 18, 23, five and 106 those are data points. They have no context. They are not useful for you because they are bits of bits of data with no context involved. Once you, once you give it some context, it becomes more powerful. Then we, we're going to go ahead and call that information. So if the data has those numbers, once I put into context that those are scores in the grade book, those become pieces of information which can then move forward and inform some decision making at the knowledge level, once you have enough information, you can recognize patterns, you can recognize trends, you can make assumptions, and that knowledge goes up to become wisdom. Once you have enough knowledge and enough experience of data feeding up through this pattern over time, you will definitely gain wisdom through that accumulation of data that is happening. So it does have to happen in that order. You can't jump to wisdom, sadly, and enough data without context is certainly not going to be useful. You really do need to you're going to see this pattern emerge in your workplaces, in your families, of whenever you are making decisions. This is how it happens, and this is how data becomes powerful and becomes something you can use. You don't need a formal education to have this process happen for you. You just need to be aware of the context in which you're getting these tiny little bits of information and what you do with them. So now that you have this data,
where are you going to put it? And we're going to talk about a database here. This is related data in context. So like the context that we're talking about, here's an example up here on the screen, and this is from a doggie daycare. And so the first chart we see here is a spreadsheet with the dog's name, what kind of food they eat, if it's wet or dry. Is this a good boy? Those should all say yes. I think they're probably all very good boys. However, on this, this is one piece of this is now information. Each one of these is a is a data point out of context. It wouldn't mean anything. The letter Y would not give us good information to make decisions. But once it's in a field of context of that Fido is indeed a very good boy. That's now information that can be used now to to pull so we have relational databases, and that's what we're going to be talking about first. There's also databases that aren't that are non relational. We're going to talk about those in just a second. But these ones are all related to each other. So I can see that first spreadsheet. Now if we follow that arrow down, we're going to have that in relation. So now we're adding another column for the tag number that each pet wears, the breed of dog that they are, the color that they are. We have their height, their weight, their age, all these other pieces of information, because it's not just data anymore. Now there's context. So once we have this information, we're able to go ahead and move that into a knowledge base or a database that we see here. And some examples of some relational databases are you might have Oracle in your workplace, where they have myriads of information of all these fields that are organized in such a way that they can then query or put in a search for data points in each one of these categories. And then we can combine them. We can really use them together, because chances are, if you want to know, you don't want to know every single piece of information from your chart at all times, but we're able to combine them in ways that make sense for projects that you might be working on. And here I'm going to move my face. There we go. And so we can, we can see that as we are manipulating these databases. This is when their usefulness and their power really starts coming to light, when we can take fields from one spreadsheet, move them over to another through this power of a database. All right, these fields that you see right here, where it says, name, good boy. Tag, number, height, these are all fields. And now again, this is talking only about a relational database, because this won't be the case in just a minute, but this is, sorry, try and move forward to the next There we go. So a relational database, every field is going to have a type, and so it's either going to be text that's going to be when you're storing non numerical data, it is brief. It is generally under 256 characters. That seems to be the magic number. Once it's above that, it turns into a paragraph field. But so this is going to be really important for things, so that the designer of the database can identify the maximum length of the text involved, to see if that's appropriate. But that's one type of field in a database. Then there's numbers. Pretty straightforward store numbers, usually a few different
number types that can be selected depending if you're doing decimals, if you're doing fractions, would be if this is something that would be beneficial for you, yes, no, like we saw earlier, for good boy or not, this is a toggle form. It's a special form of a number data that has that binary system that we talked about earlier. It's either on or off. It's a y or an N, yes or no, or we can put ones or twos in there as well. There's a date and time field and that, again, it's pretty straightforward, and there's a couple different forms that can take place, depending whether you are in the United States or Asia or Europe. The way that those are displayed can be varied, but that will go ahead and display the date and time for you. This is very common in a database. Currency is a special form of a number field that, again, is going to have values with a currency indicator and up to two decimal places that you can have there for dollars and cents, or whatever currency you're working on. There are, there's a paragraph text field, and this is for those texts that's longer than the 256 characters that would be a paragraph text. And then finally, this object field that's going to be the data type that allows for storage of data that can't be entered in by keyboard, like if you have an image file, if you have a music file, an mp three file that you wanted to put in, you would go ahead and insert an object into that. And that's where it does get kind of complicated. Those aren't typically rendered in a in a query or a search as easily as the other types of fields, but that is a possibility, if that is something that you find yourself needing as you're setting up a database. Okay, so how do we find information here from the databases when it's a relational database that we are talking about? So relational database, think columns and rows like a traditional spreadsheet. There is this term up here that you see SQL. It's also pronounced SQL. You might hear people talking about a SQL Server, the Structured Query Language. That's just another way of saying how to search in a spreadsheet. There's an example here for you of the type of language, how it looks when you're running an SQL or a SQL query. And so you would put this in one time. You can have, there's many templated queries that are already made for you. You can create custom ones. If you use a program like Microsoft Access, you can actually tell Microsoft Access what you want to find and that it will create the SQL or SQL language for you if you're not familiar or comfortable with doing that. So there are some some cheats and work around if you aren't familiar. I know it can be intimidating at first, as you're putting together some SQL structured queries. But the benefit of doing this is that you can really gain insights into these relational databases, and you can find patterns that really wouldn't have been visible before, unless you're querying these huge amounts of data. All right, so I know I just got done telling you that database is our relational database. Think columns and rows. So you might be wondering, well, what's the difference then, between a spreadsheet and a database, which would be a very good question. So when several types of data are going to be mixed together, or when the relationships between the types of data are really
complex, then a spreadsheet is not going to be the best solution for you. It's just not going to be robust enough to do the types of things we were just looking at, where you're combining those different types of fields together and in different sheets and having them communicate to each other and talk to each other in cells. So if you're working with a database that's going to allow from data from several entities. So let's say I'm working in a university and I want to be looking at student information. I want to know what clubs they're in, what their birth date is, what their memberships are, what events they've attended, all kinds of things that would be different fields, dates, numerical values, text, paragraph objects that then a database would certainly be more appropriate and help me reach my goals of really finding Good wisdom from all of these pieces of data. All right, so we talked a lot about relational databases. Now we're going to take a little walk over here. And doesn't this just look miserable, this non relational database. This is for things that rows and columns. It's just not appropriate. It's just not the best use. And you're going to find that enterprise databases so more than just yourself using it, a lot of people on the organizational team, they're going to find themselves using more of these non relational databases. And I know, just by looking at that image on your screen, I'm going to move my face again here, that it can be certainly intimidating. There are a couple different times kinds of non relational databases. Again, we're not looking columns and rows this time. This is purely data storage for query. There's document oriented databases, and those are known as a document store, and those are designed for storing, retrieving, managing, documented oriented information. So those usually pair a key with a complex data structure. So in other words, you create a document that is substantial in size, it's going to have some key indicators within that document to imagine you stored a whole thesis paper, or you had the entire text of a book that wouldn't fit in a cell of a spreadsheet, of course, but it would fit in what we see here. This non relational databases. That's a document oriented so we have our key value stores. So we can see here, this is a database that uses different keys, where each key is associated with only one value in a collection. It's like a dictionary that one of the simplest databases, among not only SQL databases, can use so that key, think of it like like a code. And then we have wide column source. This uses rows and tables and columns, but unlike relational databases, the names and format of these columns can vary from row to row, and that would be a lot like what you see here in the example. Here's the key, here's the document. They're not all the exact same. We can see that this has some HTML language which would not be appropriate for a traditional relational database. And then finally, we have our graph stores, and the graph database uses, of course, graph structures for semantic queries. There's nodes, edges, properties. This is going to be more of when we're doing geographic data, and we have a lot of charts, we have graphic values, but that would not be appropriate for a spreadsheet in Excel. There's no way to put an entire chart or
graph into a field in Excel that would give you meaningful data when queried. So this is a great way to work around that we would use a non relational database for that. Okay? So now that we know our our databases, now I'm going to throw one more term at you that some people use kind of interchangeably with database. This is a data warehouse, and a data warehouse uses non operational data. This means that the data warehouse is using a copy of data from the active databases from the day to day operations, so that it has to pull from existing databases on a regular basis. So it's a time variant. So we're going to extract data from one or more of the organization's databases and load them into the data warehouse for storage and analysis. And if we go back to our example from a previous a previous video on memory for hardware, if we think about the pantry in the kitchen being the hard drive or the SID, and then we think about the RAM being that readily available, kind of the prep kitchen, where it's for the recipe you're cooking today. This is kind of the same thing, where the data warehouse, think of that as the big pantry. So this data that gets extracted is on a time variant, so it's going to go back in the pantry, here in this data warehouse, for storage, for analysis, to be ready for the next time you're ready to cook in the prep kitchen and take it out of the pantry. I hope that makes sense. So the data in the warehouse is going to be standardized. It will come from many different sources, so it's going to be possible that it doesn't even use the same definitions or units. So this is kind of a catch all pantry, because you're getting information from a lot of different sources. That's not uniform, like it would be in a relational situation. All right, we're going to talk about metadata and and data mining in just a moment. First, we're going to talk about the benefits of data warehouses, and this is going to be organizations that find data warehouses quite beneficial for a number of reasons, one of which is the process of developing a warehouse forces an organization to better understand the data that it's working with. Once you have to find a place for it, you're forced to really examine what you've got. This is where a lot of organizations inadvertently find trends that they didn't even know they were looking for. A warehouse provides a centralized view of all the data that's being collected across the enterprise. So if your organization kind of naturally lends itself to working in silos, and I've worked for many organizations that kind of have this habit where I'm spinning my wheels on a project, and there might be somebody three desks down from me spinning their wheels on an extremely similar project, and if we would only realize and be able to talk to each other that our information and data could really benefit each other. And so this is a great advantage of using a data warehouse for that centralized view for the entire enterprise, to be able to have benefit of all of that data once it's identified as a consistent piece of data. Or a data warehouse, an organization can generate one version of the truth. I'm sure many of you have worked with the that would really benefit from this happening within your organizations. When the company
wants to have consistent reporting, if they want to rely on really good statistics about their revenue, their number of employees, their whatever the case may be, their progress towards achieving strategic goals as part of a plan, anything that would be a unified message from the organization. If you do have a data warehouse, it's a lot easier to have one truth that everyone can just have great visibility into, as opposed to those siloed, you know, just databases. This is the benefit of having a data warehouse as part of your enterprise. And then also, by having a data warehouse, snapshots of data can be taken over time. And then this gives you just, you know, for good posterity's sake, of having a good historical record of data, so it allows you to have better analysis of trends, of patterns, and it gives you the tools, if you have a warehouse, to combine this data. So when you get new information, it's way easier to analyze and to kind of add on to what you already have. So I know that was a lot to throw at you as far as the benefits of warehousing. But if your organization, or your enterprise is large enough, definitely the benefits are certainly there for what to do with all this data that you're collecting would be to warehouse. We're going to talk about a word that has been the. This word, certainly recently. There we go, and that is this word meta, if you've been in the Facebook, Instagram world, it's just a term that has been very, very recently overused. It is this, this idea of being meta and of knowing about what you're knowing. And so this is information about our data, and this is how we find good trends. It's good for decision making. And so let's just say that you are in your iTunes library or your if you are a Windows user in your media player folders, and you're looking at your mp three files, that's the actual data. Okay, the metadata is going to be information about that data. So let's say I've got an mp three file, the metadata would be the artist name, maybe the venue where it was recorded, anything else about the song title, if it's acoustic or live, or whatever that case is, that describes the data. So why it's important to have the metadata is just knowing about what the data is, helps us to know where to put it, how to organize it, how to best harness that information. Again, the more context we have is just power when it comes to data, all right, and then once we find trends, that's how we make good decisions over time, and then all of this coming together of the data through the metadata helps us with our data mining. And this is going to be a field that has come about. If you are mining data, it's going to be some things you're looking for. Other things you can take the presupposition away, and it's just things that appear as a trend that you can use. So for instance, if you are working for an organization that is retail and they're selling goods, and they can notice patterns about exactly what's selling exactly what's not over time, they can see from all those transactions. For instance, Walmart has over 1 million transactions per hour, so this data mining would be extremely useful for being able to harness all that information and look for trends on what's happening. There's no way that an employee of Walmart could be expected to sift through that much transaction data. And if you
remember our our triangle the bottom would be all those transaction data points. So being able to to look at that and make good decisions with it is directly due to data mining how to find those trends. So to wrap it up, when it comes to data, I know this has been a lot of kind of tedious information because it is so similar to each other. So all these terms, like data and information and knowledge and wisdom and databases and data warehouse and data mining, all these things are so similar, but we want to make sure that we can can differentiate those things and use them for the benefit of our organization or our families. So in summary, we know that data is made up of small facts and information that don't have context. So not very useful on its own data by itself is is not great, because once you give it context, then you have information. We learned about databases which is different than a data warehouse. And we talked about the the enterprise value of having the data warehouse in the big pantry, as opposed to the databases, which are more action oriented versus the spreadsheet, which is a more simplified, it's not a robust situation. We talked about how we can glean information from these, these documents, such with with SQL Server and the like, of being able to query that data. And then all of these come together by just making our businesses more intelligent. So this field that has emerged. Business Intelligence. That's actually what my graduate degree is in, is in business intelligence. Once we harness that business intelligence, it really does give us the competitive advantage that that other organizations might not have, because as we know, knowledge is power, and I hope that is the case for you in this course. And as we move forward, next module, we're going to be talking about networking and communication and how all these things can talk to each other. So let's say we've established a great database, a great data warehouse all these things. How is it communicating with the other people and machines in the organization? So I hope that this has been useful, and I am very excited to meet with you for the next module. We'll see you next week. Thank you.