Databases
1. Introduction
You may not realize it, but we live in a world where databases are everywhere. The effect of databases on our daily lives are huge. From healthcare to searching something up on Google to social media to ATM machines, databases are responsible for many of the services we use daily. You also might have created a database without knowing it: if you’ve ever worked with excel, you’ve created a simple database. Microsoft Access is also widely used by people at home or in the office to create databases.
Of course, it wasn’t always like this. Only by the mid-1960s, as computers finally developed speed and flexibility, and started becoming popular, many kinds of general use database systems became available. Since then, the amount of data in our world has increased on a large scale, in various fields. Almost all websites, programs and apps use databases.
2. What is a database?
A database is a structured set of data held on a computer system. Databases play an important role in archiving data and keeping data up to date. This data can belong to the government, financial institutions or to science, but can also belong to something private on a smaller scale.
A database needs to be structured to be able to order and retrieve stored information. Therefore, the most used kind of database, the relational database, is built out of one table or multiple tables and looks something like this:
Shortly after computers were invented, databases were introduced. At that time, computers were basically giant calculators and data (like names, phone numbers) was considered to be the leftovers of processing information. But when more people (especially in business) started using computers, the leftover data became important.
But why exactly were databases needed for companies? Because when a company is still small, with only a few clients, data (like phone numbers, names, addresses) can be kept in a notebook or even just in someone’s head. But what if a company has thousands of clients, and needs a good, structured way to save the data while still being able to use it and search in it? That’s when databases come in handy.
Databases have both advantages and disadvantages.
Pros:
- You’ll have the ability to store large amounts of information.
- It’s easy to quickly share and access data in a database.
- You’ll have an overall increased efficiency: a database eliminates duplicates, reduces storage space, reduces data inconsistency, improves data security and more.
Cons:
- Databases are difficult and time-consuming to design.
- There is initial training required for all programmers and users of the database.
- There are a lot of conversion costs when moving from a file-based system to a database system.
- Databases require lots of space to store all information/data. On a normal computer the hard disc can be used. Nowadays, it’s also possible to use cloud storage: in other words, the internet. For large databases, companies can hire storage space like very large hard discs as well as extra services to maintain hardware and data.
A datacentre is a physical facility (a big building) where computing and networking equipment is concentrated for the purpose of collecting, storing, processing, distributing or allowing access to large amounts of data. They provide important services such as data storage, backup and recovery, data management and networking.
3. Big data and data mining
Can you imagine how much databases exist? How much data every database has saved? To give you an idea, Amazon's two largest databases combine for more than 42 terabytes of data, or 42.000 gigabytes. How managing such a big database works, is called big data. The term big data is used when people work with databases - structured or unstructured - that are too big and complex to be managed and maintained by a regular database management system.
The characteristics of big data are described with three V’s:
- Volume: the quantity of generated and stored data, usually in terabytes or petabytes.
- Variety: the type and nature of the data and whether the data is structured.
- Velocity: the rate at which data is generated and how fast the generated data is processed.
These three V’s are also used:
- Veracity: the data quality and data value.
- Variability: refers to data whose value or other characteristics are changing because of the context they are being generated.
- Value: how much useful things (utility) can be extracted from the data.
A term that is often confused with big data, is data mining. Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, companies can learn more about their customers. This process consists of five steps:
- First, a company collects data and loads it in their data warehouse. A data warehouse is a system that pulls together data from many different sources within a company.
- Then, the company stores and manages the data, either on in-house servers or in the cloud.
- Business analysts, management teams and information technology professionals access the data, and decide in which way they want to organize it.
- Next, application software sorts the data based on the user's results.
- Lastly, the end-user presents the data in an easy to share format, such as a table.
An example of a way to use data mining is done by grocery stores. Many supermarkets offer loyalty cards that give access to cheaper prices. The cards make it easy for stores to track what a customer is buying, when they are buying it and for which price. After analysing this data, stores can use the data to offer customers coupons specialised on their buying habits and decide when to put items on sale or when to sell them at full price.
4. Technical aspects of databases.
3.1 Types of database models
As mentioned before, databased need to be structured. This can be done in different ways. The way a database is structured is determined while or modelling the design of the database. The way in which information is structured in real life determines which modelling works best to represent in the database.
There are a lot of different types of database models, so it is impossible to name them all, but these are five popular database models:
- SQL/RDBMS/relational databases. Relational databases are structured: you’ll have charts that may have dependencies or relationships on each other. For example, a database for a store will have a chart for customers and a chart for orders and they are related because an order is made by a customer. Popular relational Database Systems are MySQL, Oracle, Microsoft SQL server.
- NoSQL/non-relational databases. Non-relational databases are less known and used than relational databases. Non-relational databases are document-oriented: the document type storage has multiple 'categories' of data stored in one document. So, using the same example as with relational databases, a customer document will have the customer's information, another sub-category for all their orders, etc. Popular non-relational Database Systems are Oracle NoSQL, Redis and MongoDB.
- Flat file databases. In flat file databases, data is saved in one single table, unlike in relational databases. Flat file databases are usually in plain-text form, where each line holds only one record. The fields in the record are separated with delimiters (one or more characters that separate text strings) such as commas and tabs. Examples of software that can create flat file databases are Microsoft Excel, Microsoft Access and FileMaker.
- Hierarchical databases. Hierarchical database models use a tree structure that links several different elements to one "owner," or "parent," primary record. An item can only have a relationship with its parent or child record(s). The IBM Information Management System (IMS) and the RDM Mobile are examples of a hierarchical database system.
- Network databases. A network database model is similar to a hierarchical database, but multiple member records or files can be linked to multiple owner files and vice versa: it doesn’t only allow parent-child relationships, but also many-to-many relationships. Some well-known database systems that use the network model include: Integrated Data Store (IDS), IDMS (Integrated Database Management System) and Raima Database Manager.
3.2 Database languages (SQL)
A database language, also known as a DBMS language, is a language to read, update, manipulate, store data. The most used kind of database language is SQL: Structured Query Language.
SQL uses query’s for communication with the database management systems. MySQL is an example of a database management system. So, a query is an instruction that gets sent to the database management system. Examples of basic query’s/commands are:
- Show: displays information contained in the database.
- Select: used to select data from a database.
A SQL database system contains one or more objects called tables. The data or information for the database are stored in these tables. Tables consist of columns and rows. Columns and rows are sometimes confused with each other, but they are not the same. Rows run horizontally and columns are drawn vertically. Columns contain the column name, data type, and any other attributes for the column. Rows contain the records or data for the columns. Records contain all the data about one particular person, company, or item and refer to rows. Fields are data categories and refer to columns.
To explain this further, we will be using an example of a telephone book. A telephone book has a last name, first name, address and phone number for each entry; those categories form the columns of the database. Each entry in the phone book has a unique, identifying quality, which is the person's name. Each named entry in the phone book forms a row, which contains unique data of a person in each of those four columns.
A primary key in SQL is a field in a table which uniquely identifies each row/record sin a database table. This is very important when linking two (or more) databases together into one bigger database. Because of the key, the information about one person in two databases can be recognised as the same person. They can’t just use the names of the person, because there might be two different people that have the name ‘Nick Jones’, so to avoid merging data from two different people, unique keys are used.
5. What’s the future of databases?
The development of databases is going rapidly, especially if you compare databases from now with databases from 10 years ago. I think that the software to create databases and databases itself are just going to get more and more advanced as time goes by, making everything much more efficient. I also think that it will be easier to create a database yourself and that more people will use databases in the future, also for their own, small businesses. New trends in database technology right now are, for example, databases that bridge SQL databases with non-SQL databases, giving users the best capabilities offered by both. For example, it can allow users to access a NoSQL database in the same way as a relational database, making it more user-friendly and easier. Just imagine what we could do with databases in a few years!
However, the increased popularity of databases does come with disadvantages:
- Privacy concerns. The development of databases has given the opportunity to access and use data in ways that were previously not possible. Although this does have benefits, some of the methods of data collection raise concerns about privacy amongst customers. Customers are afraid that they are not in control about what information about them is collected and how the information is used or that their data is sold to third parties.
- Security concerns. Database owners need to take measures to prevent unauthorised access to the data, such as hacking into the database, physical theft of database servers or interception during physical/electronic transfer. Databases are the target of criminals because they contain large amounts of very sensitive data such as credit card numbers and passwords, another reason why customers have privacy concerns.
- Integrity concerns. The data in a database might be incorrect, irrelevant or not up to date because it was entered incorrectly, something went wrong while transferring data from another database, it was changed accidentally etc. This leads to a company spreading wrong information, which could have serious consequences.
6. Sources.
https://www.lexico.com/en/definition/database
https://www.oracle.com/be-nl/database/what-is-database.html
https://informaticalessen.be/gegevensbeheer/wat-is-een-database/
http://www.cl500.net/pros_cons.html
https://www.quora.com/What-are-the-advantages-of-databases-What-are-the-disadvantages
https://dev.to/trevoirwilliams/relational-sql-vs-non-relational-nosql-databases-hi5
https://www.smartsheet.com/relational-database-modeling
https://www.techopedia.com/definition/7231/flat-file-database-database
https://www.computerhope.com/jargon/d/delimite.htm
https://study.com/academy/lesson/flat-file-database-definition-example.html
https://en.wikipedia.org/wiki/Hierarchical_database_model
https://www.techopedia.com/definition/20971/network-database
https://www.techopedia.com/definition/19782/hierarchical-database
https://en.wikipedia.org/wiki/Network_model#Database_systems
https://www.liquidweb.com/blog/ten-ways-databases-run-your-life/
https://www.dataversity.net/brief-history-database-management/
https://en.wikipedia.org/wiki/Big_data
https://panoply.io/data-warehouse-guide/the-difference-between-a-database-and-a-data-warehouse/
https://www.investopedia.com/terms/d/datamining.asp
https://www.tutorialspoint.com/sql/sql-primary-key.htm
https://computer.howstuffworks.com/data-centers1.htm
http://www.sqlcourse.com/table.html
https://www.techwalla.com/articles/differences-between-fields-records-in-a-database
https://keydifferences.com/difference-between-rows-and-columns.html
https://sites.google.com/site/mrstevensonstechclassroom/home/strand-3-it-systems/3-7-databases/database-issues-integrity-privacy-security
https://www.datavail.com/blog/4-top-trends-db-management/
Written in February 2020
Create Your Own Website With JouwWeb