Introduction to Database | Part 0

Database is an organized and structured collection of data.

Let’s forget about computers for a moment and take pen and paper. Say you wanted to store the name of your friends, their phone number and their birthdays so that you could wish them on their birthday.

What you could do is, write the name of your friend at first, followed by their phone number and their birthdays something like this and repeat this pattern:

bijay 0987124 19/06/1990 sudeep 0928191 04/05/1999 ajay 0987127 11/03/1997 ...

This data has an inherit structure to it. For any data at position n, such that n % 3 is equal to 1 then we can say that that data represents a name of your friend. Similarly:

if item at n % 3 == 1 : name
if item at n % 3 == 2 : number
if item at n % 3 == 0 : birthday

But say, one of your friend does not have a phone number

bijay 19/06/1990 sudeep 0928191 04/05/1999 ajay 0987127 11/03/1997 ...

Now, this data is not structured and no longer follows the rules defined above. This is an example of an unstructured data. You cannot call this paper of yours a database. [Think what you can do to solve this edge case]

If you wish you can write each of your friends information in new line for readability purpose:

bijay 0987124 19/06/1990
sudeep 0928191 04/05/1999
ajay 0987127 11/03/1997

Creating database in a computer

Now that we understand what a database is let’s come to computers and see how we can create a database in a computer. There are several options to create a database in your computer.

To replicate the above database, one simple thing we can do is create a text file and write the same information in the same way that we have done before. Now you can call your text file a database.

You can also use tools like Microsoft Excel to create your database, or store your data in a structured manner. But throughout this series of blogs we are going to understand one of the widely used database called Relational Database. Relational Database are based on relational theory or set theory.

In the above example we used the modulo operator(simple arithmatic) and the position of the data to understand what our data represents, in relational database we are going to use the relational theory to organize and understand what our data represents.

Database vs File System - Taken from Stackoverflow

Note: People often use the term database and database management systems interchangeably.

Database ultimately stores the data in files, whereas file system also stores the data in files. In this case what is the difference between database and file system. Is it in the way it is retrieved or anything else?

Historically, databases were created when the API provided by the filesystem OS were not good enough for the problem at hand. Just think about it: if you had special requirements, you couldn’t just call Microsoft or Apple to redesign their filesystem API. You would either go ahead and write your own storage software or you would look around for existing alternatives. So the need created a market for 3rd party data storage software which ended up being called databases. That’s about it.

The filesystem API is part of the OS, and have to implement certain API inorder to follow certain rules, whereas databases are built by 3rd parties and have complete freedom.

Fun Fact

Btw you can also use your file system’s inherit structure to represent data in structured fashion too:

  1. Create a folder called database.
  2. Inside this folder create a folder and rename it to match the name of your friend.
  3. Inside this folder create a folder to match the name of your friends phone number.
  4. Inside this folder create a file and name it to match your friends birthday.

Do the same for all the records you want to save. Something like this:

database/bijay/0987124/19061990.txt
database/sudeep/0928191/04051999.txt
database/ajay/0987127/11031997.txt

Now you can use this folder structure as a database. The first folder inside database folder is always going to give you the name of your friend. If you click the folder, it is going to give you their phone number and finally if you click on the phone number’s folder it is going to give you their date of birth. Also, if the last file is a text file, that means there is no more data left about the person and you could go to the root directory(database) if you want to get the next entry. You can then write a program to write and read this data.

Implementation of a simple database management system

Here we write a python program to create and manage our database discussed above, to store the name, birthday and phone number of your friends. Here we are assuming each data is seperated into different line.

How can we make it work for those which are not seperated by a newline? Hint: Convert the entire line into a list. Increment the index by 3. You can use n%3 trick to find what the current data represents.


class FileDB:
    def __init__(self):
        pass

    def serialize(self,data:dict) -> str:
        _str = ''
        _str += data['name'] + ' '
        _str += data['phone'] + ' '
        _str += data['dob']
        return _str

    def add(self,data:dict):
        try:
            f = open("file.txt",'a', encoding = 'utf-8')
            _data:str = self.serialize(data)
            f.write(_data)
            f.write('\n')
        finally:
            f.close()

    def deserialize(self, data:list) -> dict:
        try:
            _data:dict = {}
            _data['name'] = data[0]
            _data['phone'] = data[1]
            _data['dob'] = data[2]
        except:
            pass
        
        return _data
    """
        This function takes an unique name and returns their details
        specified in the option's list
    """

    def get(self,name:str,option:list)->dict:
        result = {}
        try:
            f = open("file.txt",'r', encoding = 'utf-8')
            for line in f:
                line = line.strip()
                data = line.split(' ')

                if len(data) != 3:
                    raise Exception("Sorry, the data is not structured") 
                if name in data:
                    _data:dict = self.deserialize(data)
                    for key in _data.keys():
                        if key in option:
                            result[key] = _data[key]
        finally:
            f.close()
        
        if result == {}:
            return None

        return result

# The name of keys should always be name,phone,dob
data_1 = {
    'name': 'bijay',
    'phone': '984300000',
    'dob': '14/07/1997'
}
data_2 = {
    'name': 'ajay',
    'phone': '9843111111',
    'dob': '23/08/1996'
}
data_3 = {
    'name': 'hari',
    'phone': '9841999999',
    'dob': '18/09/1996'
}

fileDB = FileDB()
fileDB.add(data_1)
fileDB.add(data_2)
fileDB.add(data_3)

response:dict= fileDB.get('hari',['phone','dob'])

if response == None:
    print("No record found")
else:
    print(response)

Output

Remarks

There is a lot of improvements we can make on this sample database management program of ours. But since this is not the focus of this blog/project, i’ll not be implementing more features to it. If you want, here are some ideas:

  1. Make it generic.(read column name from the file, allow different seperators like comma tab)
  2. Add delete and update features
  3. Improve our API
· linux