Data structures are essential for any computer science or IT student or engineer to understand. But for a novice, their real importance often remains a mystery.
Data structures, as a discipline, are all about organizing data. There are different ways to keep data in memory. If you have 5 integers, you can store them in 5 separate variables, and searching for an item among them is trivial. But if you have millions of integers, it is impractical to declare a million separate variables, and imagine how painful it would be to search through them. So we need some way to keep data in memory in a structured manner.
There are multiple ways to store data, and every data structure has its own advantages and disadvantages.
Let us take an example: say I have to create a phone book. These days a phone book can have millions of entries, since everyone has a mobile phone. For this purpose we can simply place the data in contiguous memory locations. Such a data structure is called an array, but it has a limitation: we have to know the number of entries in advance. It is quite possible that I don't know the number of entries in advance; it could vary from a million to a billion. In that case I could allocate an array of size one billion, but that wastes a huge amount of memory whenever there are fewer entries. So we needed another data structure that can hold a variable number of entries, in other words one that is capable of growing on its own. This necessity led to the linked list. Linked lists and arrays both have their own pros and cons.
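The self-growing idea can be sketched in a few lines. Here is a minimal singly linked list in Python (the class and variable names are my own, chosen just for illustration): each entry points to the next one, so the structure grows one node at a time with no size declared up front.

```python
class Node:
    """One entry in a singly linked list."""
    def __init__(self, value):
        self.value = value
        self.next = None  # link to the next entry, or None at the end

class LinkedList:
    """Grows one node at a time -- no need to know the size in advance."""
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        node = Node(value)
        if self.head is None:          # first entry
            self.head = self.tail = node
        else:                          # link the new node at the end
            self.tail.next = node
            self.tail = node

phone_book = LinkedList()
for number in ["555-0100", "555-0101", "555-0102"]:
    phone_book.append(number)
```

Unlike an array, nothing here required choosing a capacity in advance; the trade-off is that the nodes are scattered in memory and can only be reached by following the links.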
So far, all looks good. Now let us add something more to this problem.
Suppose I have to look up one phone number in this phone book to check whether it exists or not. How would I do that?
I would start scanning my linked list or array (whichever was chosen, depending on whether the size was known in advance). If I find the element somewhere in the middle, I can stop the search; if the number does not exist, I have to scan the whole list or array. Scanning through all the elements of such a huge collection takes time. Imagine searching for a number in your phone book and waiting 5-10 minutes for the result; how frustrating would that be? So we need another data structure where the search operation is faster. This led to binary search trees (BSTs).
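The scan just described is a plain linear search. A quick sketch (function and variable names are made up for illustration):

```python
def linear_search(entries, target):
    """Scan entries one by one: stop early on a match,
    otherwise every element is examined (O(n) worst case)."""
    for i, entry in enumerate(entries):
        if entry == target:
            return i      # found mid-way: stop the search here
    return -1             # not found: we scanned the whole collection

numbers = ["555-0100", "555-0101", "555-0102", "555-0103"]
```

With a million entries, a failed lookup touches all one million of them, which is exactly the slowness we want to escape.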
In a BST, each node has two paths: one side holds all the elements greater than the node, and the other side holds all the elements less than or equal to it (as shown in the figure).
If you have to search for 5 in this BST, you can do it in 3 steps: 9 -> 7 -> 5.
If I have to look for 1, I can also decide that in 3 steps: 9 -> 7 -> 5. At 5 there is no further path, which means 1 cannot be in the tree.
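Both lookups can be traced in code. The sketch below builds a hypothetical seven-element tree (values 9, 7, 12, 5, 8, 11, 14, chosen so they reproduce the 9 -> 7 -> 5 path above; the figure's exact contents are not given in the text) with smaller-or-equal values on the left, as the rule above states:

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None    # subtree of values <= self.value
        self.right = None   # subtree of values > self.value

def insert(node, value):
    """Insert value, keeping the <=/> ordering rule."""
    if node is None:
        return BSTNode(value)
    if value <= node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node

def search(node, target):
    """Return (path of visited values, found?).
    Running off the end of a path means the value is absent."""
    path = []
    while node is not None:
        path.append(node.value)
        if target == node.value:
            return path, True
        node = node.left if target <= node.value else node.right
    return path, False

root = None
for v in [9, 7, 12, 5, 8, 11, 14]:   # hypothetical example tree
    root = insert(root, v)
```

Searching for 5 visits 9, 7, 5 and succeeds; searching for 1 visits the same three nodes, finds no left child under 5, and correctly concludes the value is absent.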
Now one could argue that the same can be achieved by keeping the data sorted in a linked list or array. But sorting is itself a costly operation: a one-time sort takes on the order of n log n time, and keeping an array sorted while adding and deleting entries costs on the order of n per operation (existing elements must be shifted), so inserting a million entries one by one costs on the order of 10^12 steps, which is even worse than a plain scan. And a sorted linked list does not help at all, since it lacks the random access that binary search needs.
It took just 3 operations in the BST to search for an element, compared to up to 7 in the array or list (one per element). Roughly, a balanced BST takes about log2(n+1) steps to search for an element. But for large n even this cost can be unacceptable, so we needed a data structure where the cost of a search is constant irrespective of the number of elements. Hence the hash table was invented. It does not guarantee that every search takes constant time, but on average the search operation has time complexity O(1).
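Python's built-in `set` (and `dict`) are hash tables, so the constant-time lookup is available directly; a tiny sketch, with made-up sample numbers:

```python
# A set is backed by a hash table: a membership test hashes the key
# and jumps (almost) directly to its bucket -- O(1) on average,
# no matter how many entries the set holds.
phone_numbers = {"555-0100", "555-0101", "555-0102"}

def exists(number):
    """Average-case constant-time lookup, independent of set size."""
    return number in phone_numbers
```

The "on average" hedge matters: if many keys hash to the same bucket, a single lookup can degrade toward O(n), but a good hash function makes that rare.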
Similarly, the motivation behind the invention of graphs was to store things like maps.
For different needs, the world kept on inventing new data structures: queues, stacks, skip lists, ropes, heaps, and so on.