Sunday, 10 June 2018

Importance of Data structures

Data structures are essential to understand for any computer science/IT student  or computer science/IT Engineer. But for a novice, the real importance of data structure  still remains a mystery.

Data structure, as a discipline, is all about structuring data. There are different ways to keep data in memory. If you have 5 integers, you can save data in 5 different variables and you if you have to search some item in these, it is trivial job. But say if you have millions of integers, it is very difficult to save it in 1 million different integers and imagine how painful it is to search in 1 million variables. So we need some way to keep data in memory a structured manner.

There are multiple ways to store data. Each and every data structure has its own advantages and disadvantages.

Let us take an example say I have to create a phone  number book. These days, phone book can have millions of entries since everyone has a mobile phone. For this purpose, We can simply place data in continuous/contiguous memory location. Such data structure is called array but it has limitation that we have to know the number of entries in advance. But it is possible that I I don't know number of entries in advance. It can vary from million to billion. So in such a case either, I take array of size 1 billion but it can waste huge amount of memory if there are lesser entries. So we needed some another data structure where we could save variable number of entries or in other words which is capable of self growing. This necessity invented linked list. Linked list and array both has their own pros and cons.

So far, all looks good. Now let us add something more to this problem.

If I have to find one phone number in this phone book to check whether such phone number exists or not. How would I do that?

I will start scanning in my linked list/Array(whichever you chosen depending upon whether you know the size in prior or not) and if I find element in middle we can simply stop our search or if number does not exist I will have to scan whole list/Array. Scanning through all the elements in such a huge list gonna take time. Imagine, you searched for a number in your phone book and it keeps on searching for 5-10 mins, How frustrating it is going to be. So we need some another data structure where search operation is faster. It resulted in binary search trees(BST).

in the BST, at each node/element, you have two paths. One has all the elements that are greater than itself and another path has all the elements that are less than or equal to it( As shown in figure)

BST

If you have to search 5 in this BST, we can do it in 3 steps. 9 ->7->5
If I have to find out 1. That also I can do it 3 steps. 9->7->5 at 5 we realized there is no path means 1 must not be here.


Now one can argue that I can achieve this if I keep my data sorted in linked list or array. But sorting itself is costly operation. To sort a list/array (or to keep it sorted while adding/deleting), it will require order of n^2 time which means that if there are million entries, for sorting it will take order of 10^12 which is even worse than normal scanning.

It required just 3 operations in BST to search an element as compared to 7 in array or list. Roughly, It takes log2(n+1) steps in BST to search an element. But for large value of n, this cost can still be unacceptable so we needed such data structure where cost of search is constant irrespective of number of elements. Hence Hash was discovered. Although it does not always guarantee that search will always take constant time but on an average search operation has time complexity of O(1)

Similarly motivation behind invention of graph was to store Maps etc.
For different needs, world kept on discovering new data structures e.g. Queue, Stack, skipped linked list, rope, heap etc.


Saturday, 9 June 2018

priority queue and its applications

Priority queue is a data structure like queue. Before talking about priority queue, Let us just briefly talk about queue first.

Queue is a First In First Out (FIFO) or Last In Last Out (LILO) Data structure which means that you will retrieve the elements/Items in the same order in which they are inserted. Queue is motivated from our real life. We see queue in our real life at many places, at railway station, In temples, in Office/Hostel Mess etc etc... Person who enters the queue first is served first.

Priority queue is little different from queue. Instead of first come first serve policy, it serves the items based on their priority. Item with highest priority will be served first irrespective of when it was entered in the queue. Each item/element that enters the queue has a priority assigned to it. E.g.

1,2,3,10,4,6 is entered in the queue in the order from left to right and say higher the number, higher is the priority.

When item is retrieved from the queue, it will return 10 although 1 was entered first in the queue.

Now question comes why it was even discovered? How is it useful?

Priority queue is also inspired from our daily lives. Say you have multiple tasks to do at a time(cooking, cleaning, checking email, ordering vegetables and fruits). You make a list at a notepad and decide to do them in order. While working, you get a high priority task like your spouse call you to pick your child from school as she/he is busy in meeting. It is another task that you have to do but you will serve it fist since it is highest priority.

Similarly in computer science, there are many applications that needs such handling. Best example that is coming to my mind is real time operating systems.
RTOS Scheduler maintains a queue of pending task but it can not serve them in First come first serve order . It picks up task that has highest priority.

E.g. OS used in Airplanes. While operating system is doing some of its job. A job, of informing the pilot that fuel tank is going to be empty in sometime, comes. This is the job that requires highest attention. Hence scheduler will schedule this job first among all jobs.
Now we will talk briefly about its implementation.

To return the maximum element, we have two options.
  • Save the items in the sorted itself i.e. As soon as item is entered in the queue, place it at its right place.
    • It will increase time of adding element to O(N) from O(1) in the queue but time to retrieve element from queue will be constant O(1).
  • Find the element with highest priority from queue when item has to be retrieved from queue.
    • It will not affect time of adding element. It is going to be same O(1) but it will increase time of retrieving element from O(1) to O(N).
One can use any implementation that suits her/his application's need. But it is quite clear that ultimately both are gonna take N^2 time whether it is invested while adding or removing. 

So to improve the run time of your queue, heap is used which can reduce complexity to o(NlogN). 

Sunday, 3 June 2018

stable sort

Stable sort is a category of sorting algorithms that do not change the relative order of two or more equivalent elements. But question is how does it matter?

e.g. if I have few  elements in a list : 5,4,6,4,9,8,10

Let us call first 4 as 4/1 and second one as 4/2.

5/1, 4/1, 6, 4/2, 9, 8, 10

And after sorting result could be one of the following :

4/1,4/2,5,6,8,9,10

or

4/2,4/1,5,6,8,9,10

Is there any difference in two results? No there is no difference, To the user both 4 are same and equal.

Difference would arise if I have to sort user defined datatype say students.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
class student {
  int roll_num;
  char *name;
  char *address;

public :  

//! constructor
//! destructor
//! copy constructor

  student(const student &rhs) {
   roll_num = rhs.roll_num;
   name = new char[strlen(rhs.name)+1];
   strcpy(name,rhs.name);
   address = new char[strlen(rhs.address)+1];
   strcpy(address,rhs.address);

 }
};

int main() {
  student st[5] = { {1,"abcdef","address1"},
                    {2,"zyxw","adress2"},
                    {3,"wert","address2"},
                    {4,"abcdef","address3"},
                    {5,"qwerty","address4"}  
                  };

}


Here array st contains information of 5 students and if you notice it is automatically sorted by their roll number but now say you want to sort them with their name. What should be the order after sorting?

st[0], st[3], st[4], st[2], st[1] 

OR

st[3], st[0], st[4], st[2], st[1] 

Unlike case of integers, both are not equal. st[3] and st[0] are equivalent in their name only. So a sorting algorithm which is stable will place st[0] before st[3]. It will not change their relative order.

We have different sorting algorithms like merge sort, quick sort, insertion sort, bubble sort etc etc. Which one of them is stable and which is not? Generally people use merge and quick sort because of their complexity of order (nlogn).

Merge sort is stable while quick sort is not (reason/explanation behind this is out of scope of this post).
So this can be one factor in choosing sorting algorithm. If your application requires stable sorting, you can't simply rely on quicksort.


Could you think of an application that use stable sort?

 There are many. One of them that came to my mind is that when we try to sort different items in Microsoft windows Explorer based on different parameters.

FIGURE1 : Items sorted on name

FIGURE2 : Items sorted on date modified 

In the Figure 1, you can see that items are sorted on their name field.
In the Figure 2, items are sorted based on their last modified date.

In Figure 2, DRIVERS and vpn folders have same time stamp but DRIVERS is kept ahead of vpn so that their relative order is not changed


Did you know that Java internally uses quicksort for sorting built-in data types like int, double etc as we saw that for built-in types relative order does not matter but for user-defined data types, it uses merge sort so that relative order is not changed.

what is array

Array is one of the simplest and most  sophisticated data structures. In array data is stored at contiguous memory locations. E.g. say you h...