Wednesday, July 9, 2014

Trie. Representing dictionary with the help of a tree. Implementation on C++

Suppose we have a long sequence of different words (dictionary)  and we have a word which we need "to look for" in our dictionary. There are different ways to do that, each of them will take different amount of time. Suppose this is our dictionary

apple
android
angle
binary
bit
brave
ceremony
current
condemn
drive
drag
drop

and we have a word which we want to find "bin".
What do we usually do?
well, the first letter is b so we will look up for the words which start only with the letter 'b' and ignore the rest. Then we will do the same for the second letter, we will consider only the words which start with 'b' and have the second letter 'i' and so on until we find what we were looking for,. this is how we do in life,
now about coding,
the easiest approach would be to just loop over all the words and compare them to our word which takes a lot of time, or we can simplify a little but, if the word starts with the letter 'b' then compare, or even simpler if the second letter is 'i' , or even better, if the third letter is 'n' , but how many if conditions can we write by hand?
This is where tries come to help us. Trie is basically a tree where we can collect word by moving from one node to another.























Above you can see an example of a trie. Basically we collect words with starting from the top node and moving to other nodes so the top node is an empty node representing "nothing" and then all his neighbors below him. Let's numerate the levels, considering the level with and empty node as level 0. So on level 1 we can see all the letters with which we have words starting in other words our dictionary has only words which start only with the letters 'a' 'b' and 'c'. Now let's have a look at the second level, just like the first level we can say that the second letter of all the words is one of the letters 'n','a','e','i', 'u', 'a', but there is more to that,  let's have a look at first level letter 'b', he has neighbors  'a' 'e' 'i' 'u' which means that if the first letter is letter 'b' than the second letter is going to be one of his neighbors , and so on. This is how we check for the word, we go for the letter which currently interests us , first letter at the beginning, then we check for his beighbors, if one of them equals to the second letter of our words then we move on to the node with that letter and do the same for next letters until we reach the end of our word and confirm that it exists or deny it if we won't reach the end.

IMPLEMENTING 

    We will require to use much memory for implementing this one effectively. There are only 26 letters so let us keep a long list  we also need a variable which will represent the next free place , lets call it nfp. I will explain the importance of it later. So the first line which has 26 characters represents our first level. The index of every element represents the letter so the first index is letter 'a' and so on. So the element on the first row of the first column tells us where to go after coming to that cell, basically the new row which we will be into will show us all the letters which are followed after 'a' and if they have a different value that would mean that we are going to continue  Initially our tree is empty. When adding a word we should start from the first letter. We check for the first row and see, if under the spot of the first letter there is a number which means that we have installed another word which is starting with the same letter as our current word, so we move to the row which that "first letter cell" tells us and continue looking there, now with the second letter. If we came to a spot where there is an empty cell and we have to fill in our letter we use the variable nfp (next free place) to fill it in and point to that new free place from our current letter.


Tasks to solve

   There is a basic task in spoj on tries, called "Phone List". And you can also find the source code here. 

No comments:

Post a Comment