Sunday, February 16, 2014

Segment Tree. Finding the sum of a given range in O (logN). Implementation of segment tree

 We have the following task. We are given an array of integers. There are 2 types of operations we need to perform. 1st update x,y. Change the xth element to y. 2. query x-y  find the sum of elements in range [x;y].
Of course the first way which pops up in mind is the simple brute force solution. That can work only if the constraints are low but here we are going to discuss a solution which will work for big constraints (in other words every query will work in O(logN) time not O (N)).
    We will do that with the help of segment trees. Segment tree ( sometimes called interval tree) is a very special way of data representation. Segment tree is a tree which is fully binary.I think It would be better If we discuss it on an example.
Suppose we have this simple example
N=5
1 2 3 4 5
We have 5 elements. Ok . Now have a look at the segment tree of our data set and we will continue discussing.






















I am sorry for the bad picture, I am not good at this sort of stuff but anyway I hope you can understand it.
Here are a few things to notice for the beginning.
1. Every node in this tree shows the required information (in our case it is the sum) of a specific range. The numbers of nodes are written in green, the interval of every node is written in dark brown. For example the node 4 represents the sum of elements between 1 and 2.
2. The leaves are initial elements of the array.
3.  For having a fully binary tree we need to add up some elements which won't affect our answer to the closest power of 2. In our case we have 5 elements and we need to add 3 more elements so that we can have a fully binary tree. I marked the extra added elements with red X. So in our case we need to add 3 0s because when we sum 0 with other numbers the numbers are not changing.
4. If the current node which has the index of I shows the sum of elements in interval X and Y then
    his 2 children will have the index 2*I and 2*I+1 and , the 2*I the node will show the sum of elements between X and (X+Y)/2 and the 2*I+1 the node will show the sum of element between (X+Y)/2+1 and Y. So, basically we are cutting
  Implementation
We will need to make 3 functions.
1. Building the tree.
2. Updating the tree
3. Querying the tree

1. Building tree
let M be the closest element to N which is not smaller than N and it is a power of 2.
Our tree will have 2*M-1 elements. So our initial elements will be from M to 2*M-1 which will be our given elements from 1 to N and also the extra elements we will add to form a number which is a power of 2. For that we will keep a linear array name Tree. Tree I will be one node of a tree which will have 2 children with indexes 2*I and 2*I+1. So tree[I]=tree[2*I]+tree[2*I+1];
  for(i=m;i<=2*m-1;i++)
tree[i]=a[i-m+1]; (Note that the indexing starts with 1 in array A and in array Tree).
For the rest we need to loop back and every element Tree[i]=Tree[2*i]+Tree[2*i+1];
               for(i=m-1;i>=1;i--)
tree[i]=tree[2*i]+tree[2*i+1];
2. Update tree
   Updating is easy. We are getting an index of an array as an updating parameter. We need to find which node represents the given element of an array and update the tree by cutting the index of that node to half until we reach 2. (Note that the children of node I are 2*I and 2*I+1 , both of this number will give the result I when divided on 2).  This can be done recursively.
void update_tree(int x,int y)
{
tree[x]=tree[x]+y;
if(x==1)
return;
update_tree(x/2,y);
}
Also NOTE that when we receive a parameter I which tells us to update the value in A[I] our function must receive a index of a node not an index of array in our case we will give our function the index m+I-1.
3. Querying
  So, this one is a little bit complicated so I will try to be as clear as possible. Our function is going to have 5 parameters.
int query_tree(int node, int x, int y, int qx, int qy)
1. node-  node shows the current node we are observing
2. x,y - those 2 show the current interval which the node is representing
3. qx, qy,  - those 2 represent the interval which sis being queried.
SO, the node which has an interval of x to y has 2 children . The first child has an index of 2*node and the second child has an index of 2*node+1. The first child shows the answer for an interval X and (X+Y)/2 and the second child shows the answer for interval (X+Y)/2+1 and Y. our function is recursive. Now we need to check for 3 things.
If the queried interval and the current interval of current node are the same we need to return the answer on the current node.
If the previous condition is not true we need to check for 2 more things. If the queried interval belongs to the interval of node*2 (in other words qx and qy are between X and (X+Y)/2) then we need to call this function again but with new parameters ( query_tree( node*2 , X, (X+Y)/2, qx , qy)) .
If the queried interval belongs to the second child of the current node (in otehr words if the qx and qy are between (X+Y)/2+1 and Y then we need to call this function again similar to the previous call. ( query_tree( node*2+1, (X+Y)/2+1, Y, qx, qy) .  
If the 2 conditions below are not met that means that the queried interval has 2 parts where the first part belongs to the first child and the second part belongs to the second child. This calls for 2 recursive functions
query_tree(2*node,x,mid,qx,mid)+query_tree(node*2+1,mid+1,y,mid+1,qy); (mid is (X+Y)/2) , as we can see we divided our qx to qy into 2 different queries , teh first one is from qx to mid and the second one is from mid+1 to qy.
 For having a better understanding check this task from spoj.com and the source code for this task can be found in this site in the section "SPOJ tutorials". 

No comments:

Post a Comment