Thursday, October 31, 2013

LCS(longest common subsequence)

As we all know dynamic programming works like this.
1. Divide the big problem into smaller subproblems.
2. Solve the current subproblem with the use of previously found solutions.

We have 2 arrays of char char s1[N],s2[N]; and we want to find the longest common subsequence of those strings.
For finding the longest common subsequence we will keep an 2 dimensional array (matrix) int a[N][N]
and a[i][j] will show us the length of the longest common subsequence of  sequences
s1[1],s1[2],....,s1[i]
                         and
s2[1],s2[2],....,s2[j]

This is the pseudocode of our algorithm

for i =from 1 to n    ////n is the length of the first sequence
for j =from 1 to m   ////m is the length of the second sequence
if(s1[i]==s2[j])
       a[i][j]=a[i-1][j-1]+1
else
       a[i][j]max(a[i-1][j],a[i][j-1]);
Example with explanation


S
R
J
M
T
0
0
0
0
S
1
1
1
1
R
1
2
2
2
M
1
2
2
3

Suppose we have 2 strings s1="SRJM" and s2="TSRM"
a[1][1] shows the longest common subsequence for 2 strings "T" and "S", the string letter "T" is not equal to the letter "S" so we mark a[1][1] as 0. The same goes for the letters "R" "J" and "M"

lets get down to a[2][1]  . a[2][1] shows us the longest common subsequence of strings "TS" and "S"
as we can see we marked it as 1. The pseudocode says if s[i]==s[j] then a[i][j]=a[i-1][j-1]+1; =>
=>a[2][1]=a[1][0]+1 (All the elements are marked as 0 from the beginning)
Now we are at the cell a[2][2] which means that we need to look at 2 strings "TS" and "SR" . The letter "S" is not equal to letter "R" which means that we have to pick up the value of the longest common subsequence of either "T" and "SR" or "TS" and "S" . The second option has bigger value that is why we pick the second option which is a[i][j-1];

The full source code is also available for free download

DOWNLOAD THE SOURCE CODE 

IF YOU HAVE ANY QUESTIONS LEAVE THEM IN COMMENTS , NO QUESTION WILL STAY UNANSWERED



3 comments:

  1. Can you please explain logically why a[i][j] = a[i-1][j-1]+1? I mean, why is it that if we run this particula piece of code it gives the right answer?
    Also, your source code link seems to have expired. A 404 error is displayed. Kindly look into that. :)

    ReplyDelete
    Replies
    1. HI, If you have 2 strings for which you calculated the answer, if we add the same letter to the both of them from the back , for sure that new letter will exist in the LCS that is why if they are equal then the answer is 1 more than a[i-1][j-1] (s1[i] and s2[j] are the last letters, ans s1[1...i-1] and s2[1...j-1] are the strings without the last letter.
      The link is still valid, maybe you are using some ad-blocking software which prevent you from opening that link.

      Delete
  2. Hi ,though this not a proper question to ask, I am unable to download the source code for any of your programs. The link for any of the codes isn't working on my PC. Thanks for your great work!!!!

    ReplyDelete