Tuesday, September 16, 2014

Edit Distance ( Levenshtein distance)

  The task is to find the minimum number of operations necessary to turn one string into another. The available operations are 1. insert a char everywhere in a string 2. Delete any char of the string 3. Replace any char in a string.  This is another Dynamic Programming problem. Suppose the length of the first string (S1) is N and the length of the second one (S2) is M. We need an array of NxM where the element A[I[J] shows the answer for the segments S1[1..I] and S2[1..J] (let us add a blank char in front of both string). For a better understanding let's have a look at a table below.



















As you can see the string starts with an empty character, which means that the first row and the first columns are filled by formula a[1,I]=I and a[J][1]=J which is true because for obtaining an empty string we need to delete all the elements of the other string which will give an answer of the size of the non empty string. Basically when looking at the cell A[I][J] we need to look to the cells A[I-1][J] , A[I][J-1] and A[I-1][J-1]. Let's consider the cell A[I-1][J] of we were able to build the string S2[1..J] from S1[1...I-1] for A[I-1][J] steps then fro building the string S[1..J] from S[1..I] we need to add one more step (just add the new character (new means the Ith) and it will be A[I-1][J]+1 moves). The same goes for A[I][J-1]. Now let's consider the A[I-1][J-1] the, int this case we have 2 new elements S1[I] and S2[J], we can build it with the same number of steps if we won't have to deal with the new elements which means that if S1[I[==s2[J] then we only need A[i-1][j-1] operations to make the string S2[1..J[ from S1[1..I] otherwise A[I-1][J-1]+1 operations. So the final formula is A[i][j]=min( A[I-1][J], A[I][J-1], A[I-1][J-1]  + 1 (+1 if S1[I]not equal to S2[J]). 

No comments:

Post a Comment