COMP3010: Algorithm Theory and Design

Daniel Sutantyo, Department of Computing, Macquarie University

7.0 - Longest Common Subsequence

Prelude

  • In the first half of the semester, we concentrated on three topics:
    • brute force \(\rightarrow\) dynamic programming \(\rightarrow\) greedy algorithm
    • plus complexity and correctness
  • Longest common subsequence
    • we have done this topic earlier when discussing overlapping subproblems in Week 4
    • plus, you have also covered this topic in COMP225/2010
    • so at this point, I really expect you to already understand LCSS

Definition

  • Given two sequences
            \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y= \{y_1,y_2,\dots,y_n\}\),
    find the longest common subsequence of \(X\) and \(Y\)
    • e.g. \(X = \{ k,i,t,t,e,n \}\)
             \(Y = \{ s,i,t,t,i,n,g \} \)
             the sequence \(\{ i,t,t,n \}\) is a solution
  • What is the brute-force solution?

Brute Force

  • Brute force solution:
    • Let \(X\) be a sequence of length \(m\) and \(Y\) be a sequence of length \(n\)
    • Generate all subsequences of \(X \rightarrow 2^m\)
    • Generate all subsequences of \(Y \rightarrow 2^n\)
    • For each subsequence of \(X\), compare it with every subsequence of \(Y\)
      • the number of comparisons is \(2^m \cdot 2^n = 2^{m+n}\)
    • Hence the complexity is \(O(2^{m+n})\) comparisons
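
The brute-force steps above can be sketched as follows. This is a minimal illustration rather than the lecture's own code: the class name `BruteForceLcss` and the helper `pick` are hypothetical, and the bitmask enumeration assumes both strings are shorter than 31 characters.

```java
class BruteForceLcss {
    // Build the subsequence of s selected by the set bits of mask
    // (bit i set means character i is kept).
    static String pick(String s, int mask) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
            if ((mask & (1 << i)) != 0) sb.append(s.charAt(i));
        return sb.toString();
    }

    // Compare every subsequence of x with every subsequence of y:
    // 2^m * 2^n = 2^(m+n) comparisons in total.
    static int lcssBruteForce(String x, String y) {
        int best = 0;
        for (int a = 0; a < (1 << x.length()); a++) {
            String sx = pick(x, a);
            for (int b = 0; b < (1 << y.length()); b++)
                if (sx.equals(pick(y, b))) best = Math.max(best, sx.length());
        }
        return best;
    }
}
```

For `kitten` and `sitting` this already means 64 × 128 = 8192 comparisons, which is why the approach does not scale.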

How to Approach It

  • Example:
    • \(X = \{ k,i,t,t,e,n \}\)
    • \(Y = \{ s,i,t,t,i,n,g \} \)
  • Can you improve the brute force approach?
    • do you need to compare all these?
      • \(\{k,i,t,t\}\) with \(\{s,i,t,t\}\)
      • \(\{k,i,t,t\}\) with \(\{s,i,t,t,i\}\)
      • \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n\}\)
      • \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n,g\}\)

How to Approach It

  • Example:
    • \(X = \{ k,i,t,t,e,n \}\)
    • \(Y = \{ s,i,t,t,i,n,g \} \)
  • Let's get some intuition:
    • \(\{k,i,t,t\}\) with \(\{s,i,t,t\}\)
    • \(\{k,i,t,t\}\) with \(\{s,i,t,t,i\}\)
    • \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n\}\)
    • \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n,g\}\)
  • Why do we keep comparing \(k\)? Can we drop it?

How to Approach It

  • Example:
    • \(X = \{ k,i,t,t,e,n \}\)
    • \(Y = \{ s,i,t,t,i,n,g \} \)
  • Let's get some intuition:
    • \(\{i,t,t\}\) with \(\{s,i,t,t\}\)
    • \(\{i,t,t\}\) with \(\{s,i,t,t,i\}\)
    • \(\{i,t,t\}\) with \(\{s,i,t,t,i,n\}\)
    • \(\{i,t,t\}\) with \(\{s,i,t,t,i,n,g\}\)
  • Why do we keep comparing \(s\)? Can we drop it?

How to Approach It

  • Example:
    • \(X = \{ k,i,t,t,e,n \}\)
    • \(Y = \{ s,i,t,t,i,n,g \} \)
  • Let's get some intuition:
    • \(\{i,t,t\}\) with \(\{i,t,t\}\)
    • \(\{i,t,t\}\) with \(\{i,t,t,i\}\)
    • \(\{i,t,t\}\) with \(\{i,t,t,i,n\}\)
    • \(\{i,t,t\}\) with \(\{i,t,t,i,n,g\}\)
  • Why do we keep comparing \(i\)? Can we drop it?

How to Approach It

  • If the first characters of both strings match, then we should take the first character off both strings (i.e. don't compare them again)
  • If they do not match, then
    • should we keep both?
      • then we'll never progress
    • should we take both first characters off?
      • no, why?
    • should we take the first character off just one of the strings?
      • yes, because clearly it's not helping, but we don't know which string to take it from, so we have to try both

How to Approach It

  • (kitten, sitting)
  • (itten, sitting), (kitten, itting)
  • (itten, itting)
  • (tten, tting)
  • (ten, ting)
  • (en, ing)

Recursive Relation

  • Start with \(x_1\ x_2\ x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\)
    • if \(x_1 = y_1\), continue with \(x_2\ x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
    • if \(x_1 \ne y_1\), continue with both
      • \(x_2\ x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\)
      • \(x_1\ x_2\ x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)

\[ \text{LCSS}(X,Y) = \begin{cases} 0 & \text{if \(X\) or \(Y\) is empty}\\ 1 + \text{LCSS}(X_2,Y_2) & \text{if } x_1 = y_1\\ \max\left(\text{LCSS}(X_2,Y),\text{LCSS}(X,Y_2)\right) & \text{if } x_1 \ne y_1 \end{cases} \]

where \(X_2 = \{x_2,x_3,\dots,x_m\}\) and \(Y_2 = \{y_2,y_3,\dots,y_n\}\)
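
The recurrence can be transcribed almost line by line into a plain recursive method. This is a sketch only (the class name `NaiveLcss` is hypothetical); it has no memoisation, so it can take exponential time in the worst case.

```java
class NaiveLcss {
    static int lcss(String x, String y) {
        // base case: X or Y is empty
        if (x.isEmpty() || y.isEmpty()) return 0;
        // x_1 = y_1: count the match and drop the first character of both
        if (x.charAt(0) == y.charAt(0))
            return 1 + lcss(x.substring(1), y.substring(1));
        // x_1 != y_1: drop the first character of X or of Y, keep the better result
        return Math.max(lcss(x.substring(1), y), lcss(x, y.substring(1)));
    }
}
```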

Overlapping Subproblems

  • \(x_1\ x_2\ x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\)
    • \(x_2\ x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\)
      • \(x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\)
        • \(\dots\)
        • \(x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
      • \(x_2\ x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
        • \(x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
        • \(\dots\)
    • \(x_1\ x_2\ x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
      • \(x_2\ x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
        • \(x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
        • \(\dots\)
      • \(x_1\ x_2\ x_3 \dots x_m\) and \(y_3 \dots y_n\)
        • \(\dots\)

Overlapping Subproblems

Level by level, the distinct subproblems are:

  • \(x_1\ x_2\ x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\)
  • \(x_2\ x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\); \(x_1\ x_2\ x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\)
  • \(x_3 \dots x_m\) and \(y_1\ y_2\ y_3 \dots y_n\); \(x_2\ x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\); \(x_1\ x_2\ x_3 \dots x_m\) and \(y_3 \dots y_n\)
  • \(\dots\); \(x_3 \dots x_m\) and \(y_2\ y_3 \dots y_n\); \(\dots\)

Overlapping Subproblems

  • (kitten, sitting)
  • (itten, sitting), (kitten, itting)
  • (tten, sitting), (itten, itting), (kitten, tting)
  • …, (tten, itting), (itten, tting), …
  • (tten, tting)
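
One way to see the overlap concretely is to count how often the plain recursion is called and compare that with the number of distinct (suffix of X, suffix of Y) pairs it visits. The sketch below is an illustration under assumptions: the class `OverlapCounter` and its static counters are hypothetical names, not part of the lecture code.

```java
class OverlapCounter {
    static long calls = 0;                       // total recursive calls made
    static java.util.Set<String> distinct = new java.util.HashSet<>(); // distinct subproblems seen

    static int naive(String x, String y) {
        calls++;
        distinct.add(x + "|" + y);               // '|' separates the two suffixes
        if (x.isEmpty() || y.isEmpty()) return 0;
        if (x.charAt(0) == y.charAt(0))
            return 1 + naive(x.substring(1), y.substring(1));
        return Math.max(naive(x.substring(1), y), naive(x, y.substring(1)));
    }
}
```

After calling `OverlapCounter.naive("kitten", "sitting")`, `calls` is larger than `distinct.size()`, because pairs such as (itten, itting) are reached along more than one path. There can never be more than (m+1)(n+1) distinct pairs.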

Optimal Substructure

  • Example:
    • \(X = \{ k,i,t,t,e,n \}\)
    • \(Y = \{ s,i,t,t,i,n,g \} \)
  • Does it have overlapping subproblems?
  • Does the longest common subsequence problem have an optimal substructure?

Optimal Substructure

  • \(X = \{ x_1, x_2, x_3, \dots, x_m \} \)
  • \(Y = \{ y_1, y_2, y_3, \dots, y_n \} \)
  • Let \(Z = \{ z_1, z_2, z_3, \dots, z_k \} \) be the LCSS of \(X\) and \(Y\)

Case A:

  • If \(x_1 \ne y_1\), then \(Z\) is either
    • the LCSS of \(X_2 =\{x_2,x_3,\dots,x_m\}\) and \(Y=\{y_1,y_2,\dots,y_n\}\), or
    • the LCSS of \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y_2 =\{y_2,y_3,\dots,y_n\}\)

Case B:

  • If \(x_1 = y_1\), then \(Z\) should contain \(x_1 = y_1\), i.e. \(z_1 = x_1 = y_1\), and \(Z_2 = \{z_2,z_3,\dots,z_k\}\) is the LCSS of \(X_2\) and \(Y_2\)

Optimal Substructure

Case A:

  • If \(x_1 \ne y_1\), then \(Z\) is either
    • the LCSS of \(X_2 =\{x_2,x_3,\dots,x_m\}\) and \(Y=\{y_1,y_2,\dots,y_n\}\), or
    • the LCSS of \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y_2 =\{y_2,y_3,\dots,y_n\}\)
  • e.g. (kitten, sitting) leads to (itten, sitting) and (kitten, itting)

Optimal Substructure

Case A:

  • If \(x_1 \ne y_1\), then \(Z\) is either
    • the LCSS of \(X_2 =\{x_2,x_3,\dots,x_m\}\) and \(Y=\{y_1,y_2,\dots,y_n\}\), or
    • the LCSS of \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y_2 =\{y_2,y_3,\dots,y_n\}\)

Proof (by contradiction):

  • Let \(Z\) be the LCSS of \(X\) and \(Y\)
    • since \(x_1 \ne y_1\), \(Z\) cannot use both \(x_1\) and \(y_1\), so \(Z\) is a common subsequence of \(X_2\) and \(Y\), or of \(X\) and \(Y_2\); suppose it is the former
    • if \(Z\) is NOT the LCSS of \(X_2\) and \(Y\), that means \(X_2\) and \(Y\) have a longer common subsequence than \(Z\), say \(Z^*\)
  • But \(Z^*\) is also a common subsequence of \(X\) and \(Y\), and it is longer than \(Z\), a contradiction!
  • The proof is symmetrical for the case \(X\) and \(Y_2\)

Optimal Substructure

Case B:

  • If \(x_1 = y_1\), then \(Z\) should contain \(x_1 = y_1\), i.e. \(z_1 = x_1 = y_1\), and \(Z_2 = \{z_2,z_3,\dots,z_k\}\) is the LCSS of \(X_2\) and \(Y_2\)
  • e.g. (itten, itting) could lead to (tten, itting), (itten, tting) or (tten, tting); the first characters match, so we continue with (tten, tting)

Optimal Substructure

Case B:

  • If \(x_1 = y_1\), then \(Z\) should contain \(x_1 = y_1\), i.e. \(z_1 = x_1 = y_1\), and \(Z_2 = \{z_2,z_3,\dots,z_k\}\) is the LCSS of \(X_2\) and \(Y_2\)

Proof (by contradiction):

  • if \(Z\) does not contain \(x_1\), then we can always prepend \(x_1\) to it, making a longer common subsequence, so \(Z\) MUST contain \(x_1\)
  • if \(Z_2\) is not the LCSS of \(X_2\) and \(Y_2\), then there is another common subsequence \(Z^*\) of \(X_2\) and \(Y_2\) that is longer than \(Z_2\). If we prepend \(x_1\) to \(Z^*\), the result is a common subsequence of \(X\) and \(Y\) that is longer than \(Z\), a contradiction!
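
The length comparison behind the Case B contradiction can be written out explicitly. Here \(x_1 Z^*\) is shorthand introduced only for this note (it is not from the slides) and means \(Z^*\) with \(x_1\) placed in front, where \(Z^*\) is assumed to be a common subsequence of \(X_2\) and \(Y_2\) with \(|Z^*| > |Z_2|\):

\[ |x_1 Z^*| = 1 + |Z^*| > 1 + |Z_2| = 1 + (k-1) = k = |Z| \]

So \(x_1 Z^*\) would be a common subsequence of \(X\) and \(Y\) that is longer than the LCSS \(Z\), which is impossible.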

Top Down Solution

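// memo table: lcss[i][j] caches the answer for suffixes of length i+1 and j+1;
// it is assumed to be allocated and filled with -1 before the first call
static int[][] lcss;

public static int lcss_top_down(String x, String y) {
  // base case: an empty string has no common subsequence with anything
  if (x.length() == 0 || y.length() == 0)
    return 0;
  int i = x.length()-1, j = y.length()-1;
  // reuse the stored answer if this pair of suffixes was solved before
  if (lcss[i][j] != -1)
    return lcss[i][j];
  // first characters match: count it and solve the rest of both strings
  if (x.charAt(0) == y.charAt(0))
    return lcss[i][j] = 1 + lcss_top_down(x.substring(1),y.substring(1));
  // otherwise drop the first character of x or of y, whichever is better
  return lcss[i][j] = Math.max(lcss_top_down(x.substring(1),y),
                               lcss_top_down(x,y.substring(1)));
}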
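
The method above relies on a shared memo table that has to be set up before the first call. A self-contained variant might look like the sketch below, which is an illustration rather than the lecture's code: the class `TopDownLcss` and its method names are hypothetical, and it passes indices instead of calling `substring`, so no new strings are created.

```java
class TopDownLcss {
    private final String x, y;
    private final int[][] memo;   // memo[i][j] = LCSS length of x[i..] and y[j..], or -1 if unknown

    TopDownLcss(String x, String y) {
        this.x = x;
        this.y = y;
        memo = new int[x.length()][y.length()];
        for (int[] row : memo) java.util.Arrays.fill(row, -1);
    }

    int solve(int i, int j) {
        if (i == x.length() || j == y.length()) return 0;   // one suffix is empty
        if (memo[i][j] != -1) return memo[i][j];            // already solved
        if (x.charAt(i) == y.charAt(j))
            return memo[i][j] = 1 + solve(i + 1, j + 1);
        return memo[i][j] = Math.max(solve(i + 1, j), solve(i, j + 1));
    }

    static int lcss(String x, String y) {
        return new TopDownLcss(x, y).solve(0, 0);
    }
}
```

Each of the m × n table entries is computed at most once, so the running time is O(mn).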

Bottom Up Solution

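public static int lcss_bottom_up(String a, String b) {
  int m = a.length(), n = b.length();
  // lcss[i][j] = length of the LCSS of a.substring(i) and b.substring(j)
  int[][] lcss = new int[m+1][n+1];

  // base cases: one of the two suffixes is empty
  for (int j = 0; j < n+1; j++) lcss[m][j] = 0;
  for (int i = 0; i < m+1; i++) lcss[i][n] = 0;

  // fill the table from the bottom-right corner towards the top-left
  for (int i = m-1; i > -1; i--){
    for (int j = n-1; j > -1; j--){
      // if the characters match, then go diagonally
      if (a.charAt(i) == b.charAt(j))
        lcss[i][j] = 1 + lcss[i+1][j+1];
      // else, compare the cell to your right and the cell below you,
      // and pick the larger one
      else
        lcss[i][j] = Math.max(lcss[i][j+1], lcss[i+1][j]);
    }
  }
  return lcss[0][0];
}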
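
The table stores only lengths, but it also holds enough information to recover one actual longest common subsequence by walking from the top-left cell. The sketch below is an assumption-laden illustration, not something shown in the slides: the class `LcssTraceback` is a hypothetical name, and the tie-breaking rule (move down when the two neighbours are equal) is an arbitrary choice, so other equally long answers may exist.

```java
class LcssTraceback {
    static String lcssString(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] lcss = new int[m + 1][n + 1];    // last row and column stay 0
        for (int i = m - 1; i >= 0; i--)
            for (int j = n - 1; j >= 0; j--)
                lcss[i][j] = (a.charAt(i) == b.charAt(j))
                        ? 1 + lcss[i + 1][j + 1]
                        : Math.max(lcss[i][j + 1], lcss[i + 1][j]);

        // Walk from (0,0): take a character on a match, otherwise move
        // towards the neighbour holding the larger value.
        StringBuilder out = new StringBuilder();
        int i = 0, j = 0;
        while (i < m && j < n) {
            if (a.charAt(i) == b.charAt(j)) { out.append(a.charAt(i)); i++; j++; }
            else if (lcss[i + 1][j] >= lcss[i][j + 1]) i++;
            else j++;
        }
        return out.toString();
    }
}
```

For `kitten` and `sitting` this returns `ittn`, the solution given in the Definition slide.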

Bottom Up Solution

        sitting  itting   tting    ting     ing      ng       g (empty)
kitten        -       -       -       -       -       -       -       0
itten         -       -       -       -       -       -       -       0
tten          -       -       -       -       -       -       -       0
ten           -       -       -       -       -       -       -       0
en            -       -       -       -       -       -       -       0
n             -       -       -       -       -       -       -       0
(empty)       0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting  itting   tting    ting     ing      ng       g (empty)
kitten        -       -       -       -       -       -       0       0
itten         -       -       -       -       -       -       0       0
tten          -       -       -       -       -       -       0       0
ten           -       -       -       -       -       -       0       0
en            -       -       -       -       -       -       0       0
n             1       1       1       1       1       1       0       0
(empty)       0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting  itting   tting    ting     ing      ng       g (empty)
kitten        -       -       -       -       -       1       0       0
itten         -       -       -       -       -       1       0       0
tten          -       -       -       -       -       1       0       0
ten           -       -       -       -       -       1       0       0
en            1       1       1       1       1       1       0       0
n             1       1       1       1       1       1       0       0
(empty)       0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting  itting   tting    ting     ing      ng       g (empty)
kitten        -       -       -       -       2       1       0       0
itten         -       -       -       -       2       1       0       0
tten          -       -       -       -       1       1       0       0
ten           -       -       -       -       1       1       0       0
en            1       1       1       1       1       1       0       0
n             1       1       1       1       1       1       0       0
(empty)       0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting  itting   tting    ting     ing      ng       g (empty)
kitten        -       -       -       -       2       1       0       0
itten         -       -       -       -       2       1       0       0
tten          -       -       -       -       1       1       0       0
ten           2       2       2       2       1       1       0       0
en            1       1       1       1       1       1       0       0
n             1       1       1       1       1       1       0       0
(empty)       0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting  itting   tting    ting     ing      ng       g (empty)
kitten        -       -       3       2       2       1       0       0
itten         -       -       3       2       2       1       0       0
tten          3       3       3       2       1       1       0       0
ten           2       2       2       2       1       1       0       0
en            1       1       1       1       1       1       0       0
n             1       1       1       1       1       1       0       0
(empty)       0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting  itting   tting    ting     ing      ng       g (empty)
kitten        4       4       3       2       2       1       0       0
itten         4       4       3       2       2       1       0       0
tten          3       3       3       2       1       1       0       0
ten           2       2       2       2       1       1       0       0
en            1       1       1       1       1       1       0       0
n             1       1       1       1       1       1       0       0
(empty)       0       0       0       0       0       0       0       0
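
The completed table can be reproduced programmatically. The sketch below is one possible way to do it, with the class `LcssTable` and the methods `fill` and `render` being hypothetical names chosen for this example; each row is labelled with the suffix of the first string that it represents, and the last row (the empty suffix) has a blank label.

```java
class LcssTable {
    // Returns the (m+1) x (n+1) table where t[i][j] is the LCSS length
    // of a.substring(i) and b.substring(j).
    static int[][] fill(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] t = new int[m + 1][n + 1];       // last row and column stay 0
        for (int i = m - 1; i >= 0; i--)
            for (int j = n - 1; j >= 0; j--)
                t[i][j] = (a.charAt(i) == b.charAt(j))
                        ? 1 + t[i + 1][j + 1]
                        : Math.max(t[i][j + 1], t[i + 1][j]);
        return t;
    }

    // Renders one line per row: the suffix of a, then the row's values.
    static String render(String a, String b) {
        int[][] t = fill(a, b);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= a.length(); i++) {
            sb.append(String.format("%-8s", a.substring(i)));
            for (int j = 0; j <= b.length(); j++)
                sb.append(String.format("%3d", t[i][j]));
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Calling `System.out.print(LcssTable.render("kitten", "sitting"))` prints the seven rows, and the top-left value `fill("kitten", "sitting")[0][0]` is the answer, 4.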
