COMP3010: Algorithm Theory and Design

Daniel Sutantyo, Department of Computing, Macquarie University

7.0 - Longest Common Subsequence

Prelude

  • In the first half of the semester, we concentrated on three topics:
    • brute force $\rightarrow$ dynamic programming $\rightarrow$ greedy algorithm
    • plus complexity and correctness
  • Longest common subsequence
    • we have done this topic earlier when discussing overlapping subproblems in Week 4
    • plus, you have also covered this topic in COMP225/2010
    • so at this point, I really expect you to already understand LCSS

Definition

  • Given two sequences
    $X = \{x_1, x_2, \dots, x_m\}$ and $Y = \{y_1, y_2, \dots, y_n\}$,
    find the longest common subsequence of $X$ and $Y$
    • e.g. $X = \{ k,i,t,t,e,n \}$
           $Y = \{ s,i,t,t,i,n,g \}$
           the sequence $\{ i,t,t,n \}$ is a solution
  • What is the brute-force solution?

Brute Force

  • Brute force solution:
    • Let $X$ be a sequence of length $m$ and $Y$ be a sequence of length $n$
    • Generate all subsequences of $X$ $\rightarrow$ $2^m$ subsequences
    • Generate all subsequences of $Y$ $\rightarrow$ $2^n$ subsequences
    • For each subsequence of $X$, compare it with every subsequence of $Y$
      • cost is $2^m \cdot 2^n = 2^{m+n}$ comparisons
    • Hence complexity is $O(2^{m+n})$
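
The brute-force steps above can be sketched in Java as follows. This is only an illustration under some assumptions: the class name `LcssBruteForce` and the helper `pick` are hypothetical, subsequences are encoded as bitmasks (so both strings must be shorter than 31 characters), and only the length of the answer is returned. For kitten and sitting it performs $2^6 \cdot 2^7 = 8192$ comparisons.

```java
final class LcssBruteForce {
    // Builds the subsequence of s selected by the set bits of mask
    // (bit i set means "keep s.charAt(i)").
    static String pick(String s, int mask) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
            if ((mask & (1 << i)) != 0) sb.append(s.charAt(i));
        return sb.toString();
    }

    // Compares every subsequence of x with every subsequence of y
    // and remembers the length of the longest one they share.
    static int lcssLength(String x, String y) {
        int best = 0;
        for (int mx = 0; mx < (1 << x.length()); mx++) {
            String sx = pick(x, mx);
            for (int my = 0; my < (1 << y.length()); my++)
                if (sx.equals(pick(y, my)))
                    best = Math.max(best, sx.length());
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(lcssLength("kitten", "sitting")); // prints 4
    }
}
```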

How to Approach It

  • Example:
    • $X = \{ k,i,t,t,e,n \}$
    • $Y = \{ s,i,t,t,i,n,g \}$
  • Can you improve the brute force approach?
    • do you need to compare all these?
      • $\{k,i,t,t\}$ with $\{s,i,t,t\}$
      • $\{k,i,t,t\}$ with $\{s,i,t,t,i\}$
      • $\{k,i,t,t\}$ with $\{s,i,t,t,i,n\}$
      • $\{k,i,t,t\}$ with $\{s,i,t,t,i,n,g\}$

How to Approach It

  • Example:
    • $X = \{ k,i,t,t,e,n \}$
    • $Y = \{ s,i,t,t,i,n,g \}$
  • Let's get some intuition:
    • $\{k,i,t,t\}$ with $\{s,i,t,t\}$
    • $\{k,i,t,t\}$ with $\{s,i,t,t,i\}$
    • $\{k,i,t,t\}$ with $\{s,i,t,t,i,n\}$
    • $\{k,i,t,t\}$ with $\{s,i,t,t,i,n,g\}$
  • Why do we keep comparing $k$? Can we drop it?

How to Approach It

  • Example:
    • $X = \{ k,i,t,t,e,n \}$
    • $Y = \{ s,i,t,t,i,n,g \}$
  • Let's get some intuition:
    • $\{i,t,t\}$ with $\{s,i,t,t\}$
    • $\{i,t,t\}$ with $\{s,i,t,t,i\}$
    • $\{i,t,t\}$ with $\{s,i,t,t,i,n\}$
    • $\{i,t,t\}$ with $\{s,i,t,t,i,n,g\}$
  • Why do we keep comparing $s$? Can we drop it?

How to Approach It

  • Example:
    • $X = \{ k,i,t,t,e,n \}$
    • $Y = \{ s,i,t,t,i,n,g \}$
  • Let's get some intuition:
    • $\{i,t,t\}$ with $\{i,t,t\}$
    • $\{i,t,t\}$ with $\{i,t,t,i\}$
    • $\{i,t,t\}$ with $\{i,t,t,i,n\}$
    • $\{i,t,t\}$ with $\{i,t,t,i,n,g\}$
  • Why do we keep comparing $i$? Can we drop it?

How to Approach It

  • If the first characters of both strings match, then we should take the first character off both strings (i.e. don't compare them again)
  • If they do not match, then
    • should we keep both?
      • then we'll never progress
    • should we take both first characters off?
      • no, why?
    • should we take the first character off just one string?
      • yes, because the two first characters clearly cannot be matched with each other, but we don't know which one to drop, so we have to try it on each string in turn

How to Approach It

[Figure: the pairs of strings compared when applying these rules to kitten and sitting: (kitten, sitting); (itten, sitting) and (kitten, itting); (itten, itting); (tten, tting); (ten, ting); (en, ing)]

Recursive Relation

[Figure: the problem on $x_1\ x_2\ x_3 \dots x_m$ and $y_1\ y_2\ y_3 \dots y_n$ reduces to the pair $(x_2\ x_3 \dots x_m,\ y_2\ y_3 \dots y_n)$ if $x_1 = y_1$, and to the two pairs $(x_2\ x_3 \dots x_m,\ y_1\ y_2\ y_3 \dots y_n)$ and $(x_1\ x_2\ x_3 \dots x_m,\ y_2\ y_3 \dots y_n)$ if $x_1 \ne y_1$]

$$\text{LCSS}(X,Y) = \begin{cases} 0 & \text{if $X$ or $Y$ is empty}\\ 1 + \text{LCSS}(X_2,Y_2) & \text{if $x_1 = y_1$}\\ \max\left(\text{LCSS}(X_2,Y),\text{LCSS}(X,Y_2)\right) & \text{if $x_1 \ne y_1$}\end{cases}$$

where $X_2 = \{x_2, x_3, \dots, x_m\}$ and $Y_2 = \{y_2, y_3, \dots, y_n\}$
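
The recurrence can be transcribed almost line for line into a recursive method. The sketch below is a minimal illustration, not an efficient solution: the class name `LcssRecursive` is hypothetical, and the suffixes $X_2$ and $Y_2$ are represented by start indices `i` and `j` (an assumption made here to avoid copying strings).

```java
final class LcssRecursive {
    // LCSS length of the suffixes x[i..] and y[j..], with no memoisation.
    static int lcss(String x, String y, int i, int j) {
        // case 1: one of the suffixes is empty
        if (i == x.length() || j == y.length()) return 0;
        // case 2: first characters match, take them off both suffixes
        if (x.charAt(i) == y.charAt(j)) return 1 + lcss(x, y, i + 1, j + 1);
        // case 3: mismatch, drop the first character of one suffix at a time
        return Math.max(lcss(x, y, i + 1, j), lcss(x, y, i, j + 1));
    }

    static int lcss(String x, String y) {
        return lcss(x, y, 0, 0);
    }

    public static void main(String[] args) {
        System.out.println(lcss("kitten", "sitting")); // prints 4
    }
}
```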

Overlapping Subproblems

[Figure: recursion tree, writing $X_i$ for $x_i \dots x_m$ and $Y_j$ for $y_j \dots y_n$. The root $(X, Y)$ has children $(X_2, Y)$ and $(X, Y_2)$; their children are $(X_3, Y)$, $(X_2, Y_2)$ and $(X_2, Y_2)$, $(X, Y_3)$; the next level contains $(X_3, Y_2)$ three times, and so on. The same subproblems appear more than once.]

Overlapping Subproblems

[Figure: the recursion with repeated subproblems drawn only once, writing $X_i$ for $x_i \dots x_m$ and $Y_j$ for $y_j \dots y_n$: $(X, Y)$; then $(X_2, Y)$ and $(X, Y_2)$; then $(X_3, Y)$, $(X_2, Y_2)$ and $(X, Y_3)$; then $(X_3, Y_2)$, and so on]

Overlapping Subproblems

[Figure: the same recursion on kitten and sitting: (kitten, sitting); then (itten, sitting) and (kitten, itting); then (tten, sitting), (itten, itting) and (kitten, tting); then (itten, tting), (tten, itting) and (tten, tting)]
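
To see the overlap concretely, one can count how often the plain recursion reaches each pair of suffixes. The sketch below is an illustration only: the class `LcssCallCounter`, the map `visits` and the key format `"i,j"` are hypothetical names chosen here, with `i` and `j` being the number of characters already taken off each string.

```java
import java.util.HashMap;
import java.util.Map;

final class LcssCallCounter {
    // visits.get("i,j") = number of times the recursion reached x[i..], y[j..]
    static final Map<String, Integer> visits = new HashMap<>();

    static int lcss(String x, String y, int i, int j) {
        visits.merge(i + "," + j, 1, Integer::sum);
        if (i == x.length() || j == y.length()) return 0;
        if (x.charAt(i) == y.charAt(j)) return 1 + lcss(x, y, i + 1, j + 1);
        return Math.max(lcss(x, y, i + 1, j), lcss(x, y, i, j + 1));
    }

    // Total number of recursive calls made so far.
    static int totalCalls() {
        return visits.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        lcss("kitten", "sitting", 0, 0);
        // (itten, itting) is reached from both (itten, sitting) and (kitten, itting)
        System.out.println("visits to (itten, itting): " + visits.get("1,1"));
        System.out.println("calls: " + totalCalls() + ", distinct subproblems: " + visits.size());
    }
}
```

The gap between the number of calls and the number of distinct subproblems is the work that memoisation saves.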

Optimal Substructure

  • Example:
    • $X = \{ k,i,t,t,e,n \}$
    • $Y = \{ s,i,t,t,i,n,g \}$
  • Does it have overlapping subproblems?
  • Does the longest common subsequence problem have an optimal substructure?

Optimal Substructure

  • $X = \{ x_1, x_2, x_3, \dots, x_m \}$
  • $Y = \{ y_1, y_2, y_3, \dots, y_n \}$
  • Let $Z = \{ z_1, z_2, z_3, \dots, z_k \}$ be the LCSS of $X$ and $Y$

Case A:

  • If $x_1 \ne y_1$, then $Z$ is either
    • the LCSS of $X_2 = \{x_2,x_3,\dots,x_m\}$ and $Y = \{y_1,y_2,\dots,y_n\}$, or
    • the LCSS of $X = \{x_1,x_2,\dots,x_m\}$ and $Y_2 = \{y_2,y_3,\dots,y_n\}$

Case B:

  • If $x_1 = y_1$, then $Z$ should contain $x_1 = y_1$, i.e. $z_1 = x_1 = y_1$, and $Z_2 = \{z_2,z_3,\dots,z_k\}$ is the LCSS of $X_2$ and $Y_2$

Optimal Substructure

Case A:

  • If $x_1 \ne y_1$, then $Z$ is either
    • the LCSS of $X_2 = \{x_2,x_3,\dots,x_m\}$ and $Y = \{y_1,y_2,\dots,y_n\}$, or
    • the LCSS of $X = \{x_1,x_2,\dots,x_m\}$ and $Y_2 = \{y_2,y_3,\dots,y_n\}$

[Figure: (kitten, sitting) branches to (itten, sitting) and (kitten, itting)]

Optimal Substructure

Case A:

  • If $x_1 \ne y_1$, then $Z$ is either
    • the LCSS of $X_2 = \{x_2,x_3,\dots,x_m\}$ and $Y = \{y_1,y_2,\dots,y_n\}$, or
    • the LCSS of $X = \{x_1,x_2,\dots,x_m\}$ and $Y_2 = \{y_2,y_3,\dots,y_n\}$

Proof (by contradiction):

  • Let $Z$ be the LCSS of $X$ and $Y$
    • since $x_1 \ne y_1$, $Z$ cannot use both $x_1$ and $y_1$, so $Z$ is a common subsequence of $X_2$ and $Y$, or of $X$ and $Y_2$; suppose the former
    • if $Z$ is NOT the LCSS of $X_2$ and $Y$, that means they have a longer common subsequence than $Z$, say $Z^*$
  • But every common subsequence of $X_2$ and $Y$ is also a common subsequence of $X$ and $Y$, so $Z^*$ is a common subsequence of $X$ and $Y$ that is longer than $Z$, a contradiction!
  • The proof is symmetrical for the case $X$ and $Y_2$

Optimal Substructure

Case B:

  • If $x_1 = y_1$, then $Z$ should contain $x_1 = y_1$, i.e. $z_1 = x_1 = y_1$, and $Z_2 = \{z_2,z_3,\dots,z_k\}$ is the LCSS of $X_2$ and $Y_2$

[Figure: the pairs (itten, itting), (tten, itting), (itten, tting) and (tten, tting); the first characters of itten and itting match, so the subproblem that matters is (tten, tting)]

Optimal Substructure

Case B:

  • If $x_1 = y_1$, then $Z$ should contain $x_1 = y_1$, i.e. $z_1 = x_1 = y_1$, and $Z_2 = \{z_2,z_3,\dots,z_k\}$ is the LCSS of $X_2$ and $Y_2$

Proof (by contradiction):

  • if $Z$ does not start with $x_1$, then we can always prepend $x_1$ to it, making a longer common subsequence, so $Z$ MUST start with $x_1$
  • if $Z_2$ is not the LCSS of $X_2$ and $Y_2$, then there is another common subsequence $Z^*$ of $X_2$ and $Y_2$ that is longer. If we prepend $x_1$ to $Z^*$, the result would be a common subsequence of $X$ and $Y$ that is longer than $Z$, a contradiction!

Top Down Solution

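// lcss[i][j] memoises the answer for the suffixes of length i+1 and j+1.
// It must be allocated as int[m][n] and filled with -1 before the first call.
static int[][] lcss;

public static int lcss_top_down(String x, String y) {
  // base case: an empty string has no common subsequence with anything
  if (x.length() == 0 || y.length() == 0)
    return 0;
  int i = x.length()-1, j = y.length()-1;
  // reuse the stored answer if this pair of suffixes was solved before
  if (lcss[i][j] != -1)
    return lcss[i][j];
  // first characters match: take them off both strings
  else if (x.charAt(0) == y.charAt(0))
    return lcss[i][j] = 1 + lcss_top_down(x.substring(1),y.substring(1));
  // otherwise drop the first character of one string at a time, keep the larger result
  else
    return lcss[i][j] = Math.max(lcss_top_down(x.substring(1),y),
                                 lcss_top_down(x,y.substring(1)));
}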
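
For experimenting, a self-contained variant of the top-down idea might look like the sketch below. It differs from the method above in two ways that are assumptions of this example, not part of the slides: the memo table is allocated inside a wrapper instead of being a shared field, and suffixes are identified by start indices rather than by `substring` calls. The class name `LcssTopDown` and the helper `solve` are hypothetical.

```java
import java.util.Arrays;

final class LcssTopDown {
    static int lcss(String x, String y) {
        // memo[i][j] caches the answer for x[i..] and y[j..]; -1 = not computed yet
        int[][] memo = new int[x.length()][y.length()];
        for (int[] row : memo) Arrays.fill(row, -1);
        return solve(x, y, 0, 0, memo);
    }

    private static int solve(String x, String y, int i, int j, int[][] memo) {
        if (i == x.length() || j == y.length()) return 0;   // empty suffix
        if (memo[i][j] != -1) return memo[i][j];            // already solved
        if (x.charAt(i) == y.charAt(j))
            memo[i][j] = 1 + solve(x, y, i + 1, j + 1, memo);
        else
            memo[i][j] = Math.max(solve(x, y, i + 1, j, memo),
                                  solve(x, y, i, j + 1, memo));
        return memo[i][j];
    }

    public static void main(String[] args) {
        System.out.println(lcss("kitten", "sitting")); // prints 4
    }
}
```

Each of the $m \cdot n$ table entries is computed at most once, which is where the saving over the plain recursion comes from.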

Bottom Up Solution

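// a and b are the input strings;
// lcss[i][j] is the LCSS length of a.substring(i) and b.substring(j)
int m = a.length(), n = b.length();
int[][] lcss = new int[m+1][n+1];

// base cases: the last row and last column stand for an empty suffix
for (int j = 0; j < n+1; j++) lcss[m][j] = 0;
for (int i = 0; i < m+1; i++) lcss[i][n] = 0;

for (int i = m-1; i > -1; i--){
  for (int j = n-1; j > -1; j--){
    // if the characters match, then go diagonally
    if (a.charAt(i) == b.charAt(j))
      lcss[i][j] = 1 + lcss[i+1][j+1];
    // else, compare the cell to your right and the cell below you,
    // and pick the larger one
    else
      lcss[i][j] = Integer.max(lcss[i][j+1], lcss[i+1][j]);
  }
}
// lcss[0][0] now holds the LCSS length of a and b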
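
Wrapped into a class, the loops above can be run directly. The sketch below is one possible packaging: the class name `LcssBottomUp` and the methods `table` and `lcss` are hypothetical names introduced here, and the method returns the whole table so that individual cells can be inspected.

```java
final class LcssBottomUp {
    // Returns the full table: t[i][j] is the LCSS length of a.substring(i) and b.substring(j).
    static int[][] table(String a, String b) {
        int m = a.length(), n = b.length();
        // Java zero-initialises the array, so the base cases (row m, column n) are already 0
        int[][] t = new int[m + 1][n + 1];
        for (int i = m - 1; i >= 0; i--) {
            for (int j = n - 1; j >= 0; j--) {
                if (a.charAt(i) == b.charAt(j))
                    t[i][j] = 1 + t[i + 1][j + 1];              // match: go diagonally
                else
                    t[i][j] = Math.max(t[i][j + 1], t[i + 1][j]); // right vs. below
            }
        }
        return t;
    }

    static int lcss(String a, String b) {
        return table(a, b)[0][0];
    }

    public static void main(String[] args) {
        System.out.println(lcss("kitten", "sitting")); // prints 4
    }
}
```

The two nested loops fill each of the $(m+1)(n+1)$ cells once.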

Bottom Up Solution

        sitting itting  tting   ting    ing     ng      g       ""
kitten  -       -       -       -       -       -       -       0
itten   -       -       -       -       -       -       -       0
tten    -       -       -       -       -       -       -       0
ten     -       -       -       -       -       -       -       0
en      -       -       -       -       -       -       -       0
n       -       -       -       -       -       -       -       0
""      0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting itting  tting   ting    ing     ng      g       ""
kitten  -       -       -       -       -       -       0       0
itten   -       -       -       -       -       -       0       0
tten    -       -       -       -       -       -       0       0
ten     -       -       -       -       -       -       0       0
en      -       -       -       -       -       -       0       0
n       1       1       1       1       1       1       0       0
""      0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting itting  tting   ting    ing     ng      g       ""
kitten  -       -       -       -       -       1       0       0
itten   -       -       -       -       -       1       0       0
tten    -       -       -       -       -       1       0       0
ten     -       -       -       -       -       1       0       0
en      1       1       1       1       1       1       0       0
n       1       1       1       1       1       1       0       0
""      0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting itting  tting   ting    ing     ng      g       ""
kitten  -       -       -       -       2       1       0       0
itten   -       -       -       -       2       1       0       0
tten    -       -       -       -       1       1       0       0
ten     -       -       -       -       1       1       0       0
en      1       1       1       1       1       1       0       0
n       1       1       1       1       1       1       0       0
""      0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting itting  tting   ting    ing     ng      g       ""
kitten  -       -       -       -       2       1       0       0
itten   -       -       -       -       2       1       0       0
tten    -       -       -       -       1       1       0       0
ten     2       2       2       2       1       1       0       0
en      1       1       1       1       1       1       0       0
n       1       1       1       1       1       1       0       0
""      0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting itting  tting   ting    ing     ng      g       ""
kitten  -       -       3       2       2       1       0       0
itten   -       -       3       2       2       1       0       0
tten    3       3       3       2       1       1       0       0
ten     2       2       2       2       1       1       0       0
en      1       1       1       1       1       1       0       0
n       1       1       1       1       1       1       0       0
""      0       0       0       0       0       0       0       0

Bottom Up Solution

        sitting itting  tting   ting    ing     ng      g       ""
kitten  4       4       3       2       2       1       0       0
itten   4       4       3       2       2       1       0       0
tten    3       3       3       2       1       1       0       0
ten     2       2       2       2       1       1       0       0
en      1       1       1       1       1       1       0       0
n       1       1       1       1       1       1       0       0
""      0       0       0       0       0       0       0       0
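
The completed table gives only the length (4, in the top-left cell). One common way to recover an actual subsequence is to walk the table from the top-left cell, which is not shown in the slides; the sketch below is an assumption-laden illustration of that idea. The class name `LcssTraceback`, the method `reconstruct` and the tie-breaking rule (prefer moving down when the cell below and the cell to the right are equal) are all choices made here.

```java
final class LcssTraceback {
    // Same table as the bottom-up loops: t[i][j] = LCSS length of a.substring(i), b.substring(j).
    static int[][] table(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] t = new int[m + 1][n + 1];
        for (int i = m - 1; i >= 0; i--)
            for (int j = n - 1; j >= 0; j--)
                t[i][j] = (a.charAt(i) == b.charAt(j))
                        ? 1 + t[i + 1][j + 1]
                        : Math.max(t[i][j + 1], t[i + 1][j]);
        return t;
    }

    // Walks from t[0][0] towards the last row/column, collecting matched characters.
    static String reconstruct(String a, String b) {
        int[][] t = table(a, b);
        StringBuilder sb = new StringBuilder();
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            if (a.charAt(i) == b.charAt(j)) {          // match: keep it, go diagonally
                sb.append(a.charAt(i));
                i++;
                j++;
            } else if (t[i + 1][j] >= t[i][j + 1]) {   // the cell below is at least as good
                i++;
            } else {                                   // the cell to the right is better
                j++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reconstruct("kitten", "sitting")); // prints ittn
    }
}
```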