COMP3010: Algorithm Theory and Design
Daniel Sutantyo, Department of Computing, Macquarie University
7.0 - Longest Common Subsequence
Prelude
- In the first half of the semester, we concentrated on three topics:
  - brute force \(\rightarrow\) dynamic programming \(\rightarrow\) greedy algorithm
  - plus complexity and correctness
- Longest common subsequence (LCSS)
  - we covered this topic in Week 4 when discussing overlapping subproblems
  - you have also covered it in COMP225/2010
  - so at this point, I expect you to already understand LCSS
Definition
- Given two sequences \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y = \{y_1,y_2,\dots,y_n\}\), find the longest common subsequence of \(X\) and \(Y\)
  - a subsequence keeps the elements in their original order, but they need not be adjacent
  - e.g. for \(X = \{ k,i,t,t,e,n \}\) and \(Y = \{ s,i,t,t,i,n,g \}\), the sequence \(\{ i,t,t,n \}\) is a solution
- What is the brute-force solution?
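To make the definition concrete, here is a minimal Java sketch of a subsequence test. The class and method names (`SubsequenceCheck`, `isSubsequence`) are illustrative assumptions, not something from the lecture.

```java
// Hypothetical helper; names are illustrative, not from the lecture.
class SubsequenceCheck {
    // Returns true if the characters of z appear in x in the same order,
    // though not necessarily next to each other.
    static boolean isSubsequence(String z, String x) {
        int k = 0; // index of the next character of z still to be found in x
        for (int i = 0; i < x.length() && k < z.length(); i++) {
            if (x.charAt(i) == z.charAt(k)) {
                k++;
            }
        }
        return k == z.length();
    }
}
```

For the example above, `isSubsequence("ittn", "kitten")` and `isSubsequence("ittn", "sitting")` should both hold, which is what makes `ittn` a common subsequence.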
Brute Force
- Brute force solution:
  - Let \(X\) be a sequence of length \(m\) and \(Y\) be a sequence of length \(n\)
  - Generate all subsequences of \(X\): there are \(2^m\) of them
  - Generate all subsequences of \(Y\): there are \(2^n\) of them
  - Compare each subsequence of \(X\) with every subsequence of \(Y\)
    - the number of comparisons is \(2^m \cdot 2^n = 2^{m+n}\)
  - Hence the complexity is \(O(2^{m+n})\), counting each comparison as one step
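The brute-force idea can be sketched in Java as below. This is an illustration under stated assumptions: the class and method names are made up, subsequences are enumerated with bitmasks (so it only works for short strings), and a `HashSet` lookup stands in for the pairwise comparison described above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class; names are illustrative, not from the lecture.
class BruteForceLcss {
    // Enumerates every subsequence of s, one per bitmask (2^len masks in total).
    // Assumes s has fewer than 31 characters so the mask fits in an int.
    static Set<String> allSubsequences(String s) {
        Set<String> result = new HashSet<>();
        int len = s.length();
        for (int mask = 0; mask < (1 << len); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < len; i++) {
                if ((mask & (1 << i)) != 0) {
                    sb.append(s.charAt(i)); // bit i set: keep character i
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    // Length of the longest string that is a subsequence of both x and y.
    static int lcssLength(String x, String y) {
        Set<String> subsX = allSubsequences(x);
        Set<String> subsY = allSubsequences(y);
        int best = 0;
        for (String candidate : subsX) {
            if (subsY.contains(candidate)) {
                best = Math.max(best, candidate.length());
            }
        }
        return best;
    }
}
```

Even with the set lookup, generating the subsequences alone is exponential in the input length, which is the point of the slide.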
How to Approach It
- Example:
  - \(X = \{ k,i,t,t,e,n \}\)
  - \(Y = \{ s,i,t,t,i,n,g \} \)
- Can you improve the brute force approach?
  - do you need to compare all these?
    - \(\{k,i,t,t\}\) with \(\{s,i,t,t\}\)
    - \(\{k,i,t,t\}\) with \(\{s,i,t,t,i\}\)
    - \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n\}\)
    - \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n,g\}\)
How to Approach It
- Example:
  - \(X = \{ k,i,t,t,e,n \}\)
  - \(Y = \{ s,i,t,t,i,n,g \} \)
- Let's get some intuition:
  - \(\{k,i,t,t\}\) with \(\{s,i,t,t\}\)
  - \(\{k,i,t,t\}\) with \(\{s,i,t,t,i\}\)
  - \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n\}\)
  - \(\{k,i,t,t\}\) with \(\{s,i,t,t,i,n,g\}\)
- Why do we keep comparing \(k\)? Can we drop it?
How to Approach It
- Example:
  - \(X = \{ k,i,t,t,e,n \}\)
  - \(Y = \{ s,i,t,t,i,n,g \} \)
- Let's get some intuition:
  - \(\{i,t,t\}\) with \(\{s,i,t,t\}\)
  - \(\{i,t,t\}\) with \(\{s,i,t,t,i\}\)
  - \(\{i,t,t\}\) with \(\{s,i,t,t,i,n\}\)
  - \(\{i,t,t\}\) with \(\{s,i,t,t,i,n,g\}\)
- Why do we keep comparing \(s\)? Can we drop it?
How to Approach It
- Example:
  - \(X = \{ k,i,t,t,e,n \}\)
  - \(Y = \{ s,i,t,t,i,n,g \} \)
- Let's get some intuition:
  - \(\{i,t,t\}\) with \(\{i,t,t\}\)
  - \(\{i,t,t\}\) with \(\{i,t,t,i\}\)
  - \(\{i,t,t\}\) with \(\{i,t,t,i,n\}\)
  - \(\{i,t,t\}\) with \(\{i,t,t,i,n,g\}\)
- Why do we keep comparing \(i\)? Can we drop it?
How to Approach It
- If the first characters of both strings match, then we should take the first character off both strings (i.e. don't compare them again)
- If they do not match, then
  - should we keep both?
    - no, because then we'll never progress
  - should we take both first characters off?
    - no, why?
  - should we take the first character off just one string?
    - yes, because the mismatched pair is clearly not helping, but we don't know which string to shorten, so we have to try it with each string in turn
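The rules above translate almost directly into a recursive method. The following is a minimal sketch, assuming Java strings as the sequences; the class name `NaiveLcss` is illustrative and there is no memoisation yet.

```java
// Hypothetical class; a plain recursive sketch with no memoisation.
class NaiveLcss {
    static int lcss(String x, String y) {
        // nothing left to compare in one of the strings
        if (x.isEmpty() || y.isEmpty()) {
            return 0;
        }
        // first characters match: count the match and take it off both strings
        if (x.charAt(0) == y.charAt(0)) {
            return 1 + lcss(x.substring(1), y.substring(1));
        }
        // no match: take the first character off one string at a time,
        // trying both options and keeping the better result
        return Math.max(lcss(x.substring(1), y), lcss(x, y.substring(1)));
    }
}
```

A small case shows why taking both first characters off on a mismatch loses answers: for `ab` and `ba` that would leave `b` and `a`, which have nothing in common, whereas `NaiveLcss.lcss("ab", "ba")` finds a common subsequence of length 1.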
How to Approach It
- (kitten, sitting)
  - \(\rightarrow\) (itten, sitting) or (kitten, itting)
  - \(\rightarrow\) (itten, itting)
  - \(\rightarrow\) (tten, tting)
  - \(\rightarrow\) (ten, ting)
  - \(\rightarrow\) (en, ing)
Recursive Relation
- The problem \((x_1\ x_2\ x_3 \dots x_m,\ \ y_1\ y_2\ y_3 \dots y_n)\) reduces to
  - \((x_2\ x_3 \dots x_m,\ \ y_2\ y_3 \dots y_n)\) if \(x_1 = y_1\)
  - \((x_2\ x_3 \dots x_m,\ \ y_1\ y_2\ y_3 \dots y_n)\) and \((x_1\ x_2\ x_3 \dots x_m,\ \ y_2\ y_3 \dots y_n)\) if \(x_1 \ne y_1\)
- With \(X_2 = x_2\ x_3 \dots x_m\) and \(Y_2 = y_2\ y_3 \dots y_n\):
\[ \text{LCSS}(X,Y) = \begin{cases} 0 & \text{if } X \text{ or } Y \text{ is empty}\\ 1 + \text{LCSS}(X_2,Y_2) & \text{if } x_1 = y_1\\ \max\left(\text{LCSS}(X_2,Y),\text{LCSS}(X,Y_2)\right) & \text{if } x_1 \ne y_1 \end{cases} \]
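As a small worked instance of the recurrence, take the suffixes en and ng from the running example. The symbol \(\varepsilon\) for the empty sequence is notation assumed here, not used elsewhere in the slides.

\[
\begin{aligned}
\text{LCSS}(\text{en}, \text{ng}) &= \max\left(\text{LCSS}(\text{n}, \text{ng}),\ \text{LCSS}(\text{en}, \text{g})\right) && \text{since } e \ne n\\
\text{LCSS}(\text{n}, \text{ng}) &= 1 + \text{LCSS}(\varepsilon, \text{g}) = 1 + 0 = 1 && \text{since } n = n\\
\text{LCSS}(\text{en}, \text{g}) &= \max\left(\text{LCSS}(\text{n}, \text{g}),\ \text{LCSS}(\text{en}, \varepsilon)\right) = \max(0, 0) = 0 && \text{since } e \ne g\\
\text{LCSS}(\text{en}, \text{ng}) &= \max(1, 0) = 1
\end{aligned}
\]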
Overlapping Subproblems
- Write \(X_i = x_i \dots x_m\) and \(Y_j = y_j \dots y_n\), so the original problem is \((X_1, Y_1)\)
- Expanding the recursion as a tree:
  - \((X_1, Y_1)\) leads to \((X_2, Y_1)\) and \((X_1, Y_2)\)
  - \((X_2, Y_1)\) leads to \((X_3, Y_1)\) and \((X_2, Y_2)\)
  - \((X_1, Y_2)\) leads to \((X_2, Y_2)\) and \((X_1, Y_3)\)
  - both copies of \((X_2, Y_2)\) lead to \((X_3, Y_2)\), as does \((X_3, Y_1)\), and so on
- The same subproblems appear repeatedly in the tree, e.g. \((X_2, Y_2)\) and \((X_3, Y_2)\)
- Merging the repeats leaves one node per distinct subproblem: \((X_1, Y_1)\), \((X_2, Y_1)\), \((X_1, Y_2)\), \((X_3, Y_1)\), \((X_2, Y_2)\), \((X_1, Y_3)\), \((X_3, Y_2)\), \(\dots\)
Overlapping Subproblems
- Distinct subproblems for kitten and sitting, level by level:
  - (kitten, sitting)
  - (itten, sitting), (kitten, itting)
  - (tten, sitting), (itten, itting), (kitten, tting)
  - (itten, tting), (tten, itting)
  - (tten, tting)
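One way to see the overlap is to count how often the plain recursion is called compared with how many distinct subproblems it actually visits. The sketch below is an illustration only: the class name, the static counters and the `|` separator used to build a key are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical instrumented version of the plain recursion.
class OverlapCounter {
    static long calls = 0;                          // total number of recursive calls
    static Set<String> distinct = new HashSet<>();  // distinct (x, y) pairs visited

    static int lcss(String x, String y) {
        calls++;
        distinct.add(x + "|" + y); // assumes '|' does not occur in the inputs
        if (x.isEmpty() || y.isEmpty()) {
            return 0;
        }
        if (x.charAt(0) == y.charAt(0)) {
            return 1 + lcss(x.substring(1), y.substring(1));
        }
        return Math.max(lcss(x.substring(1), y), lcss(x, y.substring(1)));
    }
}
```

After `OverlapCounter.lcss("kitten", "sitting")`, `calls` is larger than `distinct.size()`, because pairs such as (itten, itting) are reached along more than one path. The number of distinct pairs can never exceed \((m+1)(n+1)\), since each subproblem is a suffix of \(X\) paired with a suffix of \(Y\).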
Optimal Substructure
- Example:
  - \(X = \{ k,i,t,t,e,n \}\)
  - \(Y = \{ s,i,t,t,i,n,g \} \)
- Does it have overlapping subproblems?
- Does the longest common subsequence problem have an optimal substructure?
Optimal Substructure
- \(X = \{ x_1, x_2, x_3, \dots, x_m \} \)
- \(Y = \{ y_1, y_2, y_3, \dots, y_n \} \)
- Let \(Z = \{ z_1, z_2, z_3, \dots, z_k \} \) be the LCSS of \(X\) and \(Y\)
Case A:
- If \(x_1 \ne y_1\), then \(Z\) is either
  - the LCSS of \(X_2 =\{x_2,x_3,\dots,x_m\}\) and \(Y=\{y_1,y_2,\dots,y_n\}\), or
  - the LCSS of \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y_2 =\{y_2,y_3,\dots,y_n\}\)
Case B:
- If \(x_1 = y_1\), then \(Z\) should contain \(x_1 = y_1\), i.e. \(z_1 = x_1 = y_1\), and \(Z_2 = \{z_2,z_3,\dots,z_k\}\) is the LCSS of \(X_2\) and \(Y_2\)
Optimal Substructure
Case A:
- If \(x_1 \ne y_1\), then \(Z\) is either
  - the LCSS of \(X_2 =\{x_2,x_3,\dots,x_m\}\) and \(Y=\{y_1,y_2,\dots,y_n\}\), or
  - the LCSS of \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y_2 =\{y_2,y_3,\dots,y_n\}\)
- e.g. (kitten, sitting) reduces to (itten, sitting) or (kitten, itting)
Optimal Substructure
Case A:
- If \(x_1 \ne y_1\), then \(Z\) is either
  - the LCSS of \(X_2 =\{x_2,x_3,\dots,x_m\}\) and \(Y=\{y_1,y_2,\dots,y_n\}\), or
  - the LCSS of \(X = \{x_1,x_2,\dots,x_m\}\) and \(Y_2 =\{y_2,y_3,\dots,y_n\}\)
Proof (by contradiction):
- Let \(Z\) be the LCSS of \(X\) and \(Y\)
- Since \(x_1 \ne y_1\), \(Z\) cannot use both \(x_1\) and \(y_1\); suppose it does not use \(x_1\), so \(Z\) is a common subsequence of \(X_2\) and \(Y\)
- If \(Z\) is NOT the LCSS of \(X_2\) and \(Y\), then they have a longer common subsequence than \(Z\), say \(Z^*\)
- But \(Z^*\) is also a common subsequence of \(X\) and \(Y\), and it is longer than \(Z\), a contradiction!
- The proof is symmetrical for the case \(X\) and \(Y_2\)
Optimal Substructure
Case B:
- If \(x_1 = y_1\), then \(Z\) should contain \(x_1 = y_1\), i.e. \(z_1 = x_1 = y_1\), and \(Z_2 = \{z_2,z_3,\dots,z_k\}\) is the LCSS of \(X_2\) and \(Y_2\)
- e.g. for (itten, itting), out of (tten, itting), (itten, tting) and (tten, tting), the matching first character \(i\) takes us to (tten, tting)
Optimal Substructure
Case B:
- If \(x_1 = y_1\), then \(Z\) should contain \(x_1 = y_1\), i.e. \(z_1 = x_1 = y_1\), and \(Z_2 = \{z_2,z_3,\dots,z_k\}\) is the LCSS of \(X_2\) and \(Y_2\)
Proof (by contradiction):
- if \(Z\) does not contain \(x_1\), then we can always prepend \(x_1\) to it, making a longer common subsequence, so \(Z\) MUST contain \(x_1\)
- if \(Z_2\) is not the LCSS of \(X_2\) and \(Y_2\), then there is another common subsequence \(Z^*\) of \(X_2\) and \(Y_2\) that is longer than \(Z_2\). If we prepend \(x_1\) to \(Z^*\), the result is a common subsequence of \(X\) and \(Y\) that is longer than \(Z\), a contradiction!
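The two cases also tell us how to build \(Z\) itself, not just its length. The following is a minimal recursive sketch under that reading; the class name `LcssString` is an assumption, and the method is exponential because it has no memoisation.

```java
// Hypothetical class; builds one longest common subsequence by following
// Case A and Case B directly.
class LcssString {
    static String lcss(String x, String y) {
        if (x.isEmpty() || y.isEmpty()) {
            return "";
        }
        if (x.charAt(0) == y.charAt(0)) {
            // Case B: z_1 = x_1 = y_1, and the rest is the LCSS of X_2 and Y_2
            return x.charAt(0) + lcss(x.substring(1), y.substring(1));
        }
        // Case A: Z is the LCSS of (X_2, Y) or of (X, Y_2), whichever is longer
        String dropX = lcss(x.substring(1), y);
        String dropY = lcss(x, y.substring(1));
        return dropX.length() >= dropY.length() ? dropX : dropY;
    }
}
```

For kitten and sitting this returns `ittn`, the solution given on the Definition slide. When several subsequences tie for the longest length, which one is returned depends on the tie-breaking rule in the last line.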
Top Down Solution
- Subproblems for kitten and sitting, starting from the full problem and working down:
  - (kitten, sitting)
  - (itten, sitting), (kitten, itting)
  - (tten, sitting), (itten, itting), (kitten, tting)
  - (itten, tting), (tten, itting)
  - (tten, tting)
Top Down Solution
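// memo table: lcss[i][j] caches the answer for the suffix of x of length i+1
// and the suffix of y of length j+1; it is assumed to be allocated as
// int[m][n] and filled with -1 before the first call
static int[][] lcss;

public static int lcss_top_down(String x, String y) {
    // base case: an empty string has nothing in common with anything
    if (x.length() == 0 || y.length() == 0)
        return 0;
    int i = x.length() - 1, j = y.length() - 1;
    // reuse the cached answer if this pair of suffixes was already solved
    if (lcss[i][j] != -1)
        return lcss[i][j];
    // first characters match: count the match and drop it from both strings
    if (x.charAt(0) == y.charAt(0))
        return lcss[i][j] = 1 + lcss_top_down(x.substring(1), y.substring(1));
    // no match: drop the first character of either string, keep the better result
    return lcss[i][j] = Math.max(lcss_top_down(x.substring(1), y),
                                 lcss_top_down(x, y.substring(1)));
}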
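The method above relies on a table that is set up elsewhere. As a self-contained variant, the sketch below allocates and initialises its own memo table and passes indices instead of creating substrings. The class name `TopDownLcss`, the method name `solve` and the index-based signature are assumptions for illustration, not the lecture's code.

```java
import java.util.Arrays;

// Hypothetical self-contained top-down version using indices into x and y.
class TopDownLcss {
    static int solve(String x, String y) {
        // memo[i][j] caches the LCSS length of x.substring(i) and y.substring(j)
        int[][] memo = new int[x.length()][y.length()];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return solve(x, y, 0, 0, memo);
    }

    private static int solve(String x, String y, int i, int j, int[][] memo) {
        // one of the suffixes is empty
        if (i == x.length() || j == y.length()) {
            return 0;
        }
        if (memo[i][j] != -1) {
            return memo[i][j];
        }
        int result;
        if (x.charAt(i) == y.charAt(j)) {
            result = 1 + solve(x, y, i + 1, j + 1, memo);
        } else {
            result = Math.max(solve(x, y, i + 1, j, memo),
                              solve(x, y, i, j + 1, memo));
        }
        memo[i][j] = result;
        return result;
    }
}
```

Each of the at most \(m \cdot n\) table entries is computed once, so `TopDownLcss.solve("kitten", "sitting")` does far less work than the plain recursion while returning the same length.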
Bottom Up Solution
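- Subproblems for kitten and sitting (the bottom-up solution fills these in from the smallest suffixes upwards):
  - (kitten, sitting)
  - (itten, sitting), (kitten, itting)
  - (tten, sitting), (itten, itting), (kitten, tting)
  - (itten, tting), (tten, itting)
  - (tten, tting)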
Bottom Up Solution
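// a and b are the two input strings;
// lcss[i][j] holds the LCSS length of a.substring(i) and b.substring(j)
int m = a.length(), n = b.length();
int[][] lcss = new int[m+1][n+1];

// base cases: the last row and last column correspond to an empty suffix
for (int j = 0; j < n+1; j++) lcss[m][j] = 0;
for (int i = 0; i < m+1; i++) lcss[i][n] = 0;

// fill the table from the bottom-right corner towards the top-left
for (int i = m-1; i > -1; i--) {
    for (int j = n-1; j > -1; j--) {
        // if the characters match, then go diagonally
        if (a.charAt(i) == b.charAt(j))
            lcss[i][j] = 1 + lcss[i+1][j+1];
        // else, compare the cell to your right and the cell below you,
        // and pick the larger one
        else
            lcss[i][j] = Integer.max(lcss[i][j+1], lcss[i+1][j]);
    }
}
// the answer for the full strings is in lcss[0][0]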
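Wrapped in a class, the bottom-up fill can be run and inspected directly. This is a sketch under assumptions: the names `BottomUpLcss`, `buildTable` and `length` are made up for the example, and it relies on Java zero-initialising new `int` arrays for the base cases.

```java
// Hypothetical wrapper around the bottom-up table fill.
class BottomUpLcss {
    // Returns the full table; entry [i][j] is the LCSS length of
    // a.substring(i) and b.substring(j).
    static int[][] buildTable(String a, String b) {
        int m = a.length(), n = b.length();
        // Java fills new int arrays with 0, which covers the base cases
        // in row m and column n (an empty suffix on either side).
        int[][] lcss = new int[m + 1][n + 1];
        for (int i = m - 1; i >= 0; i--) {
            for (int j = n - 1; j >= 0; j--) {
                if (a.charAt(i) == b.charAt(j)) {
                    lcss[i][j] = 1 + lcss[i + 1][j + 1];      // diagonal
                } else {
                    lcss[i][j] = Math.max(lcss[i][j + 1],      // right
                                          lcss[i + 1][j]);     // below
                }
            }
        }
        return lcss;
    }

    static int length(String a, String b) {
        return buildTable(a, b)[0][0];
    }
}
```

For kitten and sitting, `buildTable` produces a table with 7 rows and 8 columns, and `length` reads the answer from the top-left cell.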
Bottom Up Solution
       | sitting | itting | tting | ting | ing | ng | g | ""
kitten |         |        |       |      |     |    |   | 0
itten  |         |        |       |      |     |    |   | 0
tten   |         |        |       |      |     |    |   | 0
ten    |         |        |       |      |     |    |   | 0
en     |         |        |       |      |     |    |   | 0
n      |         |        |       |      |     |    |   | 0
""     | 0       | 0      | 0     | 0    | 0   | 0  | 0 | 0
Bottom Up Solution
       | sitting | itting | tting | ting | ing | ng | g | ""
kitten |         |        |       |      |     |    | 0 | 0
itten  |         |        |       |      |     |    | 0 | 0
tten   |         |        |       |      |     |    | 0 | 0
ten    |         |        |       |      |     |    | 0 | 0
en     |         |        |       |      |     |    | 0 | 0
n      | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
""     | 0       | 0      | 0     | 0    | 0   | 0  | 0 | 0
Bottom Up Solution
       | sitting | itting | tting | ting | ing | ng | g | ""
kitten |         |        |       |      |     | 1  | 0 | 0
itten  |         |        |       |      |     | 1  | 0 | 0
tten   |         |        |       |      |     | 1  | 0 | 0
ten    |         |        |       |      |     | 1  | 0 | 0
en     | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
n      | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
""     | 0       | 0      | 0     | 0    | 0   | 0  | 0 | 0
Bottom Up Solution
       | sitting | itting | tting | ting | ing | ng | g | ""
kitten |         |        |       |      | 2   | 1  | 0 | 0
itten  |         |        |       |      | 2   | 1  | 0 | 0
tten   |         |        |       |      | 1   | 1  | 0 | 0
ten    |         |        |       |      | 1   | 1  | 0 | 0
en     | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
n      | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
""     | 0       | 0      | 0     | 0    | 0   | 0  | 0 | 0
Bottom Up Solution
       | sitting | itting | tting | ting | ing | ng | g | ""
kitten |         |        |       |      | 2   | 1  | 0 | 0
itten  |         |        |       |      | 2   | 1  | 0 | 0
tten   |         |        |       |      | 1   | 1  | 0 | 0
ten    | 2       | 2      | 2     | 2    | 1   | 1  | 0 | 0
en     | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
n      | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
""     | 0       | 0      | 0     | 0    | 0   | 0  | 0 | 0
Bottom Up Solution
       | sitting | itting | tting | ting | ing | ng | g | ""
kitten |         |        | 3     | 2    | 2   | 1  | 0 | 0
itten  |         |        | 3     | 2    | 2   | 1  | 0 | 0
tten   | 3       | 3      | 3     | 2    | 1   | 1  | 0 | 0
ten    | 2       | 2      | 2     | 2    | 1   | 1  | 0 | 0
en     | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
n      | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
""     | 0       | 0      | 0     | 0    | 0   | 0  | 0 | 0
Bottom Up Solution
       | sitting | itting | tting | ting | ing | ng | g | ""
kitten | 4       | 4      | 3     | 2    | 2   | 1  | 0 | 0
itten  | 4       | 4      | 3     | 2    | 2   | 1  | 0 | 0
tten   | 3       | 3      | 3     | 2    | 1   | 1  | 0 | 0
ten    | 2       | 2      | 2     | 2    | 1   | 1  | 0 | 0
en     | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
n      | 1       | 1      | 1     | 1    | 1   | 1  | 0 | 0
""     | 0       | 0      | 0     | 0    | 0   | 0  | 0 | 0
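The completed table gives the length in its top-left cell, and the subsequence itself can be read off by walking from that cell towards the bottom-right. The sketch below is one way to do that, not necessarily how it is done in the lecture; the class name `LcssTraceback` and the tie-breaking choice (prefer moving down) are assumptions.

```java
// Hypothetical traceback over the bottom-up table.
class LcssTraceback {
    static String recover(String a, String b) {
        int m = a.length(), n = b.length();
        // same table as in the bottom-up solution:
        // lcss[i][j] = LCSS length of a.substring(i) and b.substring(j)
        int[][] lcss = new int[m + 1][n + 1];
        for (int i = m - 1; i >= 0; i--) {
            for (int j = n - 1; j >= 0; j--) {
                if (a.charAt(i) == b.charAt(j)) {
                    lcss[i][j] = 1 + lcss[i + 1][j + 1];
                } else {
                    lcss[i][j] = Math.max(lcss[i][j + 1], lcss[i + 1][j]);
                }
            }
        }
        // walk from the top-left cell: take matching characters and move
        // diagonally, otherwise move towards the larger neighbouring cell
        StringBuilder z = new StringBuilder();
        int i = 0, j = 0;
        while (i < m && j < n) {
            if (a.charAt(i) == b.charAt(j)) {
                z.append(a.charAt(i));
                i++;
                j++;
            } else if (lcss[i + 1][j] >= lcss[i][j + 1]) {
                i++; // the cell below is at least as good
            } else {
                j++; // the cell to the right is better
            }
        }
        return z.toString();
    }
}
```

On the table above, the walk picks up i, t, t and n in turn, so `LcssTraceback.recover("kitten", "sitting")` yields `ittn`.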