String Matching
The KMP Algorithm
Terminology
- Sub-String: Zero or more consecutive characters of the string
- Prefix: A Sub-String that starts from the beginning of the string
- Suffix: A Sub-String that ends at the end of the string
- Proper-Prefix: A Prefix that's not the whole string
- Proper-Suffix: A Suffix that's not the whole string
Example
"aaab" (_ denotes the empty string; Sub-Strings are listed by position, so repeats appear more than once)
- Sub-Strings: {_, a, aa, aaa, aaab, a, aa, aab, a, ab, b}
- Prefixes: {_, a, aa, aaa, aaab}
- Suffixes: {_, b, ab, aab, aaab}
- Proper-Prefixes: {_, a, aa, aaa}
- Proper-Suffixes: {_, b, ab, aab}
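The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the empty string `""` stands for `_`, and substrings are counted by position, as in the example.

```python
def substrings(s):
    """All substrings by position, with the empty string first."""
    n = len(s)
    return [""] + [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def prefixes(s):
    """Prefixes from shortest (empty) to longest (the whole string)."""
    return [s[:k] for k in range(len(s) + 1)]


def suffixes(s):
    """Suffixes from shortest (empty) to longest (the whole string)."""
    return [s[len(s) - k:] for k in range(len(s) + 1)]


def proper_prefixes(s):
    """Every prefix except the whole string."""
    return prefixes(s)[:-1]


def proper_suffixes(s):
    """Every suffix except the whole string."""
    return suffixes(s)[:-1]
```

Calling these on "aaab" reproduces the five sets listed in the example, in the same order.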
Side Note:
- #Substrings = \frac{n(n+1)}{2} + 1
- #Prefixes = #Suffixes = n + 1
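The substring count in the side note can be checked by grouping substrings by length. This short derivation assumes that n is the length of the string, that substrings are counted by position, and that the empty string is counted once:

```latex
\[
\#\text{Substrings}
  = 1 + \sum_{k=1}^{n} (n - k + 1)
  = 1 + \sum_{m=1}^{n} m
  = \frac{n(n+1)}{2} + 1
\]
```

For "aaab", n = 4 gives 4 * 5 / 2 + 1 = 11, which matches the 11 entries in the Sub-Strings list.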
Given two strings (text, pattern),
print the number of occurrences of the pattern within the text, followed by their starting positions (1-based).
Example:
ababab ab
Output:
3
1 3 5
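A naive solution to this problem might look like the sketch below. The function name `naive_find_all` is hypothetical, and positions are assumed to be 1-based to match the example output. It tries every start position and compares up to m characters each time, so the worst case costs O(n * m) character comparisons.

```python
def naive_find_all(text, pattern):
    """Return the 1-based start positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        # Compare the pattern against the window text[start:start + m].
        if text[start:start + m] == pattern:
            positions.append(start + 1)  # 1-based, as in the example output
    return positions


positions = naive_find_all("ababab", "ab")
print(len(positions))  # number of occurrences
print(*positions)      # their start positions
```

For the example input this prints 3 on the first line and 1 3 5 on the second.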
Optimizations?
Failure (F) Function
F[i] = the length of the Longest Proper-Suffix of S[0..i] that's also a Proper-Prefix of S[0..i]
F[i] = 0 1 0 1 2 3
S = a a b a a b
i = 0 1 2 3 4 5
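The definition can be turned directly into code. The sketch below is a deliberately slow brute-force version intended only to make the definition concrete (the name `failure_bruteforce` is hypothetical): for each prefix S[0..i] it tries every proper length, longest first.

```python
def failure_bruteforce(s):
    """F[i] = length of the longest proper suffix of s[:i+1] that is also its prefix."""
    f = []
    for i in range(len(s)):
        current = s[:i + 1]
        best = 0
        # Try every proper length k (0 < k < len(current)), longest first.
        for k in range(len(current) - 1, 0, -1):
            if current[:k] == current[-k:]:
                best = k
                break
        f.append(best)
    return f


print(*failure_bruteforce("aabaab"))
```

For S = aabaab this prints 0 1 0 1 2 3, the same row as F[i] above.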
F[i] = 0 1 0 1 2 ?
S = a a b a a x
i = 0 1 2 3 4 5
F[i] = 0 1 0 1 2 ?
S = a a b a a a
i = 0 1 2 3 4 5
F[i] = 0 1 0 1 2 ?
S = a a b a a b
i = 0 1 2 3 4 5
F[i] = 0 1 0 1 2 0
S = a a b a a x
i = 0 1 2 3 4 5
F[i] = 0 1 0 1 2 2
S = a a b a a a
i = 0 1 2 3 4 5
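The three cases above (the next character extends the match, breaks it completely, or falls back to a shorter match) are what the standard linear-time construction handles. The sketch below shows that usual textbook formulation, which is not necessarily the exact code used in these slides; the names `failure_function` and `kmp_find_all` are hypothetical, and positions are assumed to be 1-based as in the earlier example.

```python
def failure_function(pattern):
    """F[i] = length of the longest proper suffix of pattern[:i+1] that is also its prefix."""
    f = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate.
        while k > 0 and pattern[i] != pattern[k]:
            k = f[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        f[i] = k
    return f


def kmp_find_all(text, pattern):
    """Return the 1-based start positions of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption: an empty pattern is treated as having no occurrences
    f = failure_function(pattern)
    positions = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = f[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 2)  # convert the 0-based end index to a 1-based start
            k = f[k - 1]  # keep going so overlapping occurrences are found
    return positions
```

Each text character is read once and `k` can only fall back as often as it has advanced, so the search runs in O(n + m) time instead of the naive O(n * m).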
Agenda
- String Terminology
- The Needle in a Haystack Problem
- Naive Solution
- Optimizations?
- KMP Algorithm
- Failure Function
The KMP Algorithm
By Muhammad Magdi