String Matching
The KMP Algorithm

Terminology

  • Sub-String: Zero or more consecutive characters of a string
  • Prefix: A Sub-String that starts from the beginning of the string
  • Suffix: A Sub-String that ends at the end of the string
  • Proper-Prefix: A Prefix that's not the whole string
  • Proper-Suffix: A Suffix that's not the whole string

Example

"aaab" (n = 4; _ denotes the empty string)

  • Sub-Strings: {_, a, aa, aaa, aaab, a, aa, aab, a, ab, b}
  • Prefixes: {_, a, aa, aaa, aaab}
  • Suffixes: {_, b, ab, aab, aaab}
  • Proper-Prefixes: {_, a, aa, aaa}
  • Proper-Suffixes: {_, b, ab, aab}
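The sets above can be reproduced with a few lines of Python. This is only a sketch: the helper names (prefixes, suffixes, substrings, proper) are made up for illustration, and substrings are counted by position, so a repeated value such as "a" appears once per occurrence, as in the list above.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:k] for k in range(len(s) + 1)]

def suffixes(s):
    """All suffixes of s, from the empty string up to s itself."""
    return [s[len(s) - k:] for k in range(len(s) + 1)]

def substrings(s):
    """The empty string, then every s[i:j] with i < j (counted by position)."""
    return [""] + [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]

def proper(parts, s):
    """Drop the whole string from a list of prefixes or suffixes."""
    return [p for p in parts if p != s]

s = "aaab"
print(prefixes(s))               # prefixes, shortest first
print(suffixes(s))               # suffixes, shortest first
print(proper(prefixes(s), s))    # proper prefixes
print(proper(suffixes(s), s))    # proper suffixes
print(len(substrings(s)))        # number of substrings, duplicates included
```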

Side Note:

  • #Substrings = \frac{n(n+1)}{2} + 1
  • #Prefixes = #Suffixes = n + 1
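
The substring count can be derived by grouping the non-empty substrings by length, assuming they are counted by position (repeated values count once per occurrence). A string of length n has n - k + 1 substrings of length k, and the empty string adds one more:

```latex
\#\text{Substrings} = 1 + \sum_{k=1}^{n} (n - k + 1) = 1 + \frac{n(n+1)}{2}
```

For n = 4 this gives 1 + 10 = 11, which matches the list for "aaab".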

Given two strings (text, pattern).

Print the number of occurrences of the pattern within the text, followed by the 1-based starting position of each occurrence.

Example:

  ababab ab

Output:

  3
  1 3 5
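
Before looking at optimizations, the straightforward approach is to try every starting position in the text and compare it against the pattern. The sketch below assumes 1-based positions, as in the example; the function name naive_search is illustrative only. Its worst-case running time is O(n * m) for a text of length n and a pattern of length m.

```python
def naive_search(text, pattern):
    """Return the 1-based start position of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        # Compare the window text[start : start + m] against the pattern.
        if text[start:start + m] == pattern:
            positions.append(start + 1)  # convert to a 1-based position
    return positions

hits = naive_search("ababab", "ab")
print(len(hits))  # 3
print(*hits)      # 1 3 5
```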

Optimizations?

Failure (F) Function

F[i] is the length of the longest Proper-Suffix of S[0..i] that's also a Proper-Prefix of S[0..i]
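
The definition can be checked directly, position by position. The sketch below is a deliberately slow brute force (roughly cubic time) meant only to make the definition concrete; the name failure_bruteforce is made up for illustration, and it assumes F[i] stores a length for the prefix S[0..i].

```python
def failure_bruteforce(s):
    """F[i] = length of the longest proper prefix of s[0..i] that is also its suffix."""
    f = []
    for i in range(len(s)):
        p = s[:i + 1]                    # the prefix s[0..i]
        best = 0
        for k in range(1, len(p)):       # k < len(p) keeps prefix and suffix proper
            if p[:k] == p[len(p) - k:]:  # first k characters equal the last k
                best = k
        f.append(best)
    return f

print(failure_bruteforce("aabaab"))  # [0, 1, 0, 1, 2, 3]
```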

F[i] = 0 1 0 1 2 3
S    = a a b a a b
i    = 0 1 2 3 4 5

F[i] = 0 1 0 1 2 ?
S    = a a b a a x
i    = 0 1 2 3 4 5

F[i] = 0 1 0 1 2 ?
S    = a a b a a a
i    = 0 1 2 3 4 5

F[i] = 0 1 0 1 2 ?
S    = a a b a a b
i    = 0 1 2 3 4 5

F[i] = 0 1 0 1 2 0
S    = a a b a a x
i    = 0 1 2 3 4 5

F[i] = 0 1 0 1 2 2
S    = a a b a a a
i    = 0 1 2 3 4 5
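
The three endings of "aabaa?" show the cases an incremental computation has to handle. With k = F[4] = 2, the next character is compared with S[k]: "b" matches, so F[5] = k + 1 = 3; "a" mismatches, so k falls back to F[k-1] = 1, where S[1] = "a" matches and F[5] = 2; "x" mismatches at every fallback, so F[5] = 0. A minimal sketch of that idea, together with a search that reuses the same fallback on the text, might look as follows. The function names and the 1-based output positions are assumptions for illustration, not taken from the slides.

```python
def compute_failure(s):
    """F[i] = length of the longest proper prefix of s[0..i] that is also its suffix."""
    f = [0] * len(s)
    k = 0  # value of F for the previous position
    for i in range(1, len(s)):
        # On a mismatch, fall back to the next shorter candidate length.
        while k > 0 and s[i] != s[k]:
            k = f[k - 1]
        if s[i] == s[k]:
            k += 1
        f[i] = k
    return f

def kmp_search(text, pattern):
    """Return the 1-based start position of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption: an empty pattern reports no occurrences
    f = compute_failure(pattern)
    positions = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = f[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 2)  # 0-based start is i - k + 1
            k = f[k - 1]                 # keep going to find overlapping matches
    return positions

print(compute_failure("aabaab"))   # [0, 1, 0, 1, 2, 3]
print(kmp_search("ababab", "ab"))  # [1, 3, 5]
```

Because k only grows by one per character and every fallback shrinks it, the total work is linear: O(m) to build F for the pattern and O(n) to scan the text.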


By Muhammad Magdi
