Journal Club:

Fast and accurate short read alignment with BWT

SA and BWT

X = abracadabra$
$abracadabra
a$abracadabr
abra$abracad
abracadabra$
acadabra$abr
adabra$abrac
bra$abracada
bracadabra$a
cadabra$abra
dabra$abraca
ra$abracadab
racadabra$ab

Above: the sorted rotations of X. The last column of the sorted rotations is the BWT.

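Sorted suffixes, the corresponding sorted rotations, and the BWT (last character of each rotation):

 i   suffix         rotation       BWT
 0   $              $abracadabra   a
 1   a$             a$abracadabr   r
 2   abra$          abra$abracad   d
 3   abracadabra$   abracadabra$   $
 4   acadabra$      acadabra$abr   r
 5   adabra$        adabra$abrac   c
 6   bra$           bra$abracada   a
 7   bracadabra$    bracadabra$a   a
 8   cadabra$       cadabra$abra   a
 9   dabra$         dabra$abraca   a
10   ra$            ra$abracadab   b
11   racadabra$     racadabra$ab   b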
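The construction above can be sketched in a few lines of Python. This is only a naive illustration (sorting all rotations or suffixes directly), not how a production aligner builds its index; the function names are made up for this example. It relies on `$` sorting before all lowercase letters, which holds for ASCII.

```python
X = "abracadabra$"  # '$' is the sentinel and sorts before every letter in ASCII

def bwt_from_rotations(text):
    """BWT as the last column of the lexicographically sorted rotations."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def suffix_array(text):
    """Start positions of the suffixes in sorted order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt_from_suffix_array(text):
    """BWT[i] is the character preceding the i-th sorted suffix."""
    # For the suffix starting at 0, text[-1] wraps around to the sentinel.
    return "".join(text[i - 1] for i in suffix_array(text))

print(bwt_from_rotations(X))     # ard$rcaaaabb
print(bwt_from_suffix_array(X))  # ard$rcaaaabb
```

Both routes give the same string because the unique, smallest sentinel makes sorting rotations equivalent to sorting suffixes.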

Backward search

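Sorted suffixes of X and the BWT character that precedes each suffix:

 i   suffix         BWT
 0   $              a
 1   a$             r
 2   abra$          d
 3   abracadabra$   $
 4   acadabra$      r
 5   adabra$        c
 6   bra$           a
 7   bracadabra$    a
 8   cadabra$       a
 9   dabra$         a
10   ra$            b
11   racadabra$     b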

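Exact pattern matching in O(m), m = |W|:

= find the interval of all suffixes for which pattern W is a prefix

  • init: \(I=[s,e] = [0,n - 1]\), with \(O(a,-1) = 0\)
    (row 0 must be included: starting at \(s = 1\) would count BWT[0] in \(O(W_i, s-1)\))
  • read pattern from right to left
  • update:
    \(I = [C(W_i)+O(W_i, s-1) + 1,C(W_i) + O(W_i, e)]\)

C(a): # of occurrences in X[0,n-2] of characters smaller than a

O(a, i): # of occurrences of a in BWT[0,i]

X = abracadabra$, |X|=n=12

Terms of the update:

  • \(C(W_i)\): offset of the block of suffixes starting with \(W_i\)
  • \(O(W_i, e)\): suffixes up to the previous end which are preceded by \(W_i\)
  • \(O(W_i, s-1)\): suffixes before the previous start which are preceded by \(W_i\)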
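The update rule can be sketched as follows. This is a minimal illustration, not the implementation used by any particular aligner: `bwt`, `build_tables` and `backward_search` are names invented here, O is stored as a full prefix-count table (real indexes sample it), and the search is assumed to start from the full row range [0, n - 1] with O(a, -1) = 0 so that row 0 of the BWT is counted correctly.

```python
def bwt(text):
    """Naive BWT: last column of the sorted rotations."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def build_tables(text):
    """Return C and the occurrence table for a text ending in '$'."""
    b = bwt(text)
    body = text[:-1]                      # X[0, n-2], without the sentinel
    alphabet = sorted(set(body))
    C = {a: sum(ch < a for ch in body) for a in alphabet}
    # occ[a][i + 1] == O(a, i); occ[a][0] == O(a, -1) == 0
    occ = {a: [0] for a in alphabet}
    for ch in b:
        for a in alphabet:
            occ[a].append(occ[a][-1] + (ch == a))
    return C, occ

def backward_search(pattern, text):
    """Inclusive row interval (s, e) of suffixes prefixed by pattern, or None."""
    C, occ = build_tables(text)
    s, e = 0, len(text) - 1
    for ch in reversed(pattern):          # read pattern from right to left
        if ch not in C:
            return None
        s = C[ch] + occ[ch][s] + 1        # C(ch) + O(ch, s - 1) + 1
        e = C[ch] + occ[ch][e + 1]        # C(ch) + O(ch, e)
        if s > e:                         # empty interval: no match
            return None
    return s, e

print(backward_search("ra", "abracadabra$"))   # (10, 11)
print(backward_search("rab", "abracadabra$"))  # None
```

Each pattern character costs two table lookups, which is where the O(m) bound comes from once C and O are precomputed.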

W = ra

Step 1, \(W_i = a\), starting from \(I = [s,e] = [0,11]\) (row 0 has to be included, since BWT[0] = a):

\(C(a) = 0\)

\(O(a, s-1) = O(a,-1) = 0\)

\(O(a, e) = O(a,11) = 5\)

\(I = [0 + 0 + 1, 0 + 5] = [1,5]\)

Step 2, \(W_i = r\), with \(I = [s,e] = [1,5]\) from the previous step:

\(C(r) = 9\)

\(O(r, s-1) = O(r,0) = 0\)

\(O(r, e) = O(r,5) = 2\)

\(I = [9 + 0 + 1, 9 + 2] = [10,11]\)
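The final interval refers to rows of the sorted suffix list, not to positions in X. Assuming a plain suffix array is kept alongside the BWT (built naively here; the variable names are only for illustration), the rows 10 and 11 can be translated into text positions like this:

```python
X = "abracadabra$"

# Naive suffix array: start positions of the suffixes in sorted order.
SA = sorted(range(len(X)), key=lambda i: X[i:])

s, e = 10, 11                      # interval found for W = ra
positions = sorted(SA[s:e + 1])    # text positions of all occurrences

print(positions)                           # [2, 9]
print([X[p:p + 2] for p in positions])     # ['ra', 'ra']
```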
