Journal Club:
Fast and accurate short read alignment with BWT
SA and BWT
X = abracadabra$
Sorted rotations of X, the corresponding suffixes, and the BWT (last character of each sorted rotation):

| i | suffix | sorted rotation | BWT |
|---|---|---|---|
| 0 | $ | $abracadabra | a |
| 1 | a$ | a$abracadabr | r |
| 2 | abra$ | abra$abracad | d |
| 3 | abracadabra$ | abracadabra$ | $ |
| 4 | acadabra$ | acadabra$abr | r |
| 5 | adabra$ | adabra$abrac | c |
| 6 | bra$ | bra$abracada | a |
| 7 | bracadabra$ | bracadabra$a | a |
| 8 | cadabra$ | cadabra$abra | a |
| 9 | dabra$ | dabra$abraca | a |
| 10 | ra$ | ra$abracadab | b |
| 11 | racadabra$ | racadabra$ab | b |
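The construction above can be reproduced in a few lines. This is a minimal sketch for small inputs only: it sorts all rotations and suffixes naively, which is not how a production indexer would build the BWT. The function name `bwt_via_rotations` is hypothetical and chosen for illustration.

```python
def bwt_via_rotations(text):
    """Return (sorted rotations, BWT string) for a text that ends with '$'."""
    n = len(text)
    # All n cyclic rotations, sorted lexicographically ('$' sorts before letters).
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    # The BWT is the last column of the sorted rotation matrix.
    bwt = "".join(rot[-1] for rot in rotations)
    return rotations, bwt


text = "abracadabra$"
rotations, bwt = bwt_via_rotations(text)

# Suffix array: start positions of the suffixes in sorted order.
sa = sorted(range(len(text)), key=lambda i: text[i:])
# Equivalent BWT definition: the character preceding each sorted suffix
# (text[-1] is '$', which covers the suffix starting at position 0).
bwt_from_sa = "".join(text[i - 1] for i in sa)

print(bwt)  # ard$rcaaaabb
print(sa)   # [11, 10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
```

Both routes give the same string because the unique terminator `$` makes sorting rotations equivalent to sorting suffixes.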
Backward search
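Exact pattern matching in O(m), m = |W|:
= find the interval of all sorted suffixes for which pattern W is a prefix
- init: \(I=[s,e] = [0,n - 1]\) (all suffixes), with \(O(a,-1)=0\)
- read pattern from right to left
- update for each character \(W_i\):
\(I = [C(W_i)+O(W_i, s-1) + 1, C(W_i) + O(W_i, e)]\)
- W occurs in X iff the final interval satisfies \(s \le e\)
C(a):
number of characters in X[0,n-2] that are lexicographically smaller than a
O(a, i):
number of occurrences of a in BWT[0,i]
X = abracadabra$, |X|=n=12
Terms of the update:
- \(C(W_i) + 1\): first suffix starting with \(W_i\)
- \(O(W_i, s-1)\): suffixes before the previous start which are preceded by \(W_i\)
- \(O(W_i, e)\): suffixes up to the previous end which are preceded by \(W_i\)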
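The update rule can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code of any particular aligner: it starts from the full interval [0, n-1], treats O(a, -1) as 0 by storing the occurrence table with an offset of one, and keeps the full O table in memory (real tools typically sample it). The names `build_tables` and `backward_search` are hypothetical.

```python
from collections import Counter


def build_tables(text, bwt):
    """Build C and O for a text ending in '$' and its BWT."""
    # C[a]: number of characters in text[0..n-2] (all but '$') smaller than a.
    counts = Counter(text[:-1])
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # O[a][i + 1]: occurrences of a in bwt[0..i]; O[a][0] stands for O(a, -1) = 0.
    O = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            O[ch][i + 1] = O[ch][i] + (b == ch)
    return C, O


def backward_search(pattern, C, O, n):
    """Return the suffix interval (s, e) of pattern, or None if it does not occur."""
    s, e = 0, n - 1  # assumption of this sketch: start from all suffixes
    for ch in reversed(pattern):  # read the pattern from right to left
        if ch not in C:
            return None
        s = C[ch] + O[ch][s] + 1   # C(ch) + O(ch, s - 1) + 1
        e = C[ch] + O[ch][e + 1]   # C(ch) + O(ch, e)
        if s > e:
            return None
    return s, e


text = "abracadabra$"
sa = sorted(range(len(text)), key=lambda i: text[i:])
bwt = "".join(text[i - 1] for i in sa)
C, O = build_tables(text, bwt)

print(backward_search("a", C, O, len(text)))   # (1, 5)
print(backward_search("ra", C, O, len(text)))  # (10, 11)
```

Each pattern character costs two table lookups, which is where the O(m) bound comes from once C and O are available.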
W = ra
Step 1, \(W_i = a\), starting from the full interval \([s,e]=[0,11]\) with the convention \(O(a,-1)=0\):
\(C(a)=0\)
\(O(a,s-1)=O(a,-1)=0\)
\(O(a,e)=O(a,11)=5\)
\(I = [C(a)+O(a,-1)+1, C(a)+O(a,11)] = [0+0+1, 0+5] = [1,5]\)
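Step 2, \(W_i = r\), previous interval \([s,e]=[1,5]\):
\(C(r)=9\)
\(O(r,s-1)=O(r,0)=0\)
\(O(r,e)=O(r,5)=2\)
\(I = [C(r)+O(r,0)+1, C(r)+O(r,5)] = [9+0+1, 9+2] = [10,11]\)
Rows 10 and 11 are the suffixes ra$ and racadabra$, so W = ra occurs twice in X.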
By Johannes Köster