End-to-End Deep Learning Model for Base Calling of MinION Nanopore Reads
Neven Miculinić
Associate prof. Mile Šikić, PhD
The University of Zagreb,
Faculty of electrical engineering and computing
MinION
Raw signal
Base pairs
Solution?
- Rule engine
- HMM
- Deep learning
Existing Solution
- (defunct) Metrichon
- Albacore
- Guppy
- Chiron
- ...
MinCall
- End2end
- GPU accelerated
- Deep learning model
- Uses well known
CNNs with CTC loss and beam search - Added autoencoder loss to speed up training
CTC loss
CTC loss
CTC loss
P(\pi | X) = \prod_{t=1}^{m} s_t(\pi_t)
P(π∣X)=∏t=1mst(πt)
Decoding
P(Y | X) = \sum_{\pi \in decode^{-1}(Y)}^{} P(\pi | X)
P(Y∣X)=∑π∈decode−1(Y)P(π∣X)
Greedy search
Beam search
Training detail
- Dataset
- Architecture
- Results
Dataset
Training dataset:
Jared's Simpsons R9.4 E.coli
Test dataset:
Ryan's Wick R9.4 Klebsiella pneumoniae
Preparation
- Basecalled with metrichon (positional data)
- Aligned with graphmap
- Corrected
- Transformed to protobuf
Preparation
syntax = "proto3";
package dataset;
enum BasePair {
A = 0;
C = 1;
G = 2;
T = 3;
BLANK = 4;
}
enum Cigar {
MATCH = 0;
MISMATCH = 1;
INSERTION = 2; // Insertion, soft clip, hard clip
DELETION = 3; // Deletion, N, P
}
message DataPoint {
message BPConfidenceInterval {
uint64 lower = 1;
uint64 upper = 2;
BasePair pair = 3;
}
repeated float signal = 1;
repeated BasePair basecalled = 2; // What we basecalled
repeated BPConfidenceInterval labels = 3; // labels describe corrected basecalled signal for training
}
Preparation
Architecture
Read Results
Read Results
Read Results
Speed
Consensus Results
Consensus Results
identity rate | |
---|---|
minion_b0 | 99.9671 |
minion_b50 | 99.9604 |
chiron_v0.3 | 99.9957 |
albacore_v2.2.7 | 99.9904 |
guppy_v0.5.1 | 99.9907 |
Questions?
minion
By Neven Miculinić
minion
- 363