End-to-End Deep Learning Model for Base Calling of MinION Nanopore Reads
Neven Miculinić
Assoc. Prof. Mile Šikić, PhD
University of Zagreb,
Faculty of Electrical Engineering and Computing
MinION

Raw signal

Base pairs

Solution?
- Rule engine
- HMM
- Deep learning
Existing Solutions
- Metrichor (defunct)
- Albacore
- Guppy
- Chiron
- ...
MinCall
- End-to-end
- GPU accelerated
- Deep learning model
- Uses well-known CNNs with CTC loss and beam search
- Added autoencoder loss to speed up training
CTC loss

P(\pi \mid X) = \prod_{t=1}^{m} s_t(\pi_t)
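As a minimal sketch of the path probability above: for a fixed alignment path π with one symbol per time step, the probability is the product of the per-step scores s_t(π_t). The symbol order A, C, G, T, blank follows the BasePair enum in this deck; the `-` character for the blank, the function name, and the example scores are assumptions made for illustration only.

```python
from math import prod

# Symbol order follows the BasePair enum (A, C, G, T, BLANK).
# Using '-' to print the blank is an assumption of this sketch.
ALPHABET = "ACGT-"


def path_probability(scores, path):
    """Return P(pi | X) = prod_t s_t(pi_t) for one path.

    scores: list of m rows, each a probability distribution over ALPHABET.
    path:   string of m symbols drawn from ALPHABET.
    """
    if len(scores) != len(path):
        raise ValueError("path must contain exactly one symbol per time step")
    return prod(row[ALPHABET.index(sym)] for row, sym in zip(scores, path))


# Hypothetical softmax outputs for m = 3 time steps.
example_scores = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.1, 0.1, 0.1, 0.5],
    [0.1, 0.7, 0.1, 0.05, 0.05],
]
print(path_probability(example_scores, "A-C"))
```

The path `A-C` picks A, blank, C in turn, so its probability is the product of those three entries.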
Decoding

P(Y \mid X) = \sum_{\pi \in \mathrm{decode}^{-1}(Y)} P(\pi \mid X)
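The sum above can be checked by brute force on tiny inputs. The sketch below assumes the usual CTC decode rule (merge repeated symbols, then drop blanks) and enumerates every path, so it is exponential in the number of time steps; practical CTC implementations compute the same quantity with a forward-backward dynamic program. All names and the `-` blank symbol are illustrative assumptions.

```python
from itertools import product
from math import prod

BLANK = "-"               # assumed printable stand-in for the CTC blank
ALPHABET = "ACGT" + BLANK


def decode(path):
    """Collapse a path: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def sequence_probability(scores, target):
    """P(Y | X): sum of P(pi | X) over all paths pi with decode(pi) == Y."""
    total = 0.0
    for path in product(range(len(ALPHABET)), repeat=len(scores)):
        symbols = "".join(ALPHABET[i] for i in path)
        if decode(symbols) == target:
            total += prod(row[i] for row, i in zip(scores, path))
    return total


# Two time steps with a uniform distribution over the five symbols.
uniform = [[0.2] * 5, [0.2] * 5]
print(sequence_probability(uniform, "A"))
```

With two uniform time steps, the paths `AA`, `A-`, and `-A` all decode to `A`, so the result is 3 × 0.04.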
Greedy search
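A minimal sketch of greedy CTC decoding, assuming the standard recipe: take the most likely symbol at every time step, then merge repeats and drop blanks. The function name, the `-` blank symbol, and the example scores are assumptions for illustration, not the MinCall code.

```python
BLANK = "-"               # assumed printable stand-in for the CTC blank
ALPHABET = "ACGT" + BLANK


def greedy_decode(scores):
    """Pick the argmax symbol per time step, then collapse the path."""
    out = []
    prev = None
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        sym = ALPHABET[best]
        # Emit a base only when it differs from the previous step's symbol.
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


# Hypothetical per-step scores whose argmax path is A, A, blank, A, C.
example_scores = [
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
    [0.5, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.1, 0.1, 0.1],
]
print(greedy_decode(example_scores))
```

Greedy search follows only the single best path, so it can miss a sequence whose probability is spread over many paths.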

Beam search
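A sketch of CTC prefix beam search, assuming the common formulation that tracks, for each candidate prefix, the probability of paths ending in blank and ending in a non-blank separately. The beam width, names, and example scores are assumptions for illustration; this is not the MinCall implementation.

```python
from collections import defaultdict

BLANK = "-"               # assumed printable stand-in for the CTC blank
ALPHABET = "ACGT" + BLANK


def beam_search_decode(scores, beam_width=3):
    """Return (best_prefix, probability) under CTC prefix beam search."""
    # prefix -> (prob. of paths ending in blank, prob. ending in non-blank)
    beams = {"": (1.0, 0.0)}
    for row in scores:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(row):
                sym = ALPHABET[idx]
                if sym == BLANK:
                    # A blank keeps the prefix unchanged.
                    nxt[prefix][0] += (p_b + p_nb) * p
                elif prefix and sym == prefix[-1]:
                    # A repeat extends the prefix only after a blank;
                    # otherwise it merges into the same prefix.
                    nxt[prefix + sym][1] += p_b * p
                    nxt[prefix][1] += p_nb * p
                else:
                    nxt[prefix + sym][1] += (p_b + p_nb) * p
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)
        beams = {pre: (pb, pnb) for pre, (pb, pnb) in ranked[:beam_width]}
    best, (p_b, p_nb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return best, p_b + p_nb


# Two steps where blank is the most likely symbol at each step (0.6 vs 0.4).
example_scores = [[0.4, 0.0, 0.0, 0.0, 0.6], [0.4, 0.0, 0.0, 0.0, 0.6]]
print(beam_search_decode(example_scores))
```

In this example the greedy path is blank, blank (probability 0.36, decoding to the empty sequence), while the paths `AA`, `A-`, and `-A` together give `A` a probability of 0.64, which the beam search recovers.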

Training detail
- Dataset
- Architecture
- Results
Dataset
Training dataset:
Jared Simpson's R9.4 E. coli
Test dataset:
Ryan Wick's R9.4 Klebsiella pneumoniae
Preparation
- Basecalled with Metrichor (positional data)
- Aligned with GraphMap
- Corrected
- Transformed to protobuf
Preparation
syntax = "proto3";
package dataset;
enum BasePair {
  A = 0;
  C = 1;
  G = 2;
  T = 3;
  BLANK = 4;
}
enum Cigar {
  MATCH = 0;
  MISMATCH = 1;
  INSERTION = 2;  // Insertion, soft clip, hard clip
  DELETION = 3;   // Deletion, N, P
}
message DataPoint {
  message BPConfidenceInterval {
    uint64 lower = 1;
    uint64 upper = 2;
    BasePair pair = 3;
  }
  repeated float signal = 1;
  repeated BasePair basecalled = 2;          // What we basecalled
  repeated BPConfidenceInterval labels = 3;  // Corrected basecalls for the signal, used as training labels
}
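To illustrate how a record in this schema could feed CTC training, the sketch below mirrors the DataPoint message with plain Python dataclasses and extracts the target base sequence from the labels. It deliberately avoids the generated protobuf module; treating `lower` and `upper` as sample indices into `signal` is an assumption, and `target_sequence` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class BasePair(IntEnum):
    A = 0
    C = 1
    G = 2
    T = 3
    BLANK = 4


@dataclass
class BPConfidenceInterval:
    lower: int        # assumed: first signal sample attributed to the base
    upper: int        # assumed: end of the sample range for the base
    pair: BasePair


@dataclass
class DataPoint:
    signal: List[float] = field(default_factory=list)
    basecalled: List[BasePair] = field(default_factory=list)
    labels: List[BPConfidenceInterval] = field(default_factory=list)


def target_sequence(point):
    """Join the labelled bases, skipping blanks, into a CTC target string."""
    return "".join(label.pair.name for label in point.labels
                   if label.pair != BasePair.BLANK)


example = DataPoint(
    signal=[0.1] * 12,
    labels=[
        BPConfidenceInterval(0, 4, BasePair.A),
        BPConfidenceInterval(4, 8, BasePair.C),
        BPConfidenceInterval(8, 12, BasePair.G),
    ],
)
print(target_sequence(example))
```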

Architecture

Read Results

Speed

Consensus Results
| basecaller | identity rate (%) |
|---|---|
| minion_b0 | 99.9671 |
| minion_b50 | 99.9604 |
| chiron_v0.3 | 99.9957 |
| albacore_v2.2.7 | 99.9904 |
| guppy_v0.5.1 | 99.9907 |
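The slides do not spell out how identity rate is computed. A rough sketch under one common convention, matches divided by the total number of alignment columns, is shown below using the four operation names from the Cigar enum. Both the formula and the function name are assumptions, not the evaluation script behind the table.

```python
from collections import Counter

ALIGNMENT_OPS = ("MATCH", "MISMATCH", "INSERTION", "DELETION")


def identity_rate(ops):
    """Percentage of alignment columns that are exact matches (assumed definition)."""
    counts = Counter(ops)
    columns = sum(counts[op] for op in ALIGNMENT_OPS)
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * counts["MATCH"] / columns


# Hypothetical alignment: 98 matches, one mismatch, one deletion.
example_ops = ["MATCH"] * 98 + ["MISMATCH", "DELETION"]
print(identity_rate(example_ops))
```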
Questions?