MinION

Raw signal

Base pairs

Solution?

Rule engine
HMM
Deep learning

Existing solutions

Metrichor (defunct)
Albacore
Guppy
Chiron
...

MinCall

End-to-end
GPU accelerated
Deep learning model
Uses well-known CNNs with CTC loss and beam search
Added autoencoder loss to speed up training

CTC loss

P(\pi | X) = \prod_{t=1}^{m} s_t(\pi_t)
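
The product above can be checked numerically. The sketch below assumes that s_t(k) is the network's softmax output for symbol k at time step t, and it borrows the symbol order A, C, G, T, BLANK from the dataset schema; the score matrix and the '-' blank character are made up for illustration.

```python
import math

# Symbol order assumed to follow the BasePair enum: A=0, C=1, G=2, T=3, BLANK=4.
ALPHABET = "ACGT-"  # '-' stands for BLANK (illustrative choice)


def path_probability(scores, path):
    """Return P(pi | X) as the product of s_t(pi_t) over all time steps.

    scores[t][k] is the probability of symbol k at time step t.
    path is a sequence of symbol indices, one per time step.
    """
    if len(scores) != len(path):
        raise ValueError("path must have exactly one symbol per time step")
    return math.prod(scores[t][k] for t, k in enumerate(path))


# Hypothetical per-step distributions for m = 3 time steps.
scores = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.5, 0.1, 0.1, 0.1, 0.2],
    [0.1, 0.7, 0.1, 0.05, 0.05],
]
path = [ALPHABET.index(ch) for ch in "AAC"]
p = path_probability(scores, path)  # 0.6 * 0.5 * 0.7
```

In practice the product is usually accumulated in log space to avoid underflow on long reads.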


Decoding

P(Y | X) = \sum_{\pi \in \mathrm{decode}^{-1}(Y)} P(\pi | X)
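
A brute-force sketch of this sum, feasible only for tiny inputs. It assumes that decode is the usual CTC collapse (merge repeated symbols, then drop blanks), which the slides do not spell out; the scores are made up, and BLANK = 4 follows the dataset schema.

```python
import math
from itertools import product

BLANK = 4  # index of the blank symbol, as in the BasePair enum


def decode(path):
    """Assumed CTC collapse: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return tuple(out)


def label_probability(scores, label):
    """Sum P(pi | X) over every path pi with decode(pi) == label."""
    n_steps = len(scores)
    n_symbols = len(scores[0])
    target = tuple(label)
    total = 0.0
    for path in product(range(n_symbols), repeat=n_steps):
        if decode(path) == target:
            total += math.prod(scores[t][k] for t, k in enumerate(path))
    return total


# Hypothetical distributions for two time steps.
scores = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.5, 0.1, 0.1, 0.1, 0.2],
]
# Paths that decode to "A": AA, A-, -A.
p_a = label_probability(scores, [0])
```

Enumeration grows exponentially with the number of time steps; real CTC implementations compute the same quantity with a forward-backward dynamic program.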


Greedy search
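
A minimal sketch of greedy (best-path) decoding, assuming the standard CTC collapse rule: take the most probable symbol at each time step, merge repeats, and drop blanks. The scores are illustrative, not model output.

```python
BLANK = 4       # index of the blank symbol, as in the BasePair enum
BASES = "ACGT"  # indices 0..3


def greedy_decode(scores):
    """Pick the arg-max symbol per time step, then collapse the path."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in scores]
    out = []
    prev = None
    for k in best_path:
        # Emit a base only when it differs from the previous step's symbol.
        if k != prev and k != BLANK:
            out.append(BASES[k])
        prev = k
    return "".join(out)


# Hypothetical per-step distributions; the arg-max path is A, A, BLANK, A, C.
scores = [
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
    [0.8, 0.05, 0.05, 0.05, 0.05],
    [0.1, 0.7, 0.1, 0.05, 0.05],
]
called = greedy_decode(scores)
```

Greedy search returns the collapse of the single most probable path, which is not always the most probable sequence, because one sequence can be spread over many paths.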

Beam search
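
A sketch of CTC prefix beam search without a language model. This is the textbook variant, not necessarily the one MinCall uses. For each prefix it tracks the probability of paths ending in blank and in non-blank separately, so that repeated bases are handled correctly. beam_width and the scores are illustrative.

```python
from collections import defaultdict

BLANK = 4       # index of the blank symbol, as in the BasePair enum
BASES = "ACGT"  # indices 0..3


def beam_search_decode(scores, beam_width=8):
    """Return (sequence, probability) for the best prefix found."""
    # prefix -> (prob. of paths ending in blank, prob. ending in non-blank)
    beams = {(): (1.0, 0.0)}
    for row in scores:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            p_total = p_b + p_nb
            # Emitting blank keeps the prefix unchanged.
            nxt[prefix][0] += p_total * row[BLANK]
            for k in range(len(BASES)):
                p = row[k]
                if prefix and prefix[-1] == k:
                    # A repeat without a blank in between collapses.
                    nxt[prefix][1] += p_nb * p
                    # A repeat after a blank starts a new base.
                    nxt[prefix + (k,)][1] += p_b * p
                else:
                    nxt[prefix + (k,)][1] += p_total * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)[:beam_width]
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked}
    best, (pb, pnb) = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])
    return "".join(BASES[k] for k in best), pb + pnb


# Two time steps where blank is the most likely symbol at each step,
# yet the sequence "A" is more probable overall (AA + A- + -A).
scores = [
    [0.4, 0.0, 0.0, 0.0, 0.6],
    [0.4, 0.0, 0.0, 0.0, 0.6],
]
seq, prob = beam_search_decode(scores)
```

On this input greedy search would return the empty sequence (path probability 0.36), while the beam search sums the three paths that decode to "A".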

Training details

Dataset
Architecture
Results

Dataset

Training dataset:

Jared Simpson's R9.4 E. coli

Test dataset:

Ryan Wick's R9.4 Klebsiella pneumoniae

Preparation

Basecalled with Metrichor (provides positional data)
Aligned with GraphMap
Corrected
Transformed to protobuf

Preparation

syntax = "proto3";

package dataset;

enum BasePair {
    A = 0;
    C = 1;
    G = 2;
    T = 3;
    BLANK = 4;
}

enum Cigar {
    MATCH = 0;
    MISMATCH = 1;
    INSERTION = 2; // Insertion, soft clip, hard clip
    DELETION = 3;  // Deletion, N, P
}

message DataPoint {
    message BPConfidenceInterval {
        uint64 lower = 1;
        uint64 upper = 2;
        BasePair pair = 3;
    }
    repeated float signal = 1;
    repeated BasePair basecalled = 2; // Bases as originally basecalled
    repeated BPConfidenceInterval labels = 3; // Corrected basecalls, used as training labels
}
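
The comments on the Cigar enum imply that the SAM CIGAR operations are folded into four classes. A hedged Python sketch of that folding follows. The function name and the handling of the ambiguous SAM 'M' operation (resolved by comparing read and reference bases) are assumptions, not taken from the MinCall code.

```python
from enum import IntEnum


class Cigar(IntEnum):
    """Python mirror of the Cigar enum in the schema."""
    MATCH = 0
    MISMATCH = 1
    INSERTION = 2
    DELETION = 3


# SAM operations whose class does not depend on the bases involved.
_FIXED_OPS = {
    "=": Cigar.MATCH,
    "X": Cigar.MISMATCH,
    "I": Cigar.INSERTION,  # insertion
    "S": Cigar.INSERTION,  # soft clip
    "H": Cigar.INSERTION,  # hard clip
    "D": Cigar.DELETION,   # deletion
    "N": Cigar.DELETION,   # skipped reference region
    "P": Cigar.DELETION,   # padding
}


def collapse_cigar_op(op, read_base=None, ref_base=None):
    """Map one SAM CIGAR operation to the four-class Cigar enum.

    'M' covers both matches and mismatches in SAM, so it needs the bases.
    """
    if op == "M":
        if read_base is None or ref_base is None:
            raise ValueError("'M' requires read_base and ref_base")
        return Cigar.MATCH if read_base == ref_base else Cigar.MISMATCH
    try:
        return _FIXED_OPS[op]
    except KeyError:
        raise ValueError(f"unknown CIGAR operation: {op!r}") from None


soft_clip = collapse_cigar_op("S")
aligned = collapse_cigar_op("M", read_base="A", ref_base="G")
```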