Mycobacterium tuberculosis transmission cluster inference with Nanopore sequencing

Dataset

  • Mtb WGS from culture
  • Madagascar (118), South Africa (83), Birmingham (46)
  • Matched Illumina and Nanopore (and 35 PacBio CCS)
  • 150 after QC (7 PacBio)

Calibration of Nanopore variant filters

Aim: achieve Nanopore SNP precision on par with Illumina

Illumina SNPs called with COMPASS

Nanopore SNPs called with bcftools

Evaluate precision and recall with varifier for 7 PacBio samples

Different bcftools filters applied

Precision = fraction of calls that are correct

Recall = fraction of expected calls made correctly
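As a minimal sketch of the two definitions above, precision and recall can be computed from counts of true positives (TP), false positives (FP), and false negatives (FN). The function name and the example counts are hypothetical and are not results from this dataset.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from variant-call counts.

    precision = TP / (TP + FP): fraction of calls that are correct
    recall    = TP / (TP + FN): fraction of expected calls made correctly
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only
precision, recall = precision_recall(tp=90, fp=0, fn=10)
```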

False-positive (FP) counts: 0, 0, 0, 1(0), 1(0), 1(0), 4(3)

Pairwise SNP distance

  • Generate consensus sequences
  • Ignore filtered, masked, missing, and null calls
  • Pairwise distance matrix (one for each modality)
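The steps above can be sketched roughly as follows. This assumes the consensus sequences are equal-length strings on the same reference coordinates and that filtered, masked, missing, and null positions are all encoded as `N` or `-`; that encoding and the function names are assumptions, not details from the slides.

```python
from __future__ import annotations

from itertools import combinations

# Assumption: every ignored call type is written as one of these characters
IGNORED = frozenset("N-")


def snp_distance(a: str, b: str) -> int:
    """Count positions where two consensus sequences differ,
    skipping any position that is ignored in either sequence."""
    if len(a) != len(b):
        raise ValueError("consensus sequences must have the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in IGNORED and y not in IGNORED
    )


def distance_matrix(consensus: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distance for every unordered pair of samples."""
    return {
        (s, t): snp_distance(consensus[s], consensus[t])
        for s, t in combinations(sorted(consensus), 2)
    }


# Toy sequences, for illustration only
toy = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "ANGTTG"}
matrix = distance_matrix(toy)
```

One such matrix would be built per modality, for example one from the Illumina consensus sequences and one from the Nanopore consensus sequences.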

Aim: is Nanopore SNP distance linearly related to Illumina SNP distance?

SNP threshold clustering

Aim: produce Nanopore clusters consistent with Illumina

  • Thresholds: 0, 2, 5, and 12 SNPs (11 for Nanopore in place of 12)
  • Connect two samples if SNP distance ≤ threshold
  • Assess clustering based on 3 metrics
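One way to realise "connect two samples if SNP distance ≤ threshold" is to treat samples as graph nodes and take connected components as clusters. The sketch below uses a small union-find. The function name, the input layout (a dict keyed by sample pairs), and the toy distances are assumptions for illustration.

```python
from __future__ import annotations


def cluster(
    samples: list[str],
    distances: dict[tuple[str, str], int],
    threshold: int,
) -> dict[str, frozenset[str]]:
    """Map each sample to its cluster (connected component) at a threshold."""
    parent = {s: s for s in samples}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join the components of every pair within the threshold
    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for s in samples:
        groups.setdefault(find(s), set()).add(s)
    return {s: frozenset(groups[find(s)]) for s in samples}


# Toy distances, for illustration only
samples = ["a", "b", "c", "d"]
distances = {
    ("a", "b"): 1, ("a", "c"): 6, ("a", "d"): 20,
    ("b", "c"): 4, ("b", "d"): 25, ("c", "d"): 30,
}
at_5 = cluster(samples, distances, threshold=5)
at_2 = cluster(samples, distances, threshold=2)
```

In the toy data, `a` and `c` are 6 SNPs apart but still share a cluster at threshold 5 because both are within the threshold of `b`.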

C_I = a sample's Illumina cluster
C_N = a sample's Nanopore cluster

Recall (SACR): \frac{|C_I \cap C_N|}{|C_I|}

Precision (SACP): \frac{|C_I \cap C_N|}{|C_N|}
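A minimal sketch of these two metrics, assuming each clustering is given as a mapping from sample to the set of samples in its cluster. Two further assumptions that the slides do not state: the per-sample values are averaged over samples, and samples that are singletons in the Illumina clustering are skipped.

```python
from __future__ import annotations


def sacr_sacp(
    illumina: dict[str, frozenset[str]],
    nanopore: dict[str, frozenset[str]],
) -> tuple[float, float]:
    """Average per-sample cluster recall and precision.

    For each sample: recall = |C_I ∩ C_N| / |C_I|, precision = |C_I ∩ C_N| / |C_N|.
    """
    recalls, precisions = [], []
    for sample, c_i in illumina.items():
        if len(c_i) < 2:
            # Assumption: Illumina singletons do not contribute to the averages
            continue
        c_n = nanopore[sample]
        shared = len(c_i & c_n)
        recalls.append(shared / len(c_i))
        precisions.append(shared / len(c_n))
    if not recalls:
        return float("nan"), float("nan")
    return sum(recalls) / len(recalls), sum(precisions) / len(precisions)


# Toy clusterings, for illustration only: Nanopore pulls "c" into the a/b cluster
ab = frozenset({"a", "b"})
abc = frozenset({"a", "b", "c"})
illumina = {"a": ab, "b": ab, "c": frozenset({"c"}), "d": frozenset({"d"})}
nanopore = {"a": abc, "b": abc, "c": abc, "d": frozenset({"d"})}
sacr, sacp = sacr_sacp(illumina, nanopore)
```

In the toy example recall stays at 1.0 because no Illumina cluster member is lost, while precision drops to 2/3 because the Nanopore cluster contains one extra sample.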

Excess Clustering Rate (XCR): \frac{|S_I - S_N|}{|S_I|}

S_I = set of Illumina singletons
S_N = set of Nanopore singletons
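The XCR definition translates directly into set operations. The sketch below again assumes each clustering maps a sample to the set of samples in its cluster; the function names and toy data are hypothetical.

```python
from __future__ import annotations


def singletons(clusters: dict[str, frozenset[str]]) -> set[str]:
    """Samples whose cluster contains only themselves."""
    return {s for s, c in clusters.items() if len(c) == 1}


def xcr(
    illumina: dict[str, frozenset[str]],
    nanopore: dict[str, frozenset[str]],
) -> float:
    """Fraction of Illumina singletons that Nanopore places in a cluster."""
    s_i = singletons(illumina)
    s_n = singletons(nanopore)
    return len(s_i - s_n) / len(s_i) if s_i else 0.0


# Toy clusterings, for illustration only: "c" is an Illumina singleton
# that Nanopore joins to the a/b cluster
ab = frozenset({"a", "b"})
abc = frozenset({"a", "b", "c"})
illumina = {"a": ab, "b": ab, "c": frozenset({"c"}), "d": frozenset({"d"})}
nanopore = {"a": abc, "b": abc, "c": abc, "d": frozenset({"d"})}
rate = xcr(illumina, nanopore)
```

Here one of the two Illumina singletons is clustered by Nanopore, so the rate is 0.5.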

Results by SNP threshold:

  • 0: SACR 1.0, SACP 1.0, XCR 0.015 (2/137)
  • 2: SACR 1.0, SACP 0.966, XCR 0.008 (1/128)
  • 5: SACR 1.0, SACP 0.949, XCR 0.057 (7/122)
  • 12 (11 for Nanopore): SACR 1.0, SACP 0.845, XCR 0.031 (3/97)

Nanopore doesn't miss any clustered samples

Mixing technologies

Aim: can Illumina and Nanopore data be clustered together?

  • Self-distance (distance between the same sample's Illumina and Nanopore data)
  • Pairwise SNP distance

Simulate various mixture ratios

Nanopore-to-Illumina ratios: 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9

Mixed-technology SNP thresholds are the same as the Illumina thresholds

For each ratio/threshold combo, do the following 1000 times:

  1. Randomly assign samples to a tech. in given ratio
  2. Calculate SACR, SACP, and XCR for given threshold
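The loop above could be sketched as follows. This is only an outline under several assumptions: three precomputed distance lookups keyed by the same sample pairs (Illumina-only, Nanopore-only, and a single cross-technology distance per pair), and a caller-supplied `evaluate` function that clusters the mixed distances at the chosen threshold and returns one metric. All names here are hypothetical.

```python
from __future__ import annotations

import random
from statistics import median
from typing import Callable

Distances = dict[tuple[str, str], int]


def assign_technologies(
    samples: list[str], nanopore_ratio: float, rng: random.Random
) -> dict[str, str]:
    """Randomly label round(ratio * n) samples as Nanopore, the rest as Illumina."""
    n_ont = round(nanopore_ratio * len(samples))
    ont = set(rng.sample(sorted(samples), n_ont))
    return {s: ("nanopore" if s in ont else "illumina") for s in samples}


def mixed_distances(
    tech: dict[str, str], illumina: Distances, nanopore: Distances, mixed: Distances
) -> Distances:
    """Take each pair's distance from the lookup matching the pair's technologies."""
    out: Distances = {}
    for (a, b), d_illumina in illumina.items():
        if tech[a] == tech[b] == "illumina":
            out[(a, b)] = d_illumina
        elif tech[a] == tech[b] == "nanopore":
            out[(a, b)] = nanopore[(a, b)]
        else:
            # Assumption: one cross-technology distance is stored per pair
            out[(a, b)] = mixed[(a, b)]
    return out


def simulate(
    samples: list[str],
    illumina: Distances,
    nanopore: Distances,
    mixed: Distances,
    nanopore_ratio: float,
    evaluate: Callable[[Distances], float],
    n_iter: int = 1000,
    seed: int = 0,
) -> float:
    """Median of a metric over repeated random technology assignments."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        tech = assign_technologies(samples, nanopore_ratio, rng)
        scores.append(evaluate(mixed_distances(tech, illumina, nanopore, mixed)))
    return median(scores)
```

The function would be called once per ratio/threshold combination and once per metric, with `evaluate` computing SACR, SACP, or XCR against the Illumina-only clustering.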

 

  • SACR (median) 1.0 for all simulations
  • Increasing Nanopore ratio moves median towards Nanopore-only SACP and XCR
  • In the "worst" case, mixed tech. produces the same results as Nanopore-only

Summary

  • Nanopore SNP precision on par with Illumina
  • Nanopore SNP recall ~8% lower than Illumina
  • No clustered samples are missed by Nanopore
  • Mixing technologies is possible

Mtb transmission cluster inference with Nanopore sequencing

By Michael Hall

Rolling results for our work checking concordance between Nanopore and Illumina for public health applications