Genomic Letter Soup

Benchmarking of de novo (meta)genomic assembly software

Computational Biology and Bioinformatics Day

October 21, 2020

@ines_cim

cimendes

Inês Mendes

M Ramirez Lab

Metagenomics

Random "shotgun" sequencing of microbial DNA, without selecting a particular gene.

A promising methodology for rapidly identifying pathogens, together with their virulence and antimicrobial resistance properties, without the need for culture.

 | The motivation

Assembly methods produce longer sequences that are more informative than short sequencing reads and can give a more complete picture of the microbial community in a given sample.

 | (Meta)Genomic assembly

Benchmark

 | Ensuring reproducibility

Benchmark

 | Assembly workflow

Reference Dataset (Complete Bacterial Genomes)

In silico mock sample (even)

In silico mock sample (log)

ZymoBIOMICS standard (even)

ZymoBIOMICS standard (log)

3.7 M read pairs

8.8 M read pairs

47.8 M read pairs

Assembly Workflow

Assembly Quality Assessment

Benchmark

 | Assembly evaluation

Reference Dataset (Triple)

Assembly file (fasta)

Filter min contig size (1000 bp)

Mapping with Minimap2

Read Data

PAF file (tab)
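The filtering and PAF steps above can be sketched in Python. This is a minimal illustration, not the benchmark's own code: the function names (`read_fasta`, `filter_contigs`, `parse_paf_line`) are hypothetical, the filter is assumed to keep contigs of at least 1000 bp (inclusive), and only the 12 mandatory PAF columns are parsed.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

MIN_CONTIG_SIZE = 1000  # bp, threshold named on the slide (assumed inclusive)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(lines: Iterable[str],
                   min_size: int = MIN_CONTIG_SIZE) -> Dict[str, str]:
    """Keep only contigs whose length is at least min_size."""
    return {n: s for n, s in read_fasta(lines) if len(s) >= min_size}


class PafRecord(NamedTuple):
    """The 12 mandatory, tab-separated PAF columns."""
    query: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target: str
    target_len: int
    target_start: int
    target_end: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse one PAF line, ignoring any optional tags after column 12."""
    f = line.rstrip("\n").split("\t")
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                     int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                     int(f[10]), int(f[11]))
```

In this sketch the filtered contigs would be written back to a FASTA file before mapping, and each line of the resulting PAF file would be passed to `parse_paf_line`.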

Benchmark

 | Assembly evaluation

C90 & C95

Number of contigs to cover at least 90% and 95% of the reference genome, respectively. 
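One way to compute C90 and C95 is sketched below. It assumes each contig has already been mapped to the reference and is described by a list of half-open `(start, end)` intervals on that reference. Ranking contigs by the number of reference bases they cover is an assumption of this sketch, and `contigs_to_cover` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on the reference


def covered_length(intervals: List[Interval]) -> int:
    """Number of reference bases covered by the union of the intervals."""
    total, cur_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, cur_end)  # skip bases that were already counted
        if end > start:
            total += end - start
            cur_end = end
    return total


def contigs_to_cover(contig_intervals: Dict[str, List[Interval]],
                     ref_len: int, fraction: float) -> Optional[int]:
    """Smallest number of contigs, taken from largest to smallest,
    whose combined alignments cover at least `fraction` of the reference.
    Returns None if the threshold is never reached."""
    ranked = sorted(contig_intervals.values(), key=covered_length, reverse=True)
    used: List[Interval] = []
    for n, intervals in enumerate(ranked, start=1):
        used.extend(intervals)
        if covered_length(used) >= fraction * ref_len:
            return n
    return None


# Toy example with a 1000 bp reference (made-up coordinates).
alignments = {
    "contig_a": [(0, 600)],
    "contig_b": [(500, 910)],
    "contig_c": [(900, 960)],
    "contig_d": [(0, 50)],
}
c90 = contigs_to_cover(alignments, ref_len=1000, fraction=0.90)
c95 = contigs_to_cover(alignments, ref_len=1000, fraction=0.95)
```

Overlapping alignments are merged before counting, so a base of the reference is never counted twice.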

Contig Phred Quality Score

E = 1 - Identity
Phred(E) = \begin{cases} -10 \log_{10}(E) & \quad \text{if } E > 0\\ 60 & \quad \text{if } E = 0 \end{cases}
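A minimal sketch of the contig Phred score defined above, assuming a base-10 logarithm, the logarithmic branch for any error E greater than zero, and identity given as a fraction between 0 and 1. The function name `contig_phred` is hypothetical.

```python
import math


def contig_phred(identity: float, perfect_score: float = 60.0) -> float:
    """Phred-scaled quality of a contig from its identity to the reference.

    E = 1 - identity; the score is -10 * log10(E) when E > 0,
    and `perfect_score` (60 on the slide) when E == 0.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")
    error = 1.0 - identity
    if error == 0.0:
        return perfect_score
    return -10.0 * math.log10(error)


# A contig with 99% identity has E = 0.01, which gives a score close to 20.
example_score = contig_phred(0.99)
```

Each tenfold drop in error adds 10 to the score, so 90% identity gives about 10, 99% about 20 and 99.9% about 30.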

Contiguity

Longest percentage of the reference sequence assembled in a single contig.
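Contiguity can be sketched in the same spirit. The example assumes each contig is described by half-open `(start, end)` alignment intervals on the reference and that overlapping alignments from the same contig are merged before measuring. The name `contiguity` and the toy coordinates are illustrative only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on the reference


def covered_length(intervals: List[Interval]) -> int:
    """Number of reference bases covered by the union of the intervals."""
    total, cur_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, cur_end)  # skip bases that were already counted
        if end > start:
            total += end - start
            cur_end = end
    return total


def contiguity(contig_intervals: Dict[str, List[Interval]], ref_len: int) -> float:
    """Largest percentage of the reference covered by any single contig."""
    if ref_len <= 0:
        raise ValueError("ref_len must be positive")
    best = max((covered_length(iv) for iv in contig_intervals.values()), default=0)
    return 100.0 * best / ref_len


# Toy example with a 1000 bp reference (made-up coordinates).
alignments = {
    "contig_a": [(0, 600)],
    "contig_b": [(500, 900), (850, 950)],
}
best_pct = contiguity(alignments, ref_len=1000)
```

Here `contig_a` covers 600 bases and `contig_b` covers 450 after merging its overlapping alignments, so the contiguity is set by `contig_a`.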

Benchmark

 | Mock sample (even)

Contig Phred Quality Score for GATBMiniaPipeline's Pseudomonas aeruginosa assembly

Contig Size

Phred Score

Phred(E) = \begin{cases} -10 \log_{10}(E) & \quad \text{if } E > 0\\ 60 & \quad \text{if } E = 0 \end{cases}

Benchmark

 | Mock sample (even)

Contig Phred Quality Score per Reference for each Assembler

Special thanks to Pedro Vila-Cerqueira, Rafael Maria Mamede and Mário Ramirez.

Thank you for your attention

FCT PhD Grant SFRH/BD/129483/2017

CBBD'20 - Benchmark

By Inês Mendes


Slide deck for CBBD's 3-minute presentation
