Bacterial genome assembly

Fall 2014 BBB talk

Raj Ayyampalayam

Institute of Bioinformatics

Quantitative Biology Consulting Group

Welcome to BBB

  • http://qbcg.uga.edu/bbb-tutorials/
    • BBB info and archives 
  • Need help with running jobs on zcluster?
    • Contact us (qbcg@uga.edu).
    • GACRC Forums: https://forums.gacrc.uga.edu/
    • How to run jobs on zcluster?
      • https://wiki.gacrc.uga.edu/wiki/Running_Jobs_on_zcluster
    • UGA Galaxy: http://qbcg.uga.edu/galaxy-uga/
  • Subscribe to qbcg-announce mailing list
    • URL: http://goo.gl/k8WG13 (enter your name and e-mail address)

Bacterial genomes

  • Less complicated than larger genomes
  • Have the same issues
  • Garbage in, garbage out
    • Better DNA, better sequences
    • PCR-free library prep
    • Multiple DNA extractions (different techniques)
  • Size ranges from 2 Mb to 5 Mb

Sequencing technologies

  • Illumina
    • MiSeq 
      • PE 150 - 4.5 Gb to 5 Gb
        • ~ 20 Genomes in a run
      • PE 300 - 13 Gb to 15 Gb
        • ~ 40 - 50 Genomes in a run
    • NextSeq
      • PE 150
        • Mid output kit
          • 30 Gb per run
            • 50 - 60 Genomes
        • High output kit
          • 100 Gb per run
            • 150 Genomes
        • Spike-ins?
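A genomes-per-run estimate comes from dividing the run yield by the bases needed per genome (genome size times target coverage). The following is a minimal sketch, assuming a 5 megabase genome and a 50x coverage target; the function name and the example inputs are illustrative assumptions, not figures from a vendor specification.

```python
def genomes_per_run(run_yield_bases, genome_size_bases, target_coverage):
    """Upper bound on the number of genomes that fit in one run.

    All arguments are plain integers: bases per run, bases per genome,
    and the desired fold coverage per genome.
    """
    bases_per_genome = genome_size_bases * target_coverage
    return run_yield_bases // bases_per_genome


# Assumed inputs: a 5 gigabase run, a 5 megabase genome, 50x coverage.
print(genomes_per_run(5_000_000_000, 5_000_000, 50))
```

Treat the result as an upper bound: uneven pooling, spike-ins, and low-quality reads all reduce the usable yield per genome.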

Sequencing technologies (cont.)

  • Long reads
    • PacBio
      • ~ 300 Mb to 500 Mb of data per SMRTCell
      • Reads up to 30 kb in length
  • Sequencing strategy
    • All Illumina
      • Single library
      • Multiple library
    • Illumina and PacBio
      • Single Illumina fragment library (50x)
      • PacBio library (20x)
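The coverage targets in the hybrid strategy translate into concrete sequencing amounts. The sketch below is a rough planning aid, assuming a 5 megabase genome, 150 bp paired-end reads, and the low end of the per-SMRTCell yield quoted above; the function names are hypothetical.

```python
import math


def read_pairs_needed(genome_size, coverage, read_length):
    """Paired-end read pairs needed; each pair contributes 2 * read_length bases."""
    return math.ceil(genome_size * coverage / (2 * read_length))


def smrt_cells_needed(genome_size, coverage, yield_per_cell):
    """SMRTCells needed to reach the requested PacBio coverage."""
    return math.ceil(genome_size * coverage / yield_per_cell)


genome = 5_000_000  # assumed genome size in bases

# 50x Illumina fragment library with PE 150 reads.
print(read_pairs_needed(genome, 50, 150))

# 20x PacBio library at an assumed 300 megabases per SMRTCell.
print(smrt_cells_needed(genome, 20, 300_000_000))
```

Raw yield is not the same as usable yield, so in practice it may be worth planning with some margin above these numbers.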

Assembly programs

  • A5 pipeline (http://sourceforge.net/projects/ngopt/)
    • Short-read-only assembler
  • SPAdes (http://bioinf.spbau.ru/spades)
    • Can use PacBio sub-reads for scaffolding and repeat resolution
  • MaSuRCA (http://www.genome.umd.edu/masurca.html)
    • Hybrid assembler
    • Can use long reads for assembly
  • PBcR (http://wgs-assembler.sourceforge.net/wiki/index.php/PacBioToCA)
    • PacBio-specific de novo assembler
    • With enough coverage can self correct reads
    • Can use Illumina reads to correct PacBio reads and then assemble the corrected reads
  • PBJelly (http://sourceforge.net/p/pb-jelly/wiki/Home/)
    • Scaffolds draft assembly using long reads
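As an illustration of a hybrid run, the sketch below builds a SPAdes command line that combines a paired-end Illumina library with PacBio sub-reads. The flags (`-1`, `-2`, `-o`, `--careful`, `--pacbio`) are taken from the SPAdes 3.x command-line interface and should be checked against the installed version; the file names and the helper function are hypothetical.

```python
import shlex


def spades_command(r1, r2, out_dir, pacbio=None, careful=True):
    """Build (but do not run) a SPAdes command as an argument list."""
    cmd = ["spades.py", "-1", r1, "-2", r2, "-o", out_dir]
    if careful:
        # Asks SPAdes to try to reduce mismatches and short indels.
        cmd.append("--careful")
    if pacbio:
        # PacBio sub-reads are used for gap closing and repeat resolution.
        cmd += ["--pacbio", pacbio]
    return cmd


# Hypothetical input files.
cmd = spades_command(
    "reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out",
    pacbio="subreads.fastq",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# To execute on a machine with SPAdes installed:
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log the exact command or submit it to a cluster queue instead.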

Assembly metrics

  • QUAST: Quality Assessment Tool for Genome Assemblies
    • http://bioinf.spbau.ru/quast
    • Allows for comparison of multiple assemblies at the same time
    • Graphical outputs 
    • Generates “predicted genes” metrics and coordinates 
  • PHAST: PHAge Search Tool
    • http://phast.wishartlab.com/
  • Summarize Assembly
    • Part of PBJelly suite
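One of the contiguity statistics that tools such as QUAST report is N50: the contig length at which contigs of that length or longer cover at least half of the assembly. A minimal sketch of the calculation, using made-up contig lengths:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths, in bases.
print(n50([100, 200, 300, 400, 500]))
```

N50 alone says nothing about correctness, so it is best read alongside the misassembly and gene-prediction metrics when a reference or annotation is available.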

Genome Assemblers Comparison

  • GAGE-B http://bioinformatics.oxfordjournals.org/content/29/14/1718.full
    • MaSuRCA best?
  • SPAdes analysis using the new version
    • http://bioinf.spbau.ru/en/content/spades-30-gage-b-data-sets
  • My experience
    • Depends on the nature of the genome
    • Try different assemblies
    • Tune and tweak parameters
    • Make your own meta assembly
      • http://garm-meta-assem.sourceforge.net/

Questions? Comments?

By Saravanaraj Ayyampalayam
