Metagenome Program
Adam R. Rivers
JGI Scientific Advisory Meeting
January 19, 2016
Outline
- Program overview
- Program publications
- Program science
  - Viral discovery and machine learning
  - Historical reconstruction with metagenomics
- Improvements to user science
  - Automating Stable Isotope Probing (SIP) metagenomics
  - Community profiling with iTags
  - Metagenome assembly and binning
  - Global metagenome comparisons
Interconnected
Function driven
Genome centric
Metagenome program overview
- Data products
  - Community iTags: 10,000
  - Metagenomes: 825
  - Metatranscriptomes: 900
User driven science
Program driven science
- Metagenome program science
  - Viral discovery and function
  - Historical reconstruction
  - Machine learning for metagenomics
- Microbial systems group science
  - Plant-microbe interactions
  - Wetland biogeochemistry
Metagenome program in context
Carbon Cycling
Biofuels
Biogeochemistry
Integrated, system based approaches to understanding:
- Development of cross-platform analyses
- Integration of data across projects
- Research into carbon cycling and plant-microbe interaction
Metagenome program projects
Metagenome program publications
Discovery of the candidate phyla radiation
- A large group of uncultivated phyla are found in groundwater
- Small cells, minimal genomes
- The phyla have highly divergent 16S genes with introns
Metagenome program publications
Salicylic acid modulates root colonization
The discovery of a mechanism for allowing colonization by endophytic bacteria
Metagenome program publications
Methane production in restored wetlands
Does restoring wetlands help or hurt climate change?
Methanogen abundance and methane emissions from new wetlands are dependent on electron acceptors, salinity and age
Metagenome program science
Viruses as ecosystem drivers
Viral discovery in metagenomes and metatranscriptomes by machine learning
The first soil virus metagenome
The first "complete" virus metagenome (single and double stranded DNA and RNA viruses)
Diel infection of RNA viruses in lakes
Finding highly divergent RNA viruses
Higher information content
Less bias
- Viral classifier with 95% recall
- Increasing precision by incorporating homology information to identify known organisms
- Screening of all metatranscriptomic data from IMG and Tara Oceans
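The relationship between recall and precision in the bullets above can be illustrated with a small calculation. This is a minimal sketch, not the program's classifier: the counts are hypothetical, chosen only so that recall comes out at 95%, and the homology filter is assumed to remove false positives without discarding any true viral sequences.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts (not from the source), chosen so recall is 95%.
tp, fn, fp = 95, 5, 40
before = precision_recall(tp, fp, fn)

# Assumption: a homology search identifies 30 of the 40 false positives
# as known non-viral organisms and removes no true viral sequences.
fp_after_filter = fp - 30
after = precision_recall(tp, fp_after_filter, fn)

print(f"before filter: precision={before[0]:.2f}, recall={before[1]:.2f}")
print(f"after filter:  precision={after[0]:.2f}, recall={after[1]:.2f}")
```

Under these assumptions the filter raises precision while recall stays the same, because only the false-positive count changes.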
Machine learning applications
- Supervised
- Unsupervised
Classification in metagenomics
GeneLearn
A modular application for machine learning from sequence data
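Machine learning from sequence data needs sequences turned into numeric features first. The sketch below shows one common featurization, normalized k-mer frequencies. It is an illustration only and makes no claim about how GeneLearn represents sequences; the function name `kmer_features` is hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_features(seq, k=3):
    """Return k-mer frequencies for a DNA sequence in a fixed order
    (all 4**k k-mers over A, C, G, T, sorted lexicographically)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other characters (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

features = kmer_features("ACGTACGT", k=2)
print(len(features))
```

Because the vector has the same length and order for every sequence, it can be passed to either a supervised classifier or an unsupervised clustering method.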
Historical reconstruction
Metagenomics may be able to reconstruct past events, leading to a better understanding of climate, agricultural and human change
Historical reconstruction
Construction 1649
British siege 1803
Cholera 1853
Cholera sequence
Improvements to user science
Interconnected
Genome centric
Function driven
iTags overhaul
Genome binning
Improved assembly
SIP ETOP
Host DNA depletion
Expression analysis
MT insert size
Gaia assembler
Automating stable isotope probing
Stable isotope probing (SIP) is a method to identify the genes of microbes that use a specific compound
SIP has been too complicated and time-consuming to be widely adopted. This ETOP simplifies SIP to make it more widely available to JGI users.
Jennifer Pett-Ridge
unlabeled DNA
labeled DNA
SIP automation ETOP
Current SIP-’omics approach is low-throughput and requires special equipment
LLNL approach will use NanoSIMS for sample prescreening prior to SIP processing/sequencing
Will also improve density gradient separation with intercalators
And will automate:
- Fraction collection
- Density profile characterization
- Fraction cleanup (desalting)
- Nucleic acid quantification
- Reverse transcription and amplification
Total hands-on processing
| | Standard SIP | ETOP SIP |
|---|---|---|
| 1 sample | 13 | 1 |
| 24 sample experiment | 312 | 24 |
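The figures in the table are consistent with hands-on processing scaling linearly with sample count. A minimal sketch of that arithmetic follows; the linear-scaling assumption and the function name `hands_on` are illustrative, and the unit is left unspecified because the table does not state it.

```python
# Per-sample hands-on figures taken from the table (unit not stated there).
STANDARD_PER_SAMPLE = 13
ETOP_PER_SAMPLE = 1

def hands_on(per_sample, n_samples):
    """Total hands-on processing, assuming it scales linearly with samples."""
    return per_sample * n_samples

n_samples = 24
standard_total = hands_on(STANDARD_PER_SAMPLE, n_samples)
etop_total = hands_on(ETOP_PER_SAMPLE, n_samples)
fold_reduction = standard_total / etop_total

print(standard_total, etop_total, fold_reduction)
```

Under linear scaling the fold reduction is the same for any experiment size, since the sample count cancels out.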
iTags amplicon sequencing
25,000 iTags sequenced since 2013
iTags are a useful, cost-effective way of profiling communities, but they are not being fully used.
Phase 1: Sequence the V4 and V5 region
Phase 2: Integration of all iTag data and enhanced analysis
- Open-reference OTU picking across all JGI iTag data
- Improved metadata search and visualization tools
- More analyses e.g. Bayesian ecological networks
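Open-reference OTU picking, mentioned in the list above, can be sketched as two steps: reads are first matched against reference sequences, and reads that fail to match are then clustered de novo. The toy example below is a sketch under strong assumptions and is not the JGI pipeline: reads are assumed to be pre-aligned and of equal length, identity is a simple per-position match fraction, the 0.97 default threshold is a common convention rather than a value from the source, and all names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned reads."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_match(seq, centroids):
    """Return (otu_id, score) of the most similar centroid, or (None, 0.0)."""
    best_id, best_score = None, 0.0
    for otu_id, centroid in centroids.items():
        score = identity(seq, centroid)
        if score > best_score:
            best_id, best_score = otu_id, score
    return best_id, best_score

def open_reference_otus(reads, references, threshold=0.97):
    """Assign reads to reference OTUs where possible, else to de novo OTUs."""
    assignments, unmatched = {}, {}
    # Step 1: closed-reference matching against known sequences.
    for read_id, seq in reads.items():
        otu_id, score = best_match(seq, references)
        if score >= threshold:
            assignments[read_id] = otu_id
        else:
            unmatched[read_id] = seq
    # Step 2: greedy de novo clustering of the reads that did not match.
    denovo = {}
    for read_id, seq in unmatched.items():
        otu_id, score = best_match(seq, denovo)
        if score < threshold:
            otu_id = f"denovo{len(denovo) + 1}"
            denovo[otu_id] = seq
        assignments[read_id] = otu_id
    return assignments

references = {"ref1": "ACGTACGTAC"}
reads = {"r1": "ACGTACGTAC", "r2": "TTTTTTTTTT", "r3": "TTTTTTTTTT"}
otus = open_reference_otus(reads, references, threshold=0.9)
print(otus)
```

Keeping reference OTU identifiers stable is what makes results comparable across projects, while the de novo step retains reads from organisms absent from the reference set.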
Metagenome assembly and binning
- Complete overhaul of metagenome and metatranscriptome assembly reduced resource use and increased assembly quality
- Publication of MetaBAT for automatic genome binning
- Publication of Elviz for manual binning and exploration
All vs. all metagenomic assembly
The challenge:
- Metagenomic data are only compared against reference reads
- The cost of recomputing annotations when new references are added is high
The Gaia Assembler is a global, distributed, asynchronous assembly and alignment program designed to continuously align and annotate read data of arbitrarily large size.
Host genome depletion
The challenge:
Endophytic bacteria cannot be effectively sequenced because host DNA overwhelms the sample
Working with a commercial partner to develop host-specific depletion probes from genomic DNA
Charge Questions
Should the metagenome program prioritize functional methods like SIP and metatranscriptomics over genome reconstruction improvements?
Should we begin directing more resources towards developing methods to reconstruct metabolic networks?
Should more work be done to make metagenome data available in a machine-readable way, even if that means that fewer interactive tools are available?
SAC presentation Jan. 20, 2016
By Adam Rivers