Using Global Metagenomes to Quantify PCR Primer Coverage and Integrating 3-Domain Amplicon Data with Models

Jesse McNichol1, Paul M. Berube2, Steven J. Biller2,3, Sallie W. Chisholm2, Jed A. Fuhrman1

 

1University of Southern California

2Massachusetts Institute of Technology

3Wellesley College

 

  • Universal ribosomal PCR primers target the SSU rRNA of (almost) everything:
    • Quantify the whole cellular community from A-Z (Archaea + Bacteria + Organelles + Protists + Zooplankton)
    • 16S and 18S are on the same scale, so their ratios are meaningful
    • Low cost (~$8/sample, 200,000 sequences each)
    • Amplicon sequence variant (ASV) methods allow intercomparison and mapping

Universal primers as a mapping tool

https://www.egeotraces.org/sections/GP13_Fe_D_CONC.html

 

  • Detailed protocols available on https://fuhrman-lab.github.io
    • Lab protocol
    • in-silico workflow:
      • Splits 16S and 18S automatically (analyzed separately)
      • Denoises with DADA2 or Deblur in QIIME 2 to produce ASVs
      • Splits data automatically into categories (e.g. chloroplasts)
      • Step-by-step guide on protocols.io
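The category-splitting step in the workflow above can be pictured with a minimal Python sketch. This is not the lab's actual pipeline code: the function names, the matching rules, and the semicolon-separated taxonomy strings are all assumptions made for illustration, and the input data are made up.

```python
from collections import defaultdict


def categorize(taxonomy: str) -> str:
    """Assign an ASV to a coarse category from its taxonomy string.

    Hypothetical rules, checked in order: organelles are tested first
    because reference databases usually file them under Bacteria.
    """
    t = taxonomy.lower()
    if "chloroplast" in t:
        return "chloroplast"
    if "mitochondria" in t:
        return "mitochondria"
    if "eukaryota" in t:
        return "eukaryote"
    if "archaea" in t:
        return "archaea"
    if "bacteria" in t:
        return "bacteria"
    return "unassigned"


def split_by_category(taxonomy_by_asv, counts_by_asv):
    """Return {category: {asv_id: count}} for one sample."""
    tables = defaultdict(dict)
    for asv_id, taxonomy in taxonomy_by_asv.items():
        tables[categorize(taxonomy)][asv_id] = counts_by_asv[asv_id]
    return dict(tables)


# Toy, made-up input for illustration only.
taxonomy = {
    "asv1": "Bacteria; Cyanobacteria; Cyanobacteriia; Chloroplast",
    "asv2": "Bacteria; Proteobacteria; Alphaproteobacteria; SAR11 clade",
    "asv3": "Eukaryota; Alveolata; Dinoflagellata",
    "asv4": "Archaea; Thermoplasmatota; Marine Group II",
}
counts = {"asv1": 120, "asv2": 900, "asv3": 45, "asv4": 60}

tables = split_by_category(taxonomy, counts)
for category, table in sorted(tables.items()):
    print(category, table)
```

Checking organelles before the domain names matters here, since a chloroplast ASV would otherwise be counted as a bacterium.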

Universal primers are easy to analyze!

Universal primers complement existing techniques


 

  • Simons Collaborative Marine Atlas Project (cmap.readthedocs.io)
  • Unify data retrieval and co-localization
  • ASVs are stable identifiers, so they fit naturally in a database

Universal primers as a mapping tool - CMAP!

 

 

  1. Metagenomic validation of universal primer quantification and environmental coverage
  2. Use cases for universal primer data (with BioGEOTRACES data)
  3. Future work

Talk Outline


Mock communities show PCR can be quantitative

True also for 18S mocks

biorxiv.org/content/10.1101/866731v1

 

Similarly quantitative "in the wild"

[Figure: metagenome abundances (16S) vs. PCR amplicon abundances (16S)]

Similarly quantitative "in the wild"; 18S less so

[Figure: metagenome abundances (18S) vs. PCR amplicon abundances (18S)]


What about primer bias?

Inspired by this paper, we are using metagenomes to determine theoretical primer performance across ecosystems

MGPrimerEval: a reproducible Snakemake pipeline

  • Compares primers to unassembled metagenomic (MG) reads
  • Reproducible through conda/Docker integration
  • tinyurl.com/titus-workflow-philosophy
  • GitHub: tinyurl.com/MGPrimerEval

https://www.physalia-courses.org

Pipeline stages (figure labels):

  1. QC, SSU rRNA sifting, repeat removal
  2. Sorting, aligning, subsetting
  3. Primer-MG comparisons, data summary
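The primer-to-read comparison can be sketched in a few lines of Python. This is a simplified illustration, not MGPrimerEval's implementation: it assumes the primer-binding region has already been extracted from each read and oriented like the primer, and the function names and toy sequences are hypothetical. The primer shown is the commonly cited 515Y forward sequence; check it against the lab protocol before reuse.

```python
# IUPAC nucleotide codes mapped to the bases each one accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, region: str) -> int:
    """Count positions where the read base is not allowed by the primer.

    Ambiguous or unknown read bases (e.g. N) count as mismatches.
    """
    primer, region = primer.upper(), region.upper()
    if len(primer) != len(region):
        raise ValueError("primer and region must have the same length")
    return sum(base not in IUPAC[code] for code, base in zip(primer, region))


def coverage(primer: str, regions, max_mismatches: int = 0) -> float:
    """Fraction of regions matching the primer within max_mismatches."""
    regions = list(regions)
    if not regions:
        raise ValueError("no regions to evaluate")
    hits = sum(count_mismatches(primer, r) <= max_mismatches for r in regions)
    return hits / len(regions)


PRIMER_515Y = "GTGYCAGCMGCCGCGGTAA"

# Made-up binding regions for illustration only.
toy_regions = [
    "GTGCCAGCAGCCGCGGTAA",  # perfect match (Y=C, M=A)
    "GTGTCAGCCGCCGCGGTAA",  # perfect match (Y=T, M=C)
    "GTGCCAGCAGCTGCGGTAA",  # one mismatch (T where the primer has C)
]

print("0-mismatch coverage:", coverage(PRIMER_515Y, toy_regions, 0))
print("1-mismatch coverage:", coverage(PRIMER_515Y, toy_regions, 1))
```

Working from unassembled reads means every read overlapping the primer site contributes one observation, so the resulting fraction is weighted by environmental abundance rather than by the number of reference sequences.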

MGPrimerEval : datasets

  • Tara Oceans (458), BioGEOTRACES (610), MBARI bloom MG (84), Malaspina FL/PA samples (58), SPOT/Catalina MG (8)
    • Total = 1,218 metagenomes
    • Globally distributed, mostly oxic, pelagic samples

Pesant et al., 2015

Sieradzki et al., 2019

Acinas et al., 2019

Nowinski et al., 2019

Biller et al., 2018

MGPrimerEval : predicted coverage (0-mismatch)


MGPrimerEval : conclusions

  • 515Y/926R universal primers have best coverage overall
    • Nearly 100% for 16S, ~90% for 18S
      • 18S has similar coverage vs. Euk-specific primers
    • Will provide accurate community profiles across pelagic ocean regions
    • Some rare taxa have low coverage
  • Some other primers have unexpectedly low coverage
  • Improvements can be made to any primer to:
    • Maximize overall coverage across diverse environments
    • Optimize to a particular environment

Use cases: BioGEOTRACES amplicons

Will present new data from these transects

These are "whole water" samples (>0.2 µm)

Universal primers + whole water samples =

comprehensive profile

NB: 18S usually small fraction of total

Will be a smaller % for amplicons

# of 18S sequences recovered will be limited for some regions

Most metagenomes have < 20% 18S

3. Use cases:

 

Resolution

  • Are short amplicons (~373 bp) enough to resolve ecologically-relevant subgroups of Prochlorococcus?

 

Sensitivity (detection limit)

 

 

 

Modelling

Majority of ASVs consistent with genome phylogeny and associated with a particular ecotype

*Berube et al. eLife 2019;8:e41043.

3. Resolution: Prochlorococcus ASVs

ASV hits to clade

3. Resolution: Prochlorococcus ASVs

HLII clade ASV

 

Higher abundance in warmer waters

HLI clade ASV

 

Abundance increases as temperature drops

 

 

Both distributions consistent with physiological data from pure cultures and field data

3. Resolution: Prochlorococcus ASVs

Biller et al., 2015, Nat. Rev. Microbiol.

3. Use cases:

 

Resolution

  • Are short amplicons (~373 bp) enough to resolve ecologically-relevant subgroups of Prochlorococcus?

 

Sensitivity (detection limit)

  • Can we sequence deeply enough to find rare, but biologically-relevant organisms like diazotrophs?

 

Modelling

3. Sensitivity: Diazotrophs

UCYN-A (diazotroph)

Braarudosphaera chloroplast (putative host)

1/2000 to 1/13,000

Zeroes meaningful: no UCYN-A observed out of ~50,000 reads
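One way to see why a zero at this depth is informative is a simple sampling calculation. The sketch below assumes reads are drawn independently and at random from the community, which ignores PCR bias and other real-world effects; the function names are hypothetical and this is not an analysis from the talk.

```python
import math


def prob_zero_reads(rel_abundance: float, n_reads: int) -> float:
    """Probability of seeing no reads from a taxon at a given relative abundance."""
    if not 0.0 < rel_abundance < 1.0:
        raise ValueError("rel_abundance must be between 0 and 1")
    return (1.0 - rel_abundance) ** n_reads


def reads_for_detection(rel_abundance: float, confidence: float = 0.95) -> int:
    """Smallest read depth giving at least `confidence` chance of one or more reads."""
    if not 0.0 < rel_abundance < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("rel_abundance and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rel_abundance))


depth = 50_000
for denominator in (2_000, 13_000):
    f = 1.0 / denominator
    print(
        f"1/{denominator}: P(zero reads at {depth} reads) = "
        f"{prob_zero_reads(f, depth):.3g}, "
        f"reads needed for 95% detection = {reads_for_detection(f)}"
    )
```

Under these assumptions, a taxon at 1/13,000 would be missed in roughly 2% of samples sequenced to 50,000 reads, so an observed zero suggests an abundance below the lowest levels detected elsewhere.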

3. Sensitivity: Diazotrophs

UCYN-A (diazotroph)

Braarudosphaera chloroplast (putative host)

qPCR (nifH) data

3. Sensitivity: Diazotrophs

UCYN-A - Braarudosphaera correlation

UCYN-A (diazotroph)

Braarudosphaera chloroplast (putative host)

3. Use cases:

 

Resolution

  • Are short amplicons (~373 bp) enough to resolve ecologically-relevant subgroups of Prochlorococcus?

 

Sensitivity (detection limit)

  • Can we sequence deeply enough to find rare, but biologically-relevant organisms like diazotrophs?

 

Modelling

  • How can data be used to improve modelling efforts?

3. Modelling: Rationale

  • Microorganisms process up to 50% of primary production (PP)
  • Not yet included in trait-based models such as DARWIN
  • Amplicons can potentially constrain microorganism:
    • Biogeography
    • Depth distributions
  • Use to inform and test modelling approaches?

Depth profiles from KN204-stn3

3. Modelling: ASV (~species) depth profiles

Marine actinobacterium

Extreme oligotroph

Very small size

SAR86, uncultured

Lipid/protein degrader?

Deep/shallow ecotypes?

Marine group II archaea

Also lipid/protein degrader?

Particle attached?

3. Modelling: Aggregated profiles

3. Modelling: Expanding ocean coverage

3. Modelling: Expanding ocean coverage

3. Modelling: Defining traits

Reintjes et al., 2018, ISME J., doi.org/10.1038/s41396-018-0326-3

Flavobacteria

Alteromonas

SAR11

3. Modelling: Integrating different data types

Biogeography

(e.g. GA03)

Meta'omics (e.g. Iverson et al., 2012; Pachiadaki et al., 2019)

Models (e.g. Emily Zakem's work on heterotrophs)

Conclusions: Universal Primers

  • Cost-effective tool to map whole cellular community from A-Z (Archaea + Bacteria + Organelles + Protists + Zooplankton)
    • Chloroplast data are more quantitative than 18S
  • Can guide where to concentrate more focused studies of the same DNA (e.g. meta'omics)
  • Straightforward to process and interpret
  • Can provide guideposts for modelling efforts


Acknowledgements

  • Yi-Chun Yeh (for collaboration on amplicon analysis)
  • Shengwei Hou (for help and inspiration on computational analyses)
  • Clark Richards (for help with R "oce")
  • Ken Youens-Clark (for help with C-microbe-MAP)
  • Emily Zakem (for helping me understand modelling)
  • Mike Lee (for help separating 16S and 18S in silico)
  • Fuhrman lab crew
    • fuhrman-lab.github.io
    • tinyurl.com/MGPrimerEval

% Eukaryotic amplicons

  • GP13 (mostly oligotrophic surface waters)
    • AVG = 3.2% (range: ~1-28%)
    • For 200,000 sequences: ~6,400 18S sequences on average (range: 2,000-56,000)
  • GA03 (mostly oligotrophic, more deep-sea sampling)
    • AVG = 2.7% (range: 0.17-17%)
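The 18S read counts follow from multiplying sequencing depth by the eukaryotic fraction. The short sketch below repeats that arithmetic for both transects; the GA03 counts are derived here by assuming the same 200,000 sequences per sample and were not stated in the talk, and the function name is hypothetical.

```python
def expected_18s_reads(total_reads: int, euk_fraction: float) -> int:
    """Expected number of 18S reads for a given depth and eukaryotic fraction."""
    if total_reads < 0 or not 0.0 <= euk_fraction <= 1.0:
        raise ValueError("invalid depth or fraction")
    return round(total_reads * euk_fraction)


DEPTH = 200_000  # sequences per sample, as quoted for the protocol

# Fractions taken from the percentages on the slide.
TRANSECTS = {
    "GP13": {"mean": 0.032, "min": 0.01, "max": 0.28},
    "GA03": {"mean": 0.027, "min": 0.0017, "max": 0.17},
}

estimates = {
    name: {key: expected_18s_reads(DEPTH, frac) for key, frac in fractions.items()}
    for name, fractions in TRANSECTS.items()
}
for name, est in estimates.items():
    print(name, est)
```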

DADA2 vs deblur (ASVs vs. MG)

DADA2 vs deblur (ASV vs. ASV)

Scripps e-seminar

By jcmcnch


Zoom seminar to SIO community June 17th 2020
