New Datasets for Marine Macroecology:

Metabarcoding, Metagenomics, & Related Techniques

2023-06-21, CBIOMES Annual Meeting

Jesse McNichol

Outline

  1. Why do we have new datasets?
  2. Method definitions, pros and cons
  3. Global metagenomes
  4. Global metabarcodes
  5. Data & directions

(1) Cheap DNA sequencing = new datasets

Early sequencing

"Next Generation Sequencing" (NGS)

"3rd generation" sequencing

(1) Different stages

Exploration, discovery

Intercalibration, harmonization

Methodological validation

Ecosystem

Metagenomics

Metabarcoding

(2) Metabarcodes vs metagenomes

Different primers, different regions of barcode

Different primers, different organismal range

Universal

Universal Prok

Universal Euk

Universal Bacteria

(2) Challenges of metabarcoding

(3) Global shotgun metagenomics = the answer?

BioGEOTRACES

Bio-GO-SHIP

TARA Oceans Expedition

(3) Global shotgun metagenomics = the downsides

  • Huge data resource, but costly ($ and compute)
  • Limited depth (mostly sequence abundant things)
  • Taxonomic resolution depends on mapping to a database
  • Size fractionation a complication for TARA (others are > 0.2 µm)

(4) Global metabarcoding

  • Broad organismal range (Archaea to zooplankton)
  • Unfractionated samples (> 0.2 µm)

Universal

Same primers, same regions of barcode

(5) Data & Directions

  1. GRUMP Rank Abundance Distributions
  2. Other interesting metrics
  3. Going beyond relative abundance
  4. How much trait information do we need?
  5. Do we need to integrate short & long-read technologies?

(5) GRUMP RADs (P16N/S)

Are organism ranks stable?

How do RADs differ:

  • Across depth?
  • Across ecological provinces?
  • Across trophic levels?
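
The rank-stability question above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GRUMP workflow: the ASV names, read counts, and the helper names `rank_abundance` and `rank_shift` are all hypothetical.

```python
def rank_abundance(counts):
    """Return (rank, asv, relative_abundance) tuples, most abundant first."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, asv, n / total)
            for rank, (asv, n) in enumerate(ranked, start=1)]


def rank_shift(rad_a, rad_b):
    """Change in rank (b minus a) for every ASV present in both RADs."""
    ranks_a = {asv: rank for rank, asv, _ in rad_a}
    ranks_b = {asv: rank for rank, asv, _ in rad_b}
    return {asv: ranks_b[asv] - ranks_a[asv]
            for asv in ranks_a if asv in ranks_b}


# Hypothetical read counts for two depths (made-up numbers, not GRUMP data)
surface = {"ASV_a": 500, "ASV_b": 300, "ASV_c": 150, "ASV_d": 50}
deep = {"ASV_a": 100, "ASV_b": 50, "ASV_c": 600, "ASV_d": 250}

rad_surface = rank_abundance(surface)
rad_deep = rank_abundance(deep)
shifts = rank_shift(rad_surface, rad_deep)

for asv, delta in sorted(shifts.items()):
    print(f"{asv}: rank change surface -> deep = {delta:+d}")
```

A shift of zero for most ASVs would suggest stable ranks. The same comparison could be repeated across provinces or trophic groups.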

(5) Other interesting metrics

Microheterotroph:phytoplankton ratios

Is this true in the Southern Ocean or other unusual environments? What about metazoans or other taxa?

(5) Beyond relative abundance

Compositional data is not ideal. What to do?

Lexi

Enrico & Mick

Use paired data such as FACS as an "anchor"

Analyze with internal standards (spike-in)
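
As a rough sketch of the spike-in idea: if a known number of standard gene copies is added to each sample, read counts can be rescaled to copies per litre. The function name, sample values, and the assumption that the standard is extracted and amplified with the same efficiency as native sequences are all illustrative, not taken from a specific protocol.

```python
def absolute_abundance(reads, standard_id, standard_copies_added, volume_l):
    """Convert read counts to gene copies per litre using a spike-in standard.

    Assumes (hypothetically) that the standard and native sequences are
    recovered and amplified with equal efficiency.
    """
    standard_reads = reads[standard_id]
    if standard_reads == 0:
        raise ValueError("no reads recovered for the internal standard")
    if volume_l <= 0:
        raise ValueError("filtered volume must be positive")
    copies_per_read = standard_copies_added / standard_reads
    return {asv: n * copies_per_read / volume_l
            for asv, n in reads.items() if asv != standard_id}


# Made-up example: 1e6 standard copies added, 2 L of seawater filtered
sample_reads = {"ASV_a": 8000, "ASV_b": 1500, "spike": 500}
copies_per_l = absolute_abundance(sample_reads, "spike", 1_000_000, 2.0)

for asv, value in copies_per_l.items():
    print(f"{asv}: {value:.3g} gene copies per litre")
```

The result is in gene copies, not cells. Converting to cell counts would need per-taxon gene copy numbers, which this sketch does not attempt.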

Pančić and Kiørboe, 2018

ASV DNA data

ASV phylogeny

(5) Linking ASVs to traits

How much trait information is needed to interpret macroecological patterns?

  • How to robustly intercompare data from different rRNA regions?

Advantages of full-length rRNA database:

  • Allows intercomparisons with legacy datasets
  • Potentially improves taxonomic resolution of ASV data

Dueholm et al. (2020) mBio, e01557-20

(5) Integrating short- and long-read technologies

Environmental DNA/RNA

Long-read sequencing (e.g. PacBio CCS)

Database of full-length 16S rRNA

The End

Cost of identifying organisms

Methods summary: Pros and cons

Other metrics

  • Species area relationship (SAR)
  • Distance decay (Florida Straits)
  • Taylor's power law
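
Of the metrics listed, Taylor's power law (variance = a × mean^b across taxa) is simple to illustrate. The sketch below fits a and b by ordinary least squares on log-transformed values. The abundance values are made up so that the expected exponent is known, and `fit_taylor` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def fit_taylor(abundance_by_taxon):
    """Fit variance = a * mean**b by least squares on log10-log10 axes."""
    xs, ys = [], []
    for values in abundance_by_taxon.values():
        m, v = mean(values), variance(values)  # sample variance (n - 1)
        if m > 0 and v > 0:  # logs are undefined for zero mean or variance
            xs.append(math.log10(m))
            ys.append(math.log10(v))
    if len(xs) < 2:
        raise ValueError("need at least two taxa with positive mean and variance")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all taxa have the same mean; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = 10 ** (y_bar - b * x_bar)
    return a, b


# Made-up abundances across two samples per taxon, built so variance = 0.5 * mean**2
toy = {
    "taxon_1": [1, 3],
    "taxon_2": [10, 30],
    "taxon_3": [100, 300],
}
a, b = fit_taylor(toy)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real data the abundances should be absolute (or at least comparably normalized), since relative abundances constrain the variance.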