Improving Transcriptome Analysis with Molecular Indices and Galaxy

2019-07-03

Brad Langhorst - New England Biolabs

Talk Goals

Convey utility of UMIs in RNA-seq
Examine technical factors associated with apparent transcript level differences
Compare Transcript levels +/- UMI with expectation
Explore correlation of transcript features with measured abundance
For me: Hear about what other features should we consider?

What is a UMI

Universal Molecular Identifier
(aka MID, Molecular barcode, UID)
Semi-random synthetic sequence of DNA bases
Probability of any specific sequence = 1/4^n
n = number of bases
n=1 1/4 A,C,G, or T
n=2 1/16 AA,AC,AG,AT,CA,CC ... GT,TA,TC,TG,TT
n=3 1/256 ...
We use 12 bp UMIs = 1/16,777,216
When combined with transcript identity, probability of false collision is very low (function of transcript abundance)
Sequencing errors might produce related UMIs

RNA-Seq Workflow Overview

UMI added here

technical factors related to observed abundance

RNA-Seq Workflow Overview

UMIs duplicated here

Example Molecule

Technical Factors

Pre - UMI
- Priming bias (hexamers may not be random)
- Fragmentation bias (too easy = low, too hard = low)
- A-tailing bias (lower efficiency = low)
Post - UMI
- Size selection/cleanups (bead binding, differential elution)
- PCR (too easy = high, too hard = low)

Uneven Amplification

BRCA1

p53

Original

Library

+ UMI

Fragments

PCR

25/18 = 1.8

9/5 = 1.4

BRCA1/p53 ratio

4/3 = 1.3

Identification of Duplication

Dedup

Using UMI

No UMI

BRCA1

Alignment

= 12

= 3

= 4

= 12

BRCA1

+/-UMI Experiment Design

C. Devoe, K. Krishnan, D. Rodriguez

Goal: Compare counts of transcripts with and without UMI

10 ng total RNA (~ 300 cells, human/mouse blood)
NEBNext Ultra II RNA library prep + UMI adapters
12 PCR cycles
Sequenced on Illumina NovaSeq 4000 S2, 2x75bp

Experimental Results

C. Devoe, K. Krishnan, D. Rodriguez

Experimental Results

C. Devoe, K. Krishnan, D. Rodriguez

Experimental Results

Hard to Amplify

Easy to Amplify

C. Devoe, K. Krishnan, D. Rodriguez

RNA Mixture Experiment

G. Naishadham

Goal: Compare counts of transcripts to expectation

10 ng Input (~ 300 cells, cell line Blood/Brain RNA)
Defined mixtures 1:3 and 3:1 to generate expected values
NEBNext Ultra II RNA library prep + UMI adapters
12 PCR cycles
Sequenced on Illumina NovaSeq 4000 S2, 2x75bp

RNA Mixture Experiment

G. Naishadham

Experimental Results

G. Naishadham

Experimental Results

G. Naishadham

Need MOAR Reads!

Do transcript features explain variable amplification?

GC
Fragment Length
RNA structure stability
K-mer enrichment
Others?

Transcript GC%

Increasing Duplication

transcripts with > 100 reads

Fragment Lengths

Increasing Duplication

G. Naishadham

transcripts with > 100 reads

Mean Free Energy

Increasing Duplication

Increasing Predicted Stability

transcripts with > 100 reads

G. Naishadham

Other Factors Under Consideration

Complexity (Shannon information content)
Maximum 200 bp folding energy
Other ideas ?

UMIs in RNA-Seq

We see differences in amplification between transcripts
We can partially correct these using UMIs
We don't yet understand mechanism but we're not done digging yet

PS: I'm hiring soon - see the job board!

Copy of Improving Transcriptome Analysis with Molecular Indices and Galaxy

By Brad Langhorst

Copy of Improving Transcriptome Analysis with Molecular Indices and Galaxy

Brad Langhorst

New England Biolabs

Improving Transcriptome Analysis with Molecular Indices and Galaxy

Talk Goals

What is a UMI

RNA-Seq Workflow Overview

RNA-Seq Workflow Overview

RNA-Seq Workflow Overview

Example Molecule

Technical Factors

Uneven Amplification

Identification of Duplication

+/-UMI Experiment Design

Experimental Results

Experimental Results

Experimental Results

RNA Mixture Experiment

RNA Mixture Experiment

Experimental Results

Experimental Results

Do transcript features explain variable amplification?

Transcript GC%

Fragment Lengths

Mean Free Energy

Other Factors Under Consideration

UMIs in RNA-Seq

Copy of Improving Transcriptome Analysis with Molecular Indices and Galaxy

Copy of Improving Transcriptome Analysis with Molecular Indices and Galaxy

Brad Langhorst

More from Brad Langhorst