Unique Molecular Indices Improve RNA-seq Experiments

Sequencing Finishing and the Future 2020-12-03

Brad Langhorst - New England Biolabs

Talk Goals

Illustrate utility of UMIs in RNA-seq
Examine technical factors associated with apparent transcript level differences
Compare transcript levels with and without UMIs against expectation
Explore correlation of transcript features with measured abundance

What is a UMI

Unique Molecular Identifier (a.k.a. MID, molecular barcode, UID)
Semi-random synthetic sequence of DNA bases
Probability of any specific sequence = 1/4^n, where n = number of bases
- n = 1: 1/4 (A, C, G, or T)
- n = 2: 1/16 (AA, AC, AG, AT, CA, CC ... GT, TA, TC, TG, TT)
- n = 3: 1/64 ...
- We use 11 bp UMIs: 1/4^11 = 1/4,194,304
When combined with transcript identity, the probability of a collision is very low (it rises with transcript abundance)
Sequencing errors might produce related UMIs
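
The arithmetic on this slide can be checked with a short sketch. It assumes UMIs are drawn uniformly at random (the slide describes them as semi-random, so this is an idealization), and the function names are illustrative rather than taken from any tool. The collision estimate is the standard birthday-problem calculation for molecules that share the same transcript position.

```python
def umi_space(n: int) -> int:
    """Number of distinct UMI sequences of length n over A, C, G, T."""
    return 4 ** n


def collision_probability(molecules: int, n: int) -> float:
    """Probability that at least two of `molecules` carry the same UMI.

    Assumes each UMI is drawn independently and uniformly (an idealization).
    """
    space = umi_space(n)
    if molecules > space:
        return 1.0
    p_all_unique = 1.0
    for i in range(molecules):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


if __name__ == "__main__":
    for n in (1, 2, 3, 11):
        print(f"n={n}: 1/{umi_space(n):,}")
    # Collision chance grows with the number of molecules at one position,
    # which is why it depends on transcript abundance.
    for molecules in (10, 100, 1000):
        print(molecules, collision_probability(molecules, 11))
```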

RNA-Seq Workflow Overview

UMI added here

= technical factors contributing to observed abundance

RNA-Seq Workflow Overview

UMIs duplicated here

Example Molecule

Technical Factors

Pre - UMI
- Priming bias (hexamers may not be random)
- Fragmentation bias (too easy = low, too hard = low)
- A-tailing bias (lower efficiency = low)
Post - UMI
- Size selection/cleanups (bead binding, differential elution)
- PCR (too easy = high, too hard = low)

Uneven Amplification

[Figure: BRCA1 and p53 molecules followed through the stages Original, Library, + UMI, Fragments, and PCR. BRCA1/p53 ratios shown: 4/3 = 1.3, 9/5 = 1.8, 25/18 = 1.4]
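
A toy calculation illustrates why uneven amplification distorts a ratio and why counting UMIs helps. It starts from 4 BRCA1 and 3 p53 molecules (the 4/3 ratio on the slide); the per-gene copy numbers and UMI labels are hypothetical values chosen for illustration, not measurements from the talk.

```python
from collections import Counter

# Each original molecule gets its own UMI label before PCR (labels are placeholders).
molecules = [("BRCA1", f"umi_b{i}") for i in range(4)] + [
    ("p53", f"umi_p{i}") for i in range(3)
]

# Hypothetical PCR yield per molecule: BRCA1 amplifies twice as well as p53.
copies_per_molecule = {"BRCA1": 6, "p53": 3}

reads = []
for gene, umi in molecules:
    reads.extend([(gene, umi)] * copies_per_molecule[gene])

raw_counts = Counter(gene for gene, _ in reads)       # every read counted
umi_counts = Counter(gene for gene, _ in set(reads))  # one count per distinct UMI

raw_ratio = raw_counts["BRCA1"] / raw_counts["p53"]
umi_ratio = umi_counts["BRCA1"] / umi_counts["p53"]

print(f"read-count ratio: {raw_ratio:.2f}")
print(f"UMI-count ratio:  {umi_ratio:.2f}")
```

In this sketch the UMI count recovers the starting ratio only because the bias is introduced after the UMI is attached; biases that act before UMI addition are not corrected this way.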

Identification of Duplication

[Figure: reads aligned to BRCA1, deduplicated using UMIs and with no UMI. Counts shown in the figure: 12, 3, 4, 12]
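
The difference between the two deduplication strategies can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the `Read` fields, coordinates, and UMI sequences are made-up values, and sequencing errors in UMIs are ignored.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int
    strand: str
    umi: str


def count_after_dedup(reads, use_umi: bool) -> int:
    """Count reads left after collapsing duplicates.

    Without UMIs, reads at the same position are treated as duplicates.
    With UMIs, reads are duplicates only if position and UMI both match.
    """
    if use_umi:
        keys = {(r.chrom, r.start, r.strand, r.umi) for r in reads}
    else:
        keys = {(r.chrom, r.start, r.strand) for r in reads}
    return len(keys)


# Hypothetical reads: two distinct molecules share the first position.
reads = (
    [Read("chr17", 100, "+", "ACGTACGTACG")] * 5
    + [Read("chr17", 100, "+", "TTGCAAGCTAG")] * 2
    + [Read("chr17", 250, "+", "GGATCCATGCA")] * 3
    + [Read("chr17", 400, "-", "CATGCATTGAC")] * 2
)

print("total reads:", len(reads))
print("after position-only dedup:", count_after_dedup(reads, use_umi=False))
print("after UMI-aware dedup:", count_after_dedup(reads, use_umi=True))
```

Position-only deduplication discards the second molecule at the shared position, which undercounts abundant transcripts; keeping every read instead overcounts easily amplified ones.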

+/-UMI Experiment Design

C. Devoe, D. Posfai, K. Krishnan, D. Rodriguez

Goal: Compare counts of transcripts with and without UMI

10 ng total RNA (~ 300 cells, human/mouse blood)
NEBNext Ultra II RNA library prep + UMI adaptors
12 PCR cycles
Sequenced on Illumina NovaSeq 6000 S2, 2x75bp

Experimental Results

Experimental Results

[Figure: results panels labeled "Hard to Amplify" and "Easy to Amplify"]

RNA Mixture Experiment

G. Naishadham

Goal: Compare counts of transcripts to expectation

10 ng Input (~ 300 cells, cell line Blood/Brain RNA)
Defined mixtures 1:3 and 3:1 to generate expected values
NEBNext Ultra II RNA library prep + UMI adaptors
12 PCR cycles
Sequenced on Illumina NovaSeq 6000 S2, 2x75bp
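
One way to derive expected values from defined mixtures is sketched below. It assumes the mixtures are made by RNA mass, that a transcript's signal is a linear blend of its abundance in the two sources, and that 1:3 and 3:1 mean blood:brain; none of these details are stated on the slide, and the function names are illustrative.

```python
def expected_abundance(blood: float, brain: float, blood_fraction: float) -> float:
    """Expected abundance of a transcript in a blood/brain mixture.

    `blood` and `brain` are the transcript's abundances in the pure samples;
    `blood_fraction` is the fraction of the mixture contributed by blood RNA.
    """
    return blood_fraction * blood + (1.0 - blood_fraction) * brain


def expected_ratio(blood: float, brain: float) -> float:
    """Expected abundance ratio between the 3:1 and 1:3 (blood:brain) mixtures."""
    mix_3_to_1 = expected_abundance(blood, brain, 0.75)
    mix_1_to_3 = expected_abundance(blood, brain, 0.25)
    return mix_3_to_1 / mix_1_to_3


if __name__ == "__main__":
    print(expected_ratio(100.0, 0.0))    # blood-specific transcript
    print(expected_ratio(0.0, 100.0))    # brain-specific transcript
    print(expected_ratio(50.0, 50.0))    # equally expressed transcript
```

Under these assumptions the expected ratio is bounded between 1/3 and 3, so observed ratios outside that range point to technical rather than biological variation.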

RNA Mixture Experiment

G. Naishadham

Expected Results

G. Naishadham

Experimental Results

G. Naishadham

Do transcript features explain variable amplification?

GC
Fragment Length
RNA structure stability
K-mer enrichment
Others?
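
A sketch of how one such feature could be tabulated against duplication. The definition of duplication rate used here (one minus the fraction of reads with a distinct position and UMI) is an assumption, as are the function names and the example numbers; the threshold of more than 100 reads follows the filter noted on the plots.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def duplication_rate(total_reads: int, unique_reads: int) -> float:
    """Share of reads that are duplicates, assuming unique_reads <= total_reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1.0 - unique_reads / total_reads


def feature_table(transcripts, min_reads: int = 100):
    """Yield (name, GC fraction, duplication rate) for transcripts with > min_reads reads.

    `transcripts` is an iterable of (name, sequence, total_reads, unique_reads).
    """
    for name, sequence, total, unique in transcripts:
        if total > min_reads:
            yield name, gc_fraction(sequence), duplication_rate(total, unique)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    example = [
        ("tx_a", "ATGCGCGCTA", 400, 100),
        ("tx_b", "ATATATGCAT", 50, 45),   # dropped: not more than 100 reads
    ]
    for row in feature_table(example):
        print(row)
```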

Transcript GC%

[Figure: transcript GC% plotted against increasing duplication, for transcripts with > 100 reads]

Fragment Lengths

[Figure: fragment length plotted against increasing duplication, for transcripts with > 100 reads. Credit: G. Naishadham]

Mean Free Energy

[Figure: increasing duplication plotted against increasing predicted stability, for transcripts with > 100 reads. Credit: G. Naishadham]

Other Factors Under Consideration

Complexity (Shannon information content)
Maximum 200 bp folding energy
Other ideas ?
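
Sequence complexity can be scored in several ways; the slide does not say which one is under consideration. The sketch below uses one common choice, the Shannon entropy of k-mer frequencies, with `k` and the function name as illustrative assumptions.

```python
from collections import Counter
from math import log2


def shannon_entropy(sequence: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the k-mer frequency distribution of a sequence."""
    seq = sequence.upper()
    if k < 1 or len(seq) < k:
        raise ValueError("sequence must be at least k bases long and k >= 1")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("AAAAAAAA"))        # homopolymer, lowest complexity
    print(shannon_entropy("ACGTACGTACGT"))    # all four bases equally frequent
    print(shannon_entropy("ACGTACGTACGT", k=2))
```

With k = 1 the score ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent); larger k captures repeats that base composition alone misses.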

UMIs in RNA-Seq

We see differences in amplification between transcripts
We can partially correct for differences using UMIs
Implications for power assessment in differential expression
We don't yet understand the mechanism, but we're not done digging

PS: I'm hiring soon - see the job board!
