Comparing hundreds of RNA-seq libraries using Galaxy and SeqResults

June 30, 2017

Brad Langhorst - New England Biolabs

Talk Goals

Answer the question:

How does NEB approach RNA-seq library analysis?

 

Ideal outcome:

New ideas for better analysis

What is NEB

  • Early biotech company, founded in 1974
  • "by scientists for scientists"
  • Restriction enzymes, polymerases, ligases, ...
  • DNA and RNA sequencing library prep
  • Open by default
    • > 1000 peer reviewed publications
    • Conference presentations
    • Open source contributions

Guiding Principles

  • Data-Driven Decisions
  • Intuitive Tools
  • Foster Exploration
  • Multiple Layers of Summary
  • Include Evidence

SeqResults

Analysis

Database

Visualization

Acquisition

SeqShepherd

Timur, Mike Zulch

NGS Aggregate

Tableau

Library prep

RNA Library Prep

Enzyme Steps

Galaxy Analysis

Data Aggregation

Data Storage

Kevin Sun

Data Visualization

SeqResults Usage

Aggregation Details

Galaxy is Great

  • But limited for comparing results
  • Can do it tactically (library by library)
  • We pull results of analysis out to Postgres
    • e.g. transcript levels
    • per-transcript coverage
    • Picard RNA-seq metrics summary
  • Then analyze with Tableau
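
As a rough illustration of pulling one tool's results out of Galaxy and into a database, the sketch below parses the metrics section of a Picard CollectRnaSeqMetrics file and writes it to a table. This is not the SeqResults code: the function names, the table name `rnaseq_metrics`, the library label, and the sample values are assumptions made for this example, and SQLite stands in for Postgres so the snippet runs without a server.

```python
import sqlite3

# Illustrative Picard-style metrics text; the values are made up for this example.
SAMPLE = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# CollectRnaSeqMetrics (arguments elided)",
    "",
    "## METRICS CLASS\tpicard.analysis.RnaSeqMetrics",
    "PF_BASES\tPCT_MRNA_BASES\tMEDIAN_5PRIME_TO_3PRIME_BIAS",
    "1000\t0.85\t0.92",
    "",
    "## HISTOGRAM\tjava.lang.Integer",
    "normalized_position\tAll_Reads.normalized_coverage",
    "0\t0.5",
])


def parse_picard_metrics(text):
    """Return {column: value} from the first row after '## METRICS CLASS'."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


def store_metrics(conn, library, metrics):
    """Write one row per metric (long format) so libraries can be compared later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rnaseq_metrics "
        "(library TEXT, metric TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO rnaseq_metrics VALUES (?, ?, ?)",
        [(library, name, value) for name, value in metrics.items()],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
    store_metrics(conn, "example_library", parse_picard_metrics(SAMPLE))
    for row in conn.execute("SELECT * FROM rnaseq_metrics"):
        print(row)
```

A long (library, metric, value) layout is only one possible choice; it keeps the schema stable as tools add columns, at the cost of storing values as text.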

Lots of Tools Supported

Code Architecture

 

  • Simple class per tool output
  • ActiveRecord for DB abstraction
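
The "one class per tool output" idea can be sketched as below. The talk's implementation uses ActiveRecord (Ruby); this is a Python analogue written only to show the shape of the design. Every name here (`ToolOutput`, `TranscriptLevels`, `register`, `load`) and the two-column input format are hypothetical, not taken from SeqResults.

```python
from typing import Dict, List, Type

# Registry mapping a tool-output name to the class that parses it.
PARSERS: Dict[str, Type["ToolOutput"]] = {}


def register(tool_name):
    """Class decorator: make a parser discoverable by tool-output name."""
    def wrap(cls):
        PARSERS[tool_name] = cls
        return cls
    return wrap


class ToolOutput:
    """Base class; each subclass knows how to parse one tool's output."""
    table = None  # name of the database table the rows would go to

    @classmethod
    def parse(cls, text: str) -> List[dict]:
        raise NotImplementedError


@register("transcript_levels")
class TranscriptLevels(ToolOutput):
    table = "transcript_levels"

    @classmethod
    def parse(cls, text):
        rows = []
        for line in text.splitlines():
            # Skip blank lines and '#' comment/header lines.
            if not line.strip() or line.startswith("#"):
                continue
            transcript, level = line.split("\t")
            rows.append({"transcript": transcript, "level": float(level)})
        return rows


def load(tool_name: str, text: str) -> List[dict]:
    """Dispatch to the parser registered for this tool output."""
    return PARSERS[tool_name].parse(text)


if __name__ == "__main__":
    print(load("transcript_levels", "# id\tlevel\nERCC-00002\t12.5\n"))
```

Under this layout, supporting another tool means adding one small class; the database layer (ActiveRecord in the talk) would then persist the returned rows.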

Similar Efforts?

  • MultiQC
    • no DB, simpler, lots of tools parsed
    • can't combine arbitrarily
  • Internal Galaxy visualizations?
  • How to combine parsing effort?

Visualization

Percentage Stats

ERCC Levels (log)

ERCC Levels

Correlation (Ensembl)

Correlation (1-10)

5' - 3' Coverage

5' - 3' Coverage Details

5' - 3' Coverage Single Transcript

Timur Shtatland

Loss of Coverage Due to High GC

Production Use

Development is only one of the places where NEB uses Galaxy...

Lot-to-lot consistency
 

  • old lot + new lot on same flowcell
  • high and low input limits
  • relevant properties
    • library yield
    • 5'-3' coverage
    • correlation of transcript levels
    • proportion of reads on features
    • unaligned reads (minimal "bonus" RNA)
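
One of the properties listed above, correlation of transcript levels, could be computed along these lines. This is a minimal sketch under assumptions: Pearson correlation on log2-transformed levels with a pseudocount of 1 is chosen for illustration (the talk does not say which statistic is used), and the transcript IDs and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def lot_correlation(old_lot, new_lot, pseudocount=1.0):
    """Correlate log2 transcript levels over transcripts present in both lots."""
    shared = sorted(set(old_lot) & set(new_lot))
    xs = [math.log2(old_lot[t] + pseudocount) for t in shared]
    ys = [math.log2(new_lot[t] + pseudocount) for t in shared]
    return pearson(xs, ys)


if __name__ == "__main__":
    # Hypothetical transcript levels for libraries made with two kit lots.
    old = {"tx1": 1.0, "tx2": 3.0, "tx3": 7.0, "tx4": 120.0}
    new = {"tx1": 1.2, "tx2": 2.8, "tx3": 7.5, "tx4": 110.0}
    print(round(lot_correlation(old, new), 4))
```

The log transform keeps a handful of highly expressed transcripts from dominating the coefficient; a rank-based statistic would be another reasonable choice.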

Visualization: Next Steps

  • Can we avoid Tableau?
    • Need:
      • Tools to easily select datasets to compare
      • Quickly represent results graphically
      • Allow users to interact with individual points
  • Does Galaxy want to do these things?

Recap

Goals:

  • Foster quantitative analysis by experiment designers to help make NEB products excellent
  • Identify opportunities for collaboration
  • New ideas for analysis?

Thanks!

NEB

Galaxy Team 

IUC and other tool builders