nf-core: Community-based, best-practice pipeline development in Nextflow

 

Alexander Peltzer

Quantitative Biology Center (QBiC) Tübingen

http://bit.ly/ismb2018-nfcore

@alex_peltzer

Outline

 

  • Challenges in computational biology
  • Basic introduction to Nextflow
  • Introduction to the nf-core project

Challenges: Big Data

 

  • Data in computational biology is
    • big (petabyte scale)
    • diverse (sequencing, proteomics, metabolomics, ...)
    • error-prone (e.g. contains sequencing errors)

 

 

We need methods and tools to analyze such data!

Challenges: Software dependencies

 

 

Workflows / pipelines consist of

 

  • many different tools
  • typically dozens of individual methods

 

Complex dependency management!

 

Challenges: Reproducibility

 

  • Large-scale projects are more common today
    • 1,000 Genomes Project
    • 100,000 Genomes Project (UK)
  • Need to reproduce results with older data and integrate them with newer data

 

 

Many published results are not reproducible!

 

Nextflow

 

  • Custom DSL (domain-specific language) for
    • fast prototyping
    • enabling task composition
    • easy parallelization
  • Self-contained: Containerize tasks (e.g. with Docker)
  • Isolation of dependencies: keep the container, rerun the analysis at any point!
  • nf-core: community effort to collect production-ready analysis pipelines
  • Saves development time, allows more testing and more frequent updates
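Nextflow's own DSL is not reproduced here. As a loose analogy only, this Python sketch illustrates two ideas from the Nextflow list: task composition (one step's output feeds the next) and easy parallelization (independent samples run concurrently). The step functions trim and align are hypothetical placeholders, and a thread pool stands in for Nextflow's scheduler; this is not how Nextflow is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
import tempfile  # not needed here; kept out of the logic on purpose


# Hypothetical placeholder steps; real pipeline tasks would call external tools.
def trim(sample: str) -> str:
    return f"{sample}.trimmed"


def align(trimmed: str) -> str:
    return f"{trimmed}.bam"


def run_pipeline(samples: list[str], workers: int = 4) -> list[str]:
    """Compose trim -> align per sample and process samples in parallel."""

    def per_sample(sample: str) -> str:
        # Task composition: the output of one step is the input of the next.
        return align(trim(sample))

    # Easy parallelization: independent samples are submitted concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input samples.
        return list(pool.map(per_sample, samples))


if __name__ == "__main__":
    print(run_pipeline(["sampleA", "sampleB"]))
```

Running it prints ['sampleA.trimmed.bam', 'sampleB.trimmed.bam']. In Nextflow the same composition is expressed declaratively through processes and channels, and the runtime decides how to parallelize.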

 

https://nf-co.re

 

Phil Ewels

Alex Peltzer

Sven Fillinger

Andreas Wilm

Maxime Garcia

Tiffany Delhomme

+ many others!

All pipelines adhere to these requirements:

  • Nextflow based
  • MIT license
  • Software bundled in Docker / Singularity
  • Continuous integration testing (e.g. Travis CI)
  • Stable release tags
  • Common pipeline usage and structure
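To make the idea of checkable requirements concrete, here is a minimal sketch of a conformance check in Python. It is not the nf-core linting tool: the mapping from requirement to file name (main.nf, LICENSE, Dockerfile, .travis.yml) is an illustrative assumption, and check_pipeline is a hypothetical helper.

```python
from pathlib import Path
import tempfile

# Illustrative assumption: each requirement is evidenced by one file.
# These are NOT the official nf-core lint rules.
REQUIRED_FILES = {
    "Nextflow based": "main.nf",
    "MIT license": "LICENSE",
    "Software bundled in Docker": "Dockerfile",
    "Continuous integration testing": ".travis.yml",
}


def check_pipeline(root: str) -> dict[str, bool]:
    """Return, per requirement, whether the pipeline directory satisfies it."""
    base = Path(root)
    results = {req: (base / name).is_file() for req, name in REQUIRED_FILES.items()}
    # A LICENSE file alone is not enough: also look for the MIT license text.
    license_file = base / "LICENSE"
    if license_file.is_file():
        text = license_file.read_text(encoding="utf-8", errors="replace")
        results["MIT license"] = "MIT" in text
    return results


if __name__ == "__main__":
    # Build a throwaway pipeline directory with only two of the four files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "main.nf").write_text("// pipeline\n")
        (Path(tmp) / "LICENSE").write_text("MIT License\n")
        for req, ok in check_pipeline(tmp).items():
            print(f"{'PASS' if ok else 'FAIL'}  {req}")
```

A real linter would check far more (configuration, documentation, release tags), but the design stays the same: encode each requirement as a small, automatable test.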

Optional requirements

 

  • Software bundled in Bioconda
  • Optimized output formats (e.g. CRAM)
  • Explicit support for cloud environments (AWS)
  • Benchmarks for running on such environments

Need help?

 

  • Cookiecutter template: get a skeleton for new pipelines
  • Linting tool: check whether a pipeline conforms to nf-core guidelines
  • Gitter: chat with the community!

 

Comes with interactive reports!

Comes with proper documentation!

... and a lot more!

Acknowledgements

Phil Ewels (SciLifeLab, Stockholm)

Maxime Garcia (SciLifeLab, Stockholm)

Sven Fillinger (QBiC, Tübingen)

Paolo Di Tommaso (CRG, Barcelona)

Evan Floden (CRG, Barcelona)

Andreas Wilm (A* Singapore, Singapore)

Tiffany Delhomme (IARC, Lyon)
