Community-built bioinformatics pipelines

Alexander Peltzer




Ep. 1: Monday, July 22 by Harshil Patel

Ep. 2: Thursday, July 25 (you're there!)

Ep. 3: Friday, July 26: AEBC2 Workshop

Season 02

Challenges: Big Data


  • Data in the computational sciences (biology, physics, chemistry, ...) are
    • big (PB scale)
    • diverse (e.g. sequencing, proteomics, ...)
    • erroneous (e.g. contains sequencing errors)

We need methods and tools to analyze such data!

Challenges: Software dependencies



Workflows / Pipelines consist of

  • different tools
  • dozens of individual methods


Complex dependency trees and configuration requirements!


Steinbiss et al., "Companion: a web server for annotation and analysis of parasite genomes", NAR 2016

Challenges: Software dependencies



"[...] of the tools selected for our comprehensive and systematic usability test, 51% were deemed 'difficult to install,' and 28% of the tools failed to be installed [...]."

- Mangul et al., PLOS Biology, June 20, 2019

Challenges: Reproducibility




Many paper results are hard to reproduce!





Nextflow

  • Custom DSL (domain-specific language) for
    • fast prototyping
    • task composition
    • easy parallelization
  • Self-contained: tasks are containerized (e.g. with Docker)
  • Isolation of dependencies: keep the container and rerun the analysis at any point!
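
The points above can be made concrete with a small script. This is a minimal sketch, not taken from any nf-core pipeline: the process name `COUNT_LINES`, the input glob, and the container image are all hypothetical, and it is written in the newer DSL2 syntax, which is an assumption about the Nextflow version in use.

```groovy
// Hypothetical input glob; override on the command line with --inputs
params.inputs = 'data/*.txt'

// One task definition. Nextflow runs one instance per input file,
// and independent instances can execute in parallel.
process COUNT_LINES {
    // Assumed image; only used when a container engine is enabled
    container 'ubuntu:22.04'

    input:
    path infile

    output:
    stdout

    script:
    """
    wc -l < ${infile}
    """
}

// Task composition: connect a channel of files to the process
workflow {
    files_ch = Channel.fromPath(params.inputs)
    COUNT_LINES(files_ch)
    COUNT_LINES.out.view()
}
```

Saved as `main.nf` (an assumed file name), this would be launched with `nextflow run main.nf`, adding `-with-docker` to run each task inside the container.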

Nextflow: Executor abstraction


=> Improves code portability

// Run script locally
process.executor = 'local'

// Run script on PBS/Torque
process.executor = 'pbs'

// Run script on Kubernetes cluster
process.executor = 'k8s'

// Run script on AWS Batch
process.executor = 'awsbatch'

// Run script on Google Pipelines
process.executor = 'google-pipelines'
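
One way to switch between these executors without editing the pipeline is to group the settings into named profiles in `nextflow.config`. A minimal sketch: the profile names `standard` and `cluster` are hypothetical, and the queue name is an assumption that depends on the local scheduler setup.

```groovy
// nextflow.config (sketch): one profile per execution environment
profiles {
    // Run all tasks on the local machine
    standard {
        process.executor = 'local'
    }
    // Submit each task as a PBS/Torque job; the queue name is an assumption
    cluster {
        process.executor = 'pbs'
        process.queue    = 'short'
    }
}
```

The same script then runs in either environment, for example with `nextflow run main.nf -profile cluster`, where `main.nf` stands in for the pipeline script.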

nf-core

  • Community effort to collect production-ready analysis pipelines
  • Saves development time and brings more testing and more updates


Phil Ewels

Alex Peltzer

Sven Fillinger

Maxime Garcia

+ many others!

Harshil Patel

Andreas Wilm

20+ institutions, others joining!

All pipelines adhere to requirements

  • Nextflow-based
  • MIT license
  • Software bundled in Docker / Singularity
  • Continuous integration testing (e.g. Travis CI)
  • Stable release tags
  • Common pipeline usage and structure
  • Software bundled in bioconda

  # Lint the pipeline code
  - nf-core lint ${TRAVIS_BUILD_DIR}
  # Lint the documentation
  - markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml
  # Run, build reference genome with STAR
  - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker
  # Run, build reference genome with HISAT2
  - nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --aligner hisat2
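
The `-profile test,docker` argument in the CI commands above combines two configuration profiles, one that supplies a small test dataset and one that turns on Docker. The following is a sketch of how such profiles might look, not the actual nf-core configuration: the resource limits and the `reads` path are hypothetical values chosen for illustration.

```groovy
// nextflow.config (sketch): profiles that can be combined as -profile test,docker
profiles {
    // Run every task inside its Docker container
    docker {
        docker.enabled = true
    }
    // Small inputs and modest resource caps so the run fits on a CI worker
    test {
        params.max_cpus   = 2
        params.max_memory = 6.GB
        params.max_time   = 48.h
        // Hypothetical location of a tiny paired-end test dataset
        params.reads = 'test-data/*_{1,2}.fastq.gz'
    }
}
```

Command-line parameters such as `--aligner hisat2` set the corresponding `params` value and take precedence over values defined in the config file.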

  • 15 stable pipelines
  • 18 pipelines in development

// Profile config names for nf-core/configs
params {
  config_profile_description = 'BINAC cluster profile provided by nf-core/configs.'
  config_profile_contact = 'Alexander Peltzer (@apeltzer)'
  config_profile_url = ''
}

// Run containers with Singularity
singularity {
  enabled = true
}

// Load Singularity before each task and submit tasks to the PBS 'short' queue
process {
  beforeScript = 'module load devel/singularity/3.0.3'
  executor = 'pbs'
  queue = 'short'
}

// Reference genome location and per-task resource limits on this cluster
params {
  igenomes_base = '/nfsmounts/igenomes'
  max_memory = 128.GB
  max_cpus = 28
  max_time = 48.h
}


Comes with interactive reports!

Comes with proper documentation!

... and a lot more!

What's next for nf-core?

  • Biocontainers integration
  • Automated Cloud Tests (Price estimates?)
  • Automated full-size testing
  • nf-core/modules (Nextflow DSL2)


Phil Ewels (SciLifeLab, Stockholm)

Maxime Garcia (SciLifeLab, Stockholm)

Harshil Patel (The Francis Crick Institute, London)

Sven Fillinger (QBiC/Tü)

Paolo Di Tommaso (CRG, Barcelona)

Evan Floden (CRG, Barcelona)


and all contributors!

nf-core team


(Paper in Revision)


By Alexander Peltzer


nf-core presentation for BOSC/ISMB 2019
