Alexander Peltzer
Quantitative Biology Center (QBiC) Tübingen
http://bit.ly/nfcoreisc2018
Outlook
Challenges: Big Data
We need methods and tools to analyze such data!
Challenges: Big Data - ICGC
"Hyper-Moore gap"
Credit to Swaine Chen, Genome Institute of Singapore, AWS Summit 2018
The FAIR* principles
Findable
Accessible
Interoperable
Reproducible
The FAIR Guiding Principles for scientific data management and stewardship, Wilkinson et al. 2016
qPortal: A platform for data-driven biomedical research, Mohr et al. 2018
DOI
qPortal
?
?
Challenges: Software dependencies
Workflows / Pipelines consist of
Complex dependency trees and configuration requirements!
Steinbiss et al., "Companion: a web server for annotation and analysis of parasite genomes", NAR 2016
Challenges: Reproducibility
"We estimated the overall time to reproduce the method as 280 hours for a novice with minimal expertise in bioinformatics."
Challenges: Environmental stability
Challenges: Software dependencies
Di Tommaso et al., 2017, Nature Biotechnology
Nextflow
(credit to E Floden, CRG Barcelona)
Nextflow: Centralised Orchestration
(Diagram: Nextflow, Cluster, Storage)
Nextflow: Cloud deployment (AWS)
(credit to E Floden, CRG Barcelona)
Platform support
(credit to E Floden, CRG Barcelona)
Nextflow: Executor abstraction
Improves code portability
// Set one of these in nextflow.config; the pipeline code itself stays unchanged
// Run me locally
process.executor = 'local'
// Run on AWS Batch
process.executor = 'awsbatch'
// Run on a Kubernetes cluster
process.executor = 'k8s'
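The executor idea can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Nextflow's implementation: `Task`, `run_local`, `render_slurm`, `execute` and the `EXECUTORS` registry are hypothetical names, and a SLURM backend (another executor Nextflow supports) stands in for AWS Batch and Kubernetes because it only needs to render a submission command. The point is that the task definition never changes; only the executor name does.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """A unit of work: a name plus the command to run (hypothetical toy type)."""
    name: str
    command: List[str]


def run_local(task: Task) -> str:
    """Run the command on this machine and return its standard output."""
    result = subprocess.run(task.command, capture_output=True, text=True, check=True)
    return result.stdout


def render_slurm(task: Task) -> str:
    """Only render a SLURM submission line; nothing is actually submitted."""
    script = shlex.join(task.command)
    return f"sbatch --job-name={task.name} --wrap={shlex.quote(script)}"


# Executor name -> backend; swapping the name is the only change needed.
EXECUTORS: Dict[str, Callable[[Task], str]] = {
    "local": run_local,
    "slurm": render_slurm,
}


def execute(task: Task, executor: str) -> str:
    """Dispatch the same task definition to whichever backend is selected."""
    try:
        backend = EXECUTORS[executor]
    except KeyError:
        raise ValueError(f"unknown executor: {executor!r}") from None
    return backend(task)


if __name__ == "__main__":
    greet = Task("greet", [sys.executable, "-c", "print('hello')"])
    print(execute(greet, "local"), end="")  # prints: hello
    print(execute(greet, "slurm"))
```

In Nextflow the equivalent switch is the single `process.executor` setting; the registry lookup here just makes that separation between "what to run" and "where to run it" explicit.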
Phil Ewels
Alex Peltzer
Sven Fillinger
Andreas Wilm
Maxime Garcia
Tiffany Delhomme
+ many others!
All pipelines adhere to a common set of requirements
Dockerfiles
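# Shared nf-core base image (conda-based)
FROM nfcore/base
LABEL authors="phil.ewels@scilifelab.se" \
      maintainer="Phil Ewels <phil.ewels@scilifelab.se>" \
      description="Docker image containing all requirements for the nfcore/rnaseq pipeline"
# Create the pipeline's conda environment, then clean caches to keep the image small
COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a
# Put the environment's tools on PATH (directory name must match `name:` in environment.yml)
ENV PATH /opt/conda/envs/nfcore-rnaseq-1.5dev/bin:$PATH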
Dockerfiles: conda environment.yml
name: nfcore-rnaseq-1.5dev
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- conda-forge::openjdk=8.0.144
- fastqc=0.11.7
- trim-galore=0.4.5
- star=2.6.0c
- hisat2=2.1.0
- picard=2.18.7
- bioconductor-dupradar=1.8.0
- conda-forge::r-data.table=1.11.4
- conda-forge::r-gplots=3.0.1
- bioconductor-edger=3.20.7
- conda-forge::r-markdown=0.8
- preseq=2.0.3
- rseqc=2.6.4
- samtools=1.8
- stringtie=1.3.4
- subread=1.6.1
- multiqc=1.5
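Because the Dockerfile hard-codes `/opt/conda/envs/nfcore-rnaseq-1.5dev/bin`, its `ENV PATH` line has to stay in sync with the `name:` field above, and reproducible builds rely on every dependency carrying a version. A minimal Python sketch of such a consistency check follows, assuming the flat layout shown here. `parse_environment`, `unpinned` and `path_matches` are hypothetical helpers, not an nf-core tool, and the simple parser understands only `package=version` or `channel::package=version` specs (no `==`, ranges or build strings), so it is no substitute for a real YAML parser.

```python
import re
from typing import List, Optional, Tuple

Spec = Tuple[Optional[str], str, Optional[str]]  # (channel, package, version)

# Matches "package", "package=version" or "channel::package=version".
SPEC_RE = re.compile(
    r"^(?:(?P<channel>[\w.-]+)::)?(?P<package>[\w.-]+)(?:=(?P<version>[\w.-]+))?$"
)


def parse_environment(text: str) -> Tuple[str, List[Spec]]:
    """Return the environment name and the parsed `dependencies:` entries."""
    name = ""
    specs: List[Spec] = []
    in_deps = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.endswith(":"):
            # A new top-level key; only the dependencies list is parsed.
            in_deps = line == "dependencies:"
        elif in_deps and line.startswith("- "):
            match = SPEC_RE.match(line[2:].strip())
            if match is None:
                raise ValueError(f"unsupported spec: {line!r}")
            specs.append((match["channel"], match["package"], match["version"]))
    return name, specs


def unpinned(specs: List[Spec]) -> List[str]:
    """Packages listed without any version, which can drift between builds."""
    return [package for _, package, version in specs if version is None]


def path_matches(dockerfile: str, env_name: str) -> bool:
    """True if an ENV PATH line points at this environment's bin directory."""
    expected = f"/opt/conda/envs/{env_name}/bin"
    return any(
        line.strip().startswith("ENV PATH") and expected in line
        for line in dockerfile.splitlines()
    )


# Trimmed copy of the environment.yml and Dockerfile lines shown above.
ENVIRONMENT_YML = """\
name: nfcore-rnaseq-1.5dev
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- conda-forge::openjdk=8.0.144
- fastqc=0.11.7
- star=2.6.0c
- multiqc=1.5
"""

DOCKERFILE = "ENV PATH /opt/conda/envs/nfcore-rnaseq-1.5dev/bin:$PATH\n"

if __name__ == "__main__":
    env_name, deps = parse_environment(ENVIRONMENT_YML)
    print(env_name, unpinned(deps), path_matches(DOCKERFILE, env_name))
    # prints: nfcore-rnaseq-1.5dev [] True
```

Note that a single `=` in a conda spec such as `fastqc=0.11.7` is a version-prefix match rather than a strict equality, so the check above reports "has a version", not "is byte-for-byte identical".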
Optional requirements
Need help?
It's demo time!
Comes with interactive reports!
Comes with proper documentation!
... and a lot more!
Phil Ewels (SciLifeLab)
Maxime Garcia (SciLifeLab)
Sven Fillinger (QBiC)
Paolo di Tommaso (CRG)
Evan Floden (CRG)
Andreas Wilm (A* Singapore)