Medical Microbiology Maastricht UMC
May 26th, 2020
@ines_cim
cimendes
Inês Mendes
Can person X, with the same data and the same methodology, obtain the same conclusions?
The needs:
Workflows in the Paleolithic era:
Writing pipelines in Python/Perl/shell scripts, circa 2000, colorized.
Workflows in the Modern era:
The game-changing combination of workflow managers and containers.
Version control records changes to a file or set of files over time so that you can recall specific versions later.
It allows you to revert selected files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more.
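The capabilities listed above (recalling a version, comparing versions, finding who introduced a change) can be illustrated with a toy sketch. This is not how Git or any real version control system is implemented; the class `TinyHistory`, its methods, and the authors "ana" and "rui" are hypothetical names used only for illustration.

```python
import difflib
import hashlib


class TinyHistory:
    """Toy version store for one text file: every commit keeps a full snapshot."""

    def __init__(self):
        self.commits = []  # list of (commit_id, author, content)

    def commit(self, author, content):
        # Identify the snapshot by a short hash of its position, author and content.
        raw = f"{len(self.commits)}:{author}:{content}".encode()
        commit_id = hashlib.sha1(raw).hexdigest()[:8]
        self.commits.append((commit_id, author, content))
        return commit_id

    def checkout(self, commit_id):
        # Recall the content exactly as it was at a given commit.
        for cid, _, content in self.commits:
            if cid == commit_id:
                return content
        raise KeyError(commit_id)

    def diff(self, old_id, new_id):
        # Compare two versions line by line.
        old = self.checkout(old_id).splitlines()
        new = self.checkout(new_id).splitlines()
        return list(difflib.unified_diff(old, new, old_id, new_id, lineterm=""))

    def first_author_of(self, text):
        # Find who first introduced a given piece of text.
        for _, author, content in self.commits:
            if text in content:
                return author
        return None


history = TinyHistory()
v1 = history.commit("ana", "trim reads\nassemble")
v2 = history.commit("rui", "trim reads\nassemble\npolish --fast")

print(history.checkout(v1))               # the file as it was at the first commit
print("\n".join(history.diff(v1, v2)))    # what changed between the two versions
print(history.first_author_of("--fast"))  # who introduced the suspicious flag
```

Real systems store changes far more compactly and track whole directory trees, but the operations exposed to the user are the same in spirit.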
Runs the same regardless of the environment.
Enables the distribution and deployment of scientific software in a runnable state.
A container image is a lightweight, stand-alone, executable package of software that includes everything needed to run it: code, runtime, system tools, system libraries and settings.
[Figure: Virtual Machines vs. Containers. Virtual machines: Host Hardware, Host OS, Hypervisor, then VM1 and VM2, each with its own Guest OS and Apps. Containers: Host Hardware, Host OS, Container Engine, then Apps.]
[Figure: container image workflow: build, push, host, pull & run.]
Nextflow enables scalable and reproducible scientific workflows and simplifies the deployment of complex parallel and reactive workflows.
Reactive workflow framework
Create pipelines with asynchronous (and implicitly parallelized) data streams
Programming DSL
Has its own language for building a pipeline
Containerized
Out-of-the-box integration with container engines (Docker, Singularity, Shifter)
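The idea of processes connected by asynchronous data streams can be sketched in plain Python. This is only an analogy: Nextflow pipelines are written in its own Groovy-based DSL, not like this. The function names `stage`, `trim`, `assemble` and `run`, and the sample names, are hypothetical. Each stage reacts to items as they arrive on its input queue, so the second stage can start on the first sample while the first stage is still busy with the next one.

```python
import asyncio

DONE = object()  # sentinel marking the end of a stream


async def stage(work, inbox, outbox):
    """React to every item arriving on `inbox` and emit the result on `outbox`."""
    while True:
        item = await inbox.get()
        if item is DONE:
            await outbox.put(DONE)  # propagate end-of-stream downstream
            return
        await outbox.put(await work(item))


async def trim(sample):
    await asyncio.sleep(0.01)  # stand-in for real work
    return f"{sample}.trimmed"


async def assemble(sample):
    await asyncio.sleep(0.01)  # stand-in for real work
    return f"{sample}.assembled"


async def run(samples):
    raw, trimmed, assembled = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    # Both stages run concurrently and are linked only through the queues.
    tasks = [
        asyncio.create_task(stage(trim, raw, trimmed)),
        asyncio.create_task(stage(assemble, trimmed, assembled)),
    ]
    for sample in samples:
        await raw.put(sample)
    await raw.put(DONE)

    results = []
    while True:
        item = await assembled.get()
        if item is DONE:
            break
        results.append(item)
    await asyncio.gather(*tasks)
    return results


results = asyncio.run(run(["sampleA", "sampleB"]))
print(results)
```

Nothing in `run` says which stage executes when; the order follows from the data flowing through the queues, which is the property the "reactive" label refers to.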
Creating workflow pipelines is aimed at bioinformaticians familiar with programming.
Their execution is for everyone.
https://github.com/B-UMMI/DEN-IM.git
Workflow based development
Component based development
Components are modular pieces with some basic rules:
Component A
- Input/Output
- Parameters
- Resources
Component B
- Input/Output
- Parameters
- Resources
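A component with these three rules could be modelled roughly as follows. This is a hypothetical sketch, not the framework's actual data model: the `Component` class, its field names, the `can_feed` check and all example values (data types, parameters, CPU and memory figures) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A modular pipeline piece described by its I/O, parameters and resources."""

    name: str
    input_type: str   # kind of data the component consumes
    output_type: str  # kind of data the component produces
    params: dict = field(default_factory=dict)     # user-tunable settings
    resources: dict = field(default_factory=dict)  # e.g. CPUs and memory

    def can_feed(self, other):
        # Two components can be chained when the output of one
        # matches the input expected by the next.
        return self.output_type == other.input_type


# Hypothetical example values.
component_a = Component(
    "component_a", "fastq", "fastq",
    params={"min_length": 50},
    resources={"cpus": 2, "memory_gb": 4},
)
component_b = Component(
    "component_b", "fastq", "fasta",
    resources={"cpus": 8, "memory_gb": 16},
)

print(component_a.can_feed(component_b))  # A's output fits B's input
print(component_b.can_feed(component_a))  # B's output does not fit A's input
```

Declaring input and output explicitly is what lets a builder check that a chain of components is valid before anything runs.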
With this framework, building workflows becomes simple:
flowcraft build -t 'trimmomatic fastqc spades pilon' -o my_nextflow_pipeline
Results in the following workflow DAG (directed acyclic graph)
It's easy to get experimental:
flowcraft build -t 'trimmomatic fastqc skesa pilon' -o my_nextflow_pipeline
Switch spades for skesa
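To make the relationship between the component string and the resulting graph concrete, here is a small sketch that treats a space-separated list of component names as a linear chain and derives its edges. It does not reproduce the tool's internals; the helpers `linear_dag` and `swap` are hypothetical, and reading the string as a strictly linear chain is an assumption.

```python
def linear_dag(task_string):
    """Turn a space-separated list of component names into (upstream, downstream) edges."""
    names = task_string.split()
    return list(zip(names, names[1:]))


def swap(task_string, old, new):
    """Replace one component name with another, keeping the order."""
    return " ".join(new if name == old else name for name in task_string.split())


original = "trimmomatic fastqc spades pilon"
experimental = swap(original, "spades", "skesa")
edges = linear_dag(experimental)

print(experimental)
for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
```

Because only one node in the chain changes, the rest of the workflow (trimming, quality control, polishing) stays identical, which is what makes comparing assemblers cheap.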
42h on 200 CPUs
151 samples
1812 assemblies
43s/assembly
Sampled assemblies
Dots above red line
Same sample interpreted with different profile
Potentially undetected outbreak
This work was funded by: FCT - "Fundação para a Ciência e a Tecnologia" (SFRH/BD/129483/2017)
Special thanks to Diogo Silva, Bruno Gonçalves, Tiago Jesus, Pedro Vila-Cerqueira, Rafael Maria Mamede, João Carriço, John Rossen and Mário Ramirez.