Johannes Köster
2019
https://koesterlab.github.io
Get the slides: https://tinyurl.com/y9f6kc4h
Live demo: https://tinyurl.com/ya2mxvku
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2765493/snakemake-paper.png)
100k downloads since 2015
50k in 2018
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646163/nature_genetics.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646162/cancer_cell.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646179/nature_methods.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646177/genome_biology.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646170/bioinformatics.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646184/cell.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646193/molecular_cell.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/2646200/plos_genetics.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293194/ng50_7.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293229/nar_46_12.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293228/isme_12_8.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293223/nm_3_2.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293204/nar_46_14.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293277/msb_13_5.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293330/mbe_34_5.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293331/pnas_114_5.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293364/nbt-v34-n5.png)
Snakemake is a popular solution
![](https://s3.amazonaws.com/media-p.slid.es/uploads/362168/images/5293426/molcel_66_3.gif)
[Diagram: several datasets are processed into results]
Data analysis
automation
From raw data to final figures:
- document parameters, tools, versions
- execute without manual intervention
Reproducible data analysis
scalability
Handle parallelization:
- execute for tens to thousands of datasets
- efficiently use any computing platform
portability
Handle deployment:
- easily execute analyses on a different system, platform, or infrastructure
Define workflows in terms of rules
# Run an R script on each dataset; {dataset} is a wildcard.
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    script:
        "scripts/myscript.R"

# Filter each per-dataset result with a shell command.
rule myfiltration:
    input:
        "result/{dataset}.txt"
    output:
        "result/{dataset}.filtered.txt"
    shell:
        "mycommand {input} > {output}"

# Combine the filtered results of two datasets into one plot.
rule aggregate:
    input:
        "result/dataset1.filtered.txt",
        "result/dataset2.filtered.txt"
    output:
        "plots/myplot.pdf"
    script:
        "scripts/myplot.R"
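To illustrate how a wildcard such as {dataset} links the rules above, here is a minimal Python sketch. It is not Snakemake's actual implementation: the helper names `pattern_to_regex` and `infer_input` are hypothetical, and the sketch assumes each wildcard appears only once per pattern and matches any non-empty string. Given a requested file, it matches the file against a rule's output pattern, reads off the wildcard value, and fills that value into the rule's input pattern.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'result/{dataset}.txt' into a regex with one named group per wildcard."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text, matched verbatim
        else:
            regex += f"(?P<{part}>.+)"     # wildcard: any non-empty string
    return re.compile(regex + "$")


def infer_input(target, output_pattern, input_pattern):
    """Return the input file needed to create `target`, or None if the rule does not apply."""
    match = pattern_to_regex(output_pattern).match(target)
    if match is None:
        return None
    # Fill the wildcard values found in the target into the input pattern.
    return input_pattern.format(**match.groupdict())


# Walk backwards from a filtered result to the raw dataset file.
step1 = infer_input(
    "result/dataset1.filtered.txt",
    "result/{dataset}.filtered.txt",   # output of rule myfiltration
    "result/{dataset}.txt",            # input of rule myfiltration
)
step2 = infer_input(
    step1,
    "result/{dataset}.txt",            # output of rule mytask
    "path/to/{dataset}.txt",           # input of rule mytask
)
print(step1)  # result/dataset1.txt
print(step2)  # path/to/dataset1.txt
```

Requesting a filtered result therefore implies running mytask first and myfiltration second for that dataset, without either rule naming a concrete dataset.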
Live demo
Resources
Homepage:
https://snakemake.readthedocs.io
Tutorial:
https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html
Live demo:
https://www.katacoda.com/johanneskoester/scenarios/snakemake-intro
Best-practice workflows:
Snakemake live demo
By Johannes Köster
GCCBOSC 2018