Johannes Köster
2019
https://koesterlab.github.io
Get the slides: https://tinyurl.com/y9f6kc4h
Live demo: https://tinyurl.com/ya2mxvku
Snakemake is a popular solution
- 100k downloads since 2015
- 50k downloads in 2018
Data analysis
[Figure: multiple datasets → results]
Reproducible data analysis
[Figure: multiple datasets → results]
automation
From raw data to final figures:
- document parameters, tools, and versions
- execute without manual intervention
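Executing without manual intervention also means re-running only what is out of date. One criterion that Make-style tools, Snakemake included, use is whether an output is missing or older than one of its inputs. Below is a minimal Python sketch of that timestamp check alone, assuming plain local files; `needs_rerun` is a hypothetical helper, not a Snakemake API, and real workflow managers apply further triggers.

```python
import os


def needs_rerun(inputs: list[str], outputs: list[str]) -> bool:
    """Return True if any output is missing or older than the newest input.

    Simplified Make-style timestamp check; inputs are assumed to exist.
    """
    # A job without declared outputs cannot be checked, so always run it.
    if not outputs:
        return True
    # A missing output always forces the job to run.
    if any(not os.path.exists(path) for path in outputs):
        return True
    # Jobs without inputs are up to date once their outputs exist.
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return oldest_output < newest_input


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dataset1.txt")
        out = os.path.join(tmp, "dataset1.filtered.txt")
        with open(src, "w") as fh:
            fh.write("raw\n")
        print(needs_rerun([src], [out]))  # output missing -> True
        with open(out, "w") as fh:
            fh.write("filtered\n")
        # Set explicit timestamps: input older than output.
        os.utime(src, (1_000, 1_000))
        os.utime(out, (2_000, 2_000))
        print(needs_rerun([src], [out]))  # output is newer -> False
        # Simulate editing the raw data after the result was produced.
        os.utime(src, (3_000, 3_000))
        print(needs_rerun([src], [out]))  # input changed later -> True
```

The check compares the oldest output with the newest input, so a single stale output file is enough to trigger a rerun.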
Reproducible data analysis
[Figure: multiple datasets → results, labeled automation]
scalability
Handle parallelization:
- execute for tens to thousands of datasets
- efficiently use any computing platform (e.g. workstation, compute cluster, or cloud)
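Because the per-dataset jobs do not depend on each other, they can run at the same time. The toy scheduler below illustrates the idea in Python under simplifying assumptions: jobs are plain callables, dependencies are given explicitly as a dict (job names such as `mytask:dataset1` are hypothetical), and threads stand in for cores. It is not Snakemake's scheduler, which also accounts for resources and can submit jobs to clusters; in Snakemake the number of cores is set on the command line, e.g. `snakemake --cores 4`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable


def run_parallel(
    jobs: dict[str, Callable[[], None]],
    deps: dict[str, set[str]],
    max_workers: int = 4,
) -> list[str]:
    """Run jobs concurrently, starting each once all its dependencies finished.

    Returns job names in completion order. Toy scheduler for illustration.
    """
    # Copy the dependency sets so they can be consumed as jobs finish.
    remaining = {name: set(deps.get(name, ())) for name in jobs}
    finished: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> job name
        while remaining or running:
            # Submit every job whose dependencies are all satisfied.
            ready = [name for name, pending in remaining.items() if not pending]
            for name in ready:
                del remaining[name]
                running[pool.submit(jobs[name])] = name
            if not running:
                raise ValueError("cyclic or unsatisfiable dependencies")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise an exception from the job, if any
                finished.append(name)
                # Unblock jobs that were waiting on this one.
                for pending in remaining.values():
                    pending.discard(name)
    return finished


if __name__ == "__main__":
    import time

    def fake_job() -> None:
        time.sleep(0.1)  # stand-in for real work

    datasets = ["dataset1", "dataset2", "dataset3"]
    jobs = {"aggregate": fake_job}
    deps = {"aggregate": {f"myfiltration:{d}" for d in datasets}}
    for d in datasets:
        jobs[f"mytask:{d}"] = fake_job
        jobs[f"myfiltration:{d}"] = fake_job
        deps[f"myfiltration:{d}"] = {f"mytask:{d}"}
    print(run_parallel(jobs, deps, max_workers=3)[-1])  # aggregate
```

The three per-dataset chains proceed side by side, while the aggregation step necessarily finishes last because it waits for every filtered result.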
Reproducible data analysis
[Figure: multiple datasets → results, labeled automation and scalability]
portability
Handle deployment:
- easily execute the analysis on a different system, platform, or infrastructure
Define workflows in terms of rules
[Figure: multiple datasets → results]
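# Snakefile. Run with: snakemake --cores 1 plots/myplot.pdf
# (the first rule has wildcards, so the final target is named explicitly)

# Per-dataset step; {dataset} is a wildcard inferred from the requested file.
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "results/{dataset}.txt"
    script:
        "scripts/myscript.R"

# Filtering step; it consumes mytask's output through the shared file pattern.
rule myfiltration:
    input:
        "results/{dataset}.txt"
    output:
        "results/{dataset}.filtered.txt"
    shell:
        "mycommand {input} > {output}"

# Aggregation; requesting concrete files determines which jobs are run.
rule aggregate:
    input:
        "results/dataset1.filtered.txt",
        "results/dataset2.filtered.txt"
    output:
        "plots/myplot.pdf"
    script:
        "scripts/myplot.R"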
Live demo
Resources
Homepage: https://snakemake.readthedocs.io
Tutorial: https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html
Live demo: https://www.katacoda.com/johanneskoester/scenarios/snakemake-intro
Best-practice workflows: