Johannes Köster

2019

 

https://koesterlab.github.io

Get the slides: https://tinyurl.com/y9f6kc4h

100k downloads since 2015

50k in 2018

Snakemake is a popular solution

[Diagram: several datasets are processed into results]

Data analysis

automation

From raw data to final figures:

  • document parameters, tools, versions
  • execute without manual intervention

Reproducible data analysis

scalability

Handle parallelization:

  • execute for tens to thousands of datasets
  • efficiently use any computing platform

automation

Reproducible data analysis

Handle deployment:

  • easily execute analyses on a different system, platform, or infrastructure

portability

scalability

automation

Reproducible data analysis

Define workflows in terms of rules

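# Run an R script once per dataset; {dataset} is a wildcard.
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    script:
        "scripts/myscript.R"


# Filter each per-dataset result with a shell command.
rule myfiltration:
    input:
        "result/{dataset}.txt"
    output:
        "result/{dataset}.filtered.txt"
    shell:
        "mycommand {input} > {output}"


# Combine the filtered results of two concrete datasets into one plot.
# The input paths must match the output pattern of myfiltration.
rule aggregate:
    input:
        "result/dataset1.filtered.txt",
        "result/dataset2.filtered.txt"
    output:
        "plots/myplot.pdf"
    script:
        "scripts/myplot.R"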
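To illustrate how a requested file can be traced back through such rules, here is a simplified sketch in plain Python. It matches a target against each rule's output pattern, extracts the `dataset` wildcard, and fills it into the input pattern. It is not Snakemake's implementation: the `RULES` table and the function names are hypothetical, and a wildcard is restricted here to characters other than `/` and `.` so that the two output patterns cannot both match the same file.

```python
import re

# Hypothetical table: rule name -> (input pattern, output pattern).
RULES = {
    "mytask": ("path/to/{dataset}.txt", "result/{dataset}.txt"),
    "myfiltration": ("result/{dataset}.txt", "result/{dataset}.filtered.txt"),
}


def pattern_to_regex(pattern):
    """Compile 'result/{dataset}.txt' into a regex with one named group per wildcard."""
    # Splitting on a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            # Assumption of this sketch: wildcards contain no '/' or '.'.
            regex += f"(?P<{part}>[^/.]+)"
    return re.compile(regex)


def resolve(target):
    """Return (rule name, input file) for the first rule that can produce target, else None."""
    for name, (input_pattern, output_pattern) in RULES.items():
        match = pattern_to_regex(output_pattern).fullmatch(target)
        if match:
            return name, input_pattern.format(**match.groupdict())
    return None


def trace(target):
    """Follow rules backwards from target until no rule produces the current file."""
    steps = []
    while (found := resolve(target)) is not None:
        name, input_file = found
        steps.append((name, input_file, target))
        target = input_file
    return steps


if __name__ == "__main__":
    for name, src, dst in trace("result/dataset1.filtered.txt"):
        print(f"{name}: {src} -> {dst}")
```

Requesting `result/dataset1.filtered.txt` first selects `myfiltration`, whose input `result/dataset1.txt` in turn selects `mytask`; the chain stops at `path/to/dataset1.txt`, which no rule produces.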

Live demo

Resources

Snakemake live demo

By Johannes Köster

GCCBOSC 2018