GCCBOSC 2018
Johannes Köster
2018
https://koesterlab.github.io
Poster B20
66k downloads on Bioconda
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    shell:
        "some-tool {input} > {output}"
Concise DSL
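The `{sample}` wildcard is what keeps the rule short. A requested file is matched against the output pattern, the wildcard value is inferred, and that value is substituted into the input and the shell command. The sketch below is a toy re-implementation of that idea in plain Python, not Snakemake's own code. The helpers `pattern_to_regex` and `job_for` are hypothetical, and each wildcard is assumed to match any non-empty string.

```python
import re

# Hypothetical toy model of the rule above.
OUTPUT = "result/{sample}.txt"
INPUT = "data/{sample}.txt"
SHELL = "some-tool {input} > {output}"


def pattern_to_regex(pattern):
    """Turn 'result/{sample}.txt' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even positions are literal text, odd positions are wildcard names.
    return "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+)" for i, p in enumerate(parts)
    )


def job_for(target):
    """Infer wildcard values from the requested file and fill in the shell command."""
    m = re.fullmatch(pattern_to_regex(OUTPUT), target)
    if m is None:
        raise ValueError(f"{target!r} does not match {OUTPUT!r}")
    wildcards = m.groupdict()  # e.g. {'sample': 'A'}
    input_file = INPUT.format(**wildcards)
    return SHELL.format(input=input_file, output=target)


print(job_for("result/A.txt"))  # some-tool data/A.txt > result/A.txt
```

Asking for `result/B.txt` instead yields `some-tool data/B.txt > result/B.txt`. One rule therefore covers every sample.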
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    script:
        "scripts/mytask.py"
Python scripts
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    script:
        "scripts/mytask.R"
R scripts
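import matplotlib.pyplot as plt
import pandas as pd

# No argument parsing: Snakemake injects the `snakemake` object, whose input and
# output mirror the calling rule and whose config holds the workflow configuration.
d = pd.read_table(snakemake.input[0])
d.hist(bins=snakemake.config["hist-bins"])
# savefig picks the image format from the output file extension (e.g. .pdf, .png).
plt.savefig(snakemake.output[0])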
No boilerplate
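Snakemake injects the `snakemake` object instead of parsing it from the command line. Because of that, the script logic can be exercised outside a workflow by handing it a stand-in object. The sketch below makes several assumptions:

- The script body is wrapped in a hypothetical `run_script` function, purely for testability. In a real workflow `snakemake` is a global.
- `types.SimpleNamespace` serves as the stand-in object.
- Plain-Python bin counting replaces the pandas/matplotlib histogram, so only the standard library is needed.

```python
import os
import tempfile
from types import SimpleNamespace


def run_script(snakemake):
    """Hypothetical stand-in for the script body; in a real workflow `snakemake`
    is a global injected by Snakemake, not a function argument."""
    with open(snakemake.input[0]) as f:
        values = [float(line) for line in f if line.strip()]
    bins = snakemake.config["hist-bins"]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    with open(snakemake.output[0], "w") as f:
        f.write("\n".join(str(c) for c in counts) + "\n")
    return counts


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "A.txt")
    dst = os.path.join(tmp, "A.hist.txt")
    with open(src, "w") as f:
        f.write("1\n2\n2\n3\n4\n")
    # Stand-in exposing only the attributes the script reads.
    fake = SimpleNamespace(input=[src], output=[dst], config={"hist-bins": 3})
    demo_counts = run_script(fake)
    with open(dst) as f:
        demo_written = f.read()

print(demo_counts)  # [1, 2, 2]
```

The script only reads the `snakemake.input`, `snakemake.output` and `snakemake.config` attributes. Any object providing those three attributes is enough for a quick check.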
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    wrapper:
        "0.24.0/bio/mytool"
Reusable tool wrappers
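A wrapper reference such as `0.24.0/bio/mytool` combines a pinned release of the wrapper repository with the path of a tool inside it. The same rule therefore keeps running the same wrapper code. The sketch below shows how such a reference could be turned into a download location. Several parts are assumptions:

- The URL prefix. It has changed over time and can be overridden with Snakemake's `--wrapper-prefix` option.
- The `wrapper.py` file name.
- `resolve_wrapper` itself, which is a hypothetical helper.

```python
# Assumed default prefix; it has changed between Snakemake versions and can be
# overridden on the command line with --wrapper-prefix.
ASSUMED_PREFIX = "https://github.com/snakemake/snakemake-wrappers/raw/"


def resolve_wrapper(spec, prefix=ASSUMED_PREFIX):
    """Split '<version>/<path>' and build the (assumed) location of wrapper.py."""
    version, _, tool_path = spec.partition("/")
    if not version or not tool_path:
        raise ValueError(f"expected '<version>/<path>', got {spec!r}")
    return version, f"{prefix}{version}/{tool_path}/wrapper.py"


version, url = resolve_wrapper("0.24.0/bio/mytool")
print(version)  # 0.24.0
print(url)
```

Upgrading a wrapper then becomes an explicit, reviewable edit of the version component rather than a silent change.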
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    cwl:
        "https://github.com/some/cwl-tool"
CWL tools
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    script:
        "scripts/myscript.R"

rule myfiltration:
    input:
        "result/{dataset}.txt"
    output:
        "result/{dataset}.filtered.txt"
    shell:
        "mycommand {input} > {output}"

rule aggregate:
    input:
        "result/dataset1.filtered.txt",
        "result/dataset2.filtered.txt"
    output:
        "plots/myplot.pdf"
    script:
        "scripts/myplot.R"
Implicit dependencies
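No rule names its upstream rules. Instead, the dependency graph follows from matching each required file against rule output patterns, working backwards from the requested target. The sketch below is a toy depth-first version of that backward search over the three rules above; it is not Snakemake's implementation. `RULES`, `match`, `resolve` and the set of existing input files are all hypothetical.

Note that `result/dataset1.filtered.txt` also fits `mytask`'s output pattern, with `dataset = "dataset1.filtered"`. The sketch discards that candidate because its input, `path/to/dataset1.filtered.txt`, can be neither found nor produced.

```python
import re

# Hypothetical toy rule table mirroring the three rules above:
# rule name -> (input patterns, output pattern)
RULES = {
    "mytask": (["path/to/{dataset}.txt"], "result/{dataset}.txt"),
    "myfiltration": (["result/{dataset}.txt"], "result/{dataset}.filtered.txt"),
    "aggregate": (
        ["result/dataset1.filtered.txt", "result/dataset2.filtered.txt"],
        "plots/myplot.pdf",
    ),
}


def match(pattern, path):
    """Return the wildcard values if `path` fits `pattern`, else None."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+)" for i, p in enumerate(parts)
    )
    m = re.fullmatch(regex, path)
    return m.groupdict() if m else None


def resolve(target, existing, jobs):
    """Depth-first backward search: make `target` available, appending
    (rule, output) jobs to `jobs` so that producers precede consumers.
    Returns False if no rule (or existing file) can provide `target`."""
    if target in existing:
        return True
    for name, (inputs, output) in RULES.items():
        wildcards = match(output, target)
        if wildcards is None:
            continue
        needed = [p.format(**wildcards) for p in inputs]
        sub_jobs = []  # discarded if this candidate rule turns out to be infeasible
        if all(resolve(p, existing, sub_jobs) for p in needed):
            jobs.extend(sub_jobs)
            jobs.append((name, target))
            return True
    return False


# Hypothetical raw data that already exists on disk.
EXISTING = {"path/to/dataset1.txt", "path/to/dataset2.txt"}
jobs = []
found = resolve("plots/myplot.pdf", EXISTING, jobs)
for rule, output in jobs:
    print(rule, output)
```

If an aggregate input pointed to a directory that no rule writes to, `resolve` would return `False`. That is the toy analogue of a missing-input error. The sketch also omits deduplication of shared inputs and cycle detection.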
workstation
compute server
cluster
grid computing
cloud computing
Scalability
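The whole job graph is known before anything runs. Independent jobs can therefore be dispatched concurrently, whether to local cores or to cluster or cloud nodes. The sketch below groups the jobs of the three-rule example into waves of mutually independent jobs. The job names and the `DEPS` map are hypothetical. The wave grouping is a simplification: a real scheduler can start each job as soon as its own inputs exist.

```python
# Hypothetical job graph of the three-rule example: job -> jobs it depends on.
DEPS = {
    "mytask[dataset1]": [],
    "mytask[dataset2]": [],
    "myfiltration[dataset1]": ["mytask[dataset1]"],
    "myfiltration[dataset2]": ["mytask[dataset2]"],
    "aggregate": ["myfiltration[dataset1]", "myfiltration[dataset2]"],
}


def waves(deps):
    """Group jobs into successive waves; jobs within one wave are independent."""
    done = set()
    result = []
    while len(done) < len(deps):
        ready = sorted(
            job for job, needs in deps.items()
            if job not in done and all(n in done for n in needs)
        )
        if not ready:
            raise ValueError("dependency cycle")
        result.append(ready)
        done.update(ready)
    return result


for i, wave in enumerate(waves(DEPS), start=1):
    print(i, wave)
```

With two datasets, at most two jobs can run at once. With hundreds of datasets, the first two waves widen accordingly, which is where a cluster or cloud backend pays off without changing the workflow.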
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    conda:
        "envs/mycommand.yaml"
    shell:
        "mycommand {input} > {output}"

# envs/mycommand.yaml
channels:
  - bioconda
  - conda-forge
dependencies:
  - mycommand =2.3.1
Conda integration
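When run with `--use-conda`, Snakemake creates the environment described by the `conda:` file before the job runs. It then reuses that environment in later runs and for other rules that use the same definition. The sketch below illustrates this content-addressed reuse by hashing the YAML text into a directory name. Two details are assumptions about an implementation detail: the `.snakemake/conda` base directory, and hashing only the file content. `env_dir` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath


def env_dir(env_yaml, base=".snakemake/conda"):
    """Map environment-file content to a directory name (assumed scheme)."""
    digest = hashlib.md5(env_yaml.encode("utf-8")).hexdigest()
    return str(PurePosixPath(base) / digest)


ENV = """channels:
  - bioconda
  - conda-forge
dependencies:
  - mycommand =2.3.1
"""

first = env_dir(ENV)
again = env_dir(ENV)                             # identical content -> same environment
bumped = env_dir(ENV.replace("2.3.1", "2.3.2"))  # changed pin -> new environment
print(first == again, first == bumped)  # True False
```

Keying on content means a version bump in the YAML file automatically yields a fresh environment, while unchanged definitions are never rebuilt.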
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    singularity:
        "docker://some/container"
    shell:
        "mycommand {input} > {output}"
Singularity integration
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    conda:
        "envs/mycommand.yaml"
    singularity:
        "docker://some/os"
    shell:
        "mycommand {input} > {output}"
Singularity + Conda
Figure: datasets → results (portability, scalability, automation/documentation)
Snakemake BoF today 8pm
Snakemake lightning talk
By Johannes Köster