GCCBOSC 2018

Johannes Köster

2018

https://koesterlab.github.io

Poster B20

66k downloads on Bioconda

rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    shell:
        "some-tool {input} > {output}"

Concise DSL
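To illustrate how the `{sample}` wildcard ties the output, input and shell command together, here is a minimal Python sketch. It is not Snakemake's implementation; the helper names `match_wildcards` and `resolve_rule` are hypothetical, and it assumes each wildcard occurs only once per pattern.

```python
import re

def match_wildcards(pattern, path):
    """Return a dict of wildcard values if `path` fits `pattern`, else None."""
    # "result/{sample}.txt" becomes the regex "result/(?P<sample>.+)\.txt".
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
    found = re.fullmatch(regex, path)
    return None if found is None else found.groupdict()

def resolve_rule(target, input_pattern, output_pattern, command_template):
    """Fill in the input path and shell command for a requested target file."""
    wildcards = match_wildcards(output_pattern, target)
    if wildcards is None:
        return None  # the rule cannot produce this target
    input_path = input_pattern.format(**wildcards)
    command = command_template.format(input=input_path, output=target)
    return input_path, command

# Patterns and command copied from the rule on the slide.
print(resolve_rule(
    "result/A.txt",
    "data/{sample}.txt",
    "result/{sample}.txt",
    "some-tool {input} > {output}",
))
```

Requesting `result/A.txt` binds `sample` to `A`, which in turn fixes the input file and the command to run.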

rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    script:
        "scripts/mytask.py"

Python scripts

rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    script:
        "scripts/mytask.R"

R scripts

import matplotlib.pyplot as plt
import pandas as pd

# `snakemake` is provided by Snakemake when this file runs via the script directive.
d = pd.read_table(snakemake.input[0])

d.hist(bins=snakemake.config["hist-bins"])

plt.savefig(snakemake.output[0])

No boilerplate
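The script above only touches three attributes of the `snakemake` object. As a rough illustration of what it sees, the following sketch builds a stand-in with the same shape. The stand-in, the file names and the value 20 are made up for this example; the real object is created by Snakemake itself.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the object that Snakemake injects into a script.
snakemake = SimpleNamespace(
    input=["data/A.txt"],        # files listed under `input:`
    output=["result/A.txt"],     # files listed under `output:`
    config={"hist-bins": 20},    # contents of the workflow configuration
)

def script_parameters(job):
    """Collect the three values that the plotting script reads."""
    return job.input[0], job.config["hist-bins"], job.output[0]

print(script_parameters(snakemake))
```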

rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    wrapper:
        "0.24.0/bio/mytool"

Reusable tool wrappers

rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    cwl:
        "https://github.com/some/cwl-tool"

CWL tools

rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    script:
        "scripts/myscript.R"


rule myfiltration:
    input:
        "result/{dataset}.txt"
    output:
        "result/{dataset}.filtered.txt"
    shell:
        "mycommand {input} > {output}"


rule aggregate:
    input:
        "result/dataset1.filtered.txt",
        "result/dataset2.filtered.txt"
    output:
        "plots/myplot.pdf"
    script:
        "scripts/myplot.R"

Implicit dependencies
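The dependencies are implicit because they follow from file names alone: a rule's input is matched against the output patterns of the other rules. The sketch below shows that idea in plain Python. It is a simplification, not Snakemake's DAG code; `match`, `plan`, `RULES` and `SOURCES` are hypothetical names, the two source files are assumed to exist, and all paths use the `result/` prefix so that the inputs of `aggregate` line up with the output of `myfiltration`.

```python
import re

# The three rules, reduced to (input patterns, output pattern).
RULES = {
    "mytask": (["path/to/{dataset}.txt"], "result/{dataset}.txt"),
    "myfiltration": (["result/{dataset}.txt"], "result/{dataset}.filtered.txt"),
    "aggregate": (
        ["result/dataset1.filtered.txt", "result/dataset2.filtered.txt"],
        "plots/myplot.pdf",
    ),
}

# Files assumed to be present before the workflow starts.
SOURCES = {"path/to/dataset1.txt", "path/to/dataset2.txt"}

def match(pattern, path):
    """Return wildcard values if `path` fits `pattern`, else None."""
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
    found = re.fullmatch(regex, path)
    return None if found is None else found.groupdict()

def plan(target, sources, rules):
    """Return the (rule, output) jobs needed to build `target`, in run order."""
    if target in sources:
        return []  # already present, nothing to run
    for name, (inputs, output) in rules.items():
        wildcards = match(output, target)
        if wildcards is None:
            continue
        jobs = []
        try:
            for pattern in inputs:
                jobs += plan(pattern.format(**wildcards), sources, rules)
        except LookupError:
            continue  # inputs of this rule cannot be provided; try the next rule
        return jobs + [(name, target)]
    raise LookupError("no rule produces " + target)

for rule_name, output_file in plan("plots/myplot.pdf", SOURCES, RULES):
    print(rule_name, output_file)
```

Asking for `plots/myplot.pdf` yields the `mytask` and `myfiltration` jobs for each dataset, followed by `aggregate`, although no rule names another rule explicitly.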

workstation

compute server

cluster

grid computing

cloud computing

Scalability

rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    conda:
        "envs/mycommand.yaml"
    shell:
        "mycommand {input} > {output}"

# envs/mycommand.yaml
channels:
  - bioconda
  - conda-forge
dependencies:
  - mycommand =2.3.1

Conda integration

rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    singularity:
        "docker://some/container"
    shell:
        "mycommand {input} > {output}"

Singularity integration

rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    conda:
        "envs/mycommand.yaml"
    singularity:
        "docker://some/os"
    shell:
        "mycommand {input} > {output}"

Singularity + Conda

[Figure: diagram with the labels "dataset" (repeated) and "results"]

portability

scalability

automation/documentation

Snakemake BoF today 8pm

Snakemake lightning talk
