Johannes Köster
2015
Analyses usually entail the application of various tools, algorithms and scripts.
Workflow management handles the boilerplate:
Snakemake infers dependencies and execution order.
Text-based: Python plus a domain-specific syntax.
Decompose the workflow into rules.
Rules define how to obtain output files from input files.
~1000 downloads per week
rule sort:
    input:
        "path/to/dataset.txt"
    output:
        "dataset.sorted.txt"
    shell:
        "sort {input} > {output}"
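The shell string of a rule is an ordinary Python format string. A minimal sketch of the substitution step (the variable names here are illustrative, not Snakemake internals):

```python
# Sketch of Snakemake-style placeholder substitution: {input} and
# {output} in the shell string are filled in from the rule's declared
# files, with file lists joined by spaces.
input_files = ["path/to/dataset.txt"]
output_files = ["dataset.sorted.txt"]

command = "sort {input} > {output}".format(
    input=" ".join(input_files),
    output=" ".join(output_files),
)
print(command)  # sort path/to/dataset.txt > dataset.sorted.txt
```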
Anatomy of a rule: the rule name, the input and output files, and a shell command that refers to them via {input} and {output}, describing how to create the output from the input.
rule sort:
    input:
        "path/to/{dataset}.txt"
    output:
        "{dataset}.sorted.txt"
    shell:
        "sort {input} > {output}"
Generalize rules with named wildcards.
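Conceptually, a wildcard pattern is resolved by matching the requested target file against the output pattern and propagating the captured value into the input pattern. A small sketch of this idea (it mirrors the concept, not Snakemake's actual implementation; the file names are illustrative):

```python
import re

def pattern_to_regex(pattern):
    # Replace each {name} wildcard with a named capture group and
    # escape the literal parts in between.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)   # literal text
        else:
            regex += "(?P<%s>.+)" % part  # wildcard
    return re.compile(regex + "$")

# Requesting "yeast.sorted.txt" determines the wildcard value ...
match = pattern_to_regex("{dataset}.sorted.txt").match("yeast.sorted.txt")
wildcards = match.groupdict()  # {'dataset': 'yeast'}

# ... which is then substituted into the input pattern.
input_file = "path/to/{dataset}.txt".format(**wildcards)
print(input_file)  # path/to/yeast.txt
```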
Disjoint paths in the DAG of jobs can be executed in parallel.
Rules can be annotated with arbitrary resources, which can be used to constrain the scheduling.
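The parallelism claim above can be sketched in plain Python: group the DAG of jobs into waves whose members have no unmet dependencies, so every job within a wave could be dispatched concurrently (the job names and the deps mapping are illustrative, not Snakemake's API):

```python
# Sketch: jobs on disjoint paths of the DAG have no dependencies on
# each other and can run in parallel. Group jobs into "waves": each
# job in a wave depends only on jobs from earlier waves.
deps = {
    "sort_A": [],
    "sort_B": [],
    "merge":  ["sort_A", "sort_B"],
    "report": ["merge"],
}

def waves(deps):
    remaining = dict(deps)
    done = set()
    order = []
    while remaining:
        # All jobs whose dependencies have already completed.
        ready = [job for job, d in remaining.items()
                 if all(x in done for x in d)]
        if not ready:
            raise ValueError("cycle in job DAG")
        order.append(sorted(ready))
        done.update(ready)
        for job in ready:
            del remaining[job]
    return order

print(waves(deps))
# [['sort_A', 'sort_B'], ['merge'], ['report']]
```

Here sort_A and sort_B share no dependency path, so they form the first wave and run in parallel; merge must wait for both.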
rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.0.8/bio/samtools_sort"
Wrappers
rule bwa:
    input:
        "genome.fasta",
        "{sample}.fastq"
    output:
        "{sample}.bam"
    requirements:
        "bwa ==0.7.12",
        "samtools ==1.1.0"
    shell:
        "bwa mem {input} | "
        "samtools view - > {output}"
Conda support
DAG partitioning