Johannes Köster
University of Duisburg-Essen
2025
Reproducibility
Transparency
Adaptability
>1 million downloads since 2015
>3000 citations
>14 citations per week in 2024
[Figure: workflow DAG, several dataset inputs are processed and aggregated into results]
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    script:
        "scripts/myscript.R"
rule myfiltration:
    input:
        "result/{dataset}.txt"
    output:
        "result/{dataset}.filtered.txt"
    shell:
        "mycommand {input} > {output}"
rule aggregate:
    input:
        "result/dataset1.filtered.txt",
        "result/dataset2.filtered.txt"
    output:
        "plots/myplot.pdf"
    script:
        "scripts/myplot.R"
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    shell:
        "some-tool {input} > {output}"
A rule defines: the rule name, the input and output file patterns, and how to create the output from the input (via a shell command or a script).
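A workflow defined this way is executed by requesting a target file; Snakemake infers the required jobs and wildcard values from the file patterns. A minimal sketch of an invocation (the sample name mysample and the core count are arbitrary):
snakemake --cores 4 result/mysample.txt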
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    script:
        "scripts/myscript.py"
reusable scripts:
Python:
import pandas as pd
data = pd.read_table(snakemake.input[0])
data = data.sort_values("id")
data.to_csv(snakemake.output[0], sep="\t")
R:
data <- read.table(snakemake@input[[1]])
data <- data[order(data$id),]
write.table(data, file = snakemake@output[[1]])
Rust:
// sketch using the polars crate; exact API details differ between polars versions
use polars::prelude::*;
use std::fs::File;
let mut data = CsvReader::from_path(&snakemake.input[0])?
    .finish()?
    .sort(["id"], false)?;
CsvWriter::new(File::create(&snakemake.output[0])?).finish(&mut data)?;
rule sort_reads:
    input:
        "{sample}.bam"
    output:
        "{sample}.sorted.bam"
    wrapper:
        "0.22.0/bio/samtools/sort"
reusable wrappers from a central repository
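Wrappers are retrieved from the central repository at the given version and ship their own software environment definitions; these are deployed automatically when Conda integration is enabled at invocation time (a sketch):
snakemake --cores 8 --use-conda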
Job scheduling: among the jobs whose inputs are ready, Snakemake performs job selection by solving an optimization problem that maximizes job priority, job thread usage, and temp file deletion, constrained by the free resources and each job's resource usage, and informed by temp file size, temp file consumption, and temp file lifetime fraction.
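Scheduling can be influenced from the workflow and from the command line, for example via a rule-level priority and by choosing the ILP-based scheduler; a sketch reusing the example rule from above:
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    priority: 50
    shell:
        "some-tool {input} > {output}"

snakemake --cores 16 --scheduler ilp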
Job grouping: rules a and b are assigned to group g1, so that connected jobs of the group are submitted together as one job; --group-components controls how many connected components of the group are bundled per submission (see the sketch below for the in-workflow equivalent):
--groups a=g1 b=g1
--groups a=g1 b=g1 --group-components g1=2
--groups a=g1 b=g1 --group-components g1=5
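The same grouping can also be declared inside the workflow via the group directive; a sketch in which the paths and shell commands are placeholders:
rule a:
    input:
        "data/{sample}.txt"
    output:
        "intermediate/{sample}.txt"
    group: "g1"
    shell:
        "tool-a {input} > {output}"

rule b:
    input:
        "intermediate/{sample}.txt"
    output:
        "result/{sample}.txt"
    group: "g1"
    shell:
        "tool-b {input} > {output}"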
workstation
compute server
cluster
grid computing
cloud computing
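Moving between these platforms changes only the invocation, not the workflow definition; for example, cluster execution via the SLURM executor plugin looks roughly like this (a sketch; the partition name is made up):
snakemake --executor slurm --jobs 100 --default-resources slurm_partition=compute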
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    conda:
        "envs/some-tool.yaml"
    shell:
        "some-tool {input} > {output}"
channels:
  - conda-forge
dependencies:
  - some-tool =2.3.1
  - some-lib =1.1.2
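The environment is created and activated per job when Conda-based deployment is switched on at invocation time (a sketch; the long flag is the Snakemake 8 form, older releases use --use-conda):
snakemake --cores 8 --software-deployment-method conda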
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    container:
        "docker://biocontainers/some-tool:2.3.1"
    shell:
        "some-tool {input} > {output}"
rule chipseq_pipeline:
    input:
        input="design.csv",
        fasta="data/genome.fasta",
        gtf="data/genome.gtf",
        # any --<argname> pipeline file arguments can be given here as <argname>=<path>
    output:
        report="results/multiqc/broadPeak/multiqc_report.html",
    params:
        pipeline="nf-core/chipseq",
        revision="2.0.0",
        profile=["test", "docker"],
        outdir=subpath(output.report, ancestor=2),
        # any --<argname> pipeline arguments can be given here as <argname>=<value>
    handover: True
    wrapper:
        "v7.2.0/utils/nextflow"
from dataclasses import dataclass, field
from typing import Optional

from snakemake_interface_common.exceptions import WorkflowError
from snakemake_interface_report_plugins.reporter import ReporterBase
from snakemake_interface_report_plugins.settings import ReportSettingsBase


# Optional:
# Define additional settings for your reporter.
# They will occur in the Snakemake CLI as --report-<reporter-name>-<param-name>
# Omit this class if you don't need any.
# Make sure that all defined fields are Optional (or bool) and specify a default value
# of None (or False) or anything else that makes sense in your case.
@dataclass
class ReportSettings(ReportSettingsBase):
    myparam: Optional[int] = field(
        default=None,
        metadata={
            "help": "Some help text",
            # Optionally request that the setting is also available for specification
            # via an environment variable. The variable will be named automatically as
            # SNAKEMAKE_REPORT_<reporter-name>_<param-name>, all upper case.
            # This mechanism should ONLY be used for passwords and usernames.
            # For other items, we rather recommend letting people use a profile
            # for setting defaults
            # (https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles).
            "env_var": False,
            # Optionally specify a function that parses the value given by the user.
            # This is useful to create complex types from the user input.
            "parse_func": ...,
            # If a parse_func is specified, you also have to specify an unparse_func
            # that converts the parsed value back to a string.
            "unparse_func": ...,
            # Optionally specify that the setting is required when the reporter is in use.
            "required": True,
            # Optionally specify multiple args with "nargs": True
        },
    )


# Required:
# Implementation of your reporter
class Reporter(ReporterBase):
    def __post_init__(self):
        # Initialize additional attributes here.
        # Do not overwrite the __init__ method as this is kept in control of the base
        # class in order to simplify the update process.
        # In particular, the settings of the above ReportSettings class are accessible
        # via self.settings.
        ...

    def render(self):
        # Render the report, using attributes of the base class.
        ...
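Once packaged and installed (report plugins are discovered by their package name, conventionally snakemake-report-plugin-<name>), the reporter and its settings appear on the command line; a sketch using the placeholder names from the template above:
snakemake --cores 1 --reporter myreporter --report-myreporter-myparam 5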
Specify defaults:
mem_mb="min(max(2*input.size_mb, 1000), 8000)"
disk_mb="max(2*input.size_mb, 1000) if input else 50000"
snakemake --default-resources mem_mb=... --jobs 100

Use builtin defaults:
snakemake --default-resources --jobs 100

Store in profile:
# /etc/xdg/snakemake/default/config.yaml:
default-resources:
  mem_mb: "min(max(2*input.size_mb, 1000), 8000)"
...

Define per rule:
# profiles/default/config.yaml:
set-resources:
  mytask:
    mem_mb: 16000
...
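Resources and threads can also be declared directly on a rule, which set-resources entries like the one above then override; a sketch reusing the example rule:
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    threads: 4
    resources:
        mem_mb=8000
    shell:
        "some-tool {input} > {output}"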
Snakemake covers all aspects of fully reproducible, transparent, and adaptable data analysis, offering a readable workflow definition language, scalable execution from workstation to cluster and cloud, and integrated software deployment via Conda and containers.
https://snakemake.github.io
Join the hackathon at TU Munich in March 2026!