Johannes Köster
2026
University of Duisburg-Essen
Reproducibility
Transparency
Adaptability
>1.6 million downloads since 2015
>3000 citations
>14 citations per week in 2024
[Figure: multiple datasets are processed into results]
rule mytask:
    input:
        "path/to/{dataset}.txt"
    output:
        "result/{dataset}.txt"
    script:
        "scripts/myscript.R"
rule myfiltration:
    input:
        "result/{dataset}.txt"
    output:
        "result/{dataset}.filtered.txt"
    shell:
        "mycommand {input} > {output}"
rule aggregate:
    input:
        # these files are outputs of myfiltration, so the prefix must match
        "result/dataset1.filtered.txt",
        "result/dataset2.filtered.txt"
    output:
        "plots/myplot.pdf"
    script:
        "scripts/myplot.R"
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    shell:
        "some-tool {input} > {output}"
rule name
how to create output from input
define
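To make the idea of connecting rules through file names more concrete, here is a minimal Python sketch of how a requested target can be traced back through the three-rule workflow (mytask, myfiltration, aggregate). It is not Snakemake's implementation. The `RULES` table, `output_regex` and `plan` are hypothetical names, all intermediate files are assumed to live under `result/`, and wildcards are restricted to characters other than `/` and `.` to keep the matching unambiguous (Snakemake's default wildcard pattern is `.+`).

```python
import re

# Simplified stand-ins for the three rules (assumption: everything under result/).
RULES = {
    "mytask": {
        "input": ["path/to/{dataset}.txt"],
        "output": "result/{dataset}.txt",
        "script": "scripts/myscript.R",
    },
    "myfiltration": {
        "input": ["result/{dataset}.txt"],
        "output": "result/{dataset}.filtered.txt",
        "shell": "mycommand {input} > {output}",
    },
    "aggregate": {
        "input": ["result/dataset1.filtered.txt", "result/dataset2.filtered.txt"],
        "output": "plots/myplot.pdf",
        "script": "scripts/myplot.R",
    },
}


def output_regex(pattern):
    """Turn 'result/{dataset}.txt' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)  # literal, name, literal, ...
    return re.compile("".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/.]+)"
        for i, part in enumerate(parts)
    ))


def plan(target, existing, jobs=None):
    """Collect the jobs needed to create `target`, in execution order."""
    jobs = [] if jobs is None else jobs
    if target in existing or any(job["output"] == target for job in jobs):
        return jobs
    for name, rule in RULES.items():
        match = output_regex(rule["output"]).fullmatch(target)
        if match is None:
            continue
        wildcards = match.groupdict()
        inputs = [pattern.format(**wildcards) for pattern in rule["input"]]
        for path in inputs:  # dependencies first
            plan(path, existing, jobs)
        if "shell" in rule:
            # fill the {input} and {output} placeholders of the shell command
            action = rule["shell"].format(input=" ".join(inputs), output=target)
        else:
            action = "run " + rule["script"]
        jobs.append({"rule": name, "output": target, "action": action})
        return jobs
    raise FileNotFoundError(f"no rule produces {target}")


jobs = plan("plots/myplot.pdf", {"path/to/dataset1.txt", "path/to/dataset2.txt"})
for job in jobs:
    print(job["rule"], "->", job["output"], "|", job["action"])
```

Requesting only `plots/myplot.pdf` is enough: the per-dataset jobs follow from matching file names against output patterns.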
rule mytask:
    input:
        "data/{sample}.txt"
    output:
        "result/{sample}.txt"
    script:
        "scripts/myscript.py"
reusable scripts:
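As an illustration of what such a script might contain: inside a Python script run via the `script:` directive, Snakemake provides a `snakemake` object whose `input` and `output` attributes hold the rule's file paths. The sketch below passes that object as a parameter so it can be tried without Snakemake. The upper-casing task, the `run` function and the mock object are purely hypothetical.

```python
import os
import tempfile
from types import SimpleNamespace


def run(snakemake):
    """Body of a hypothetical scripts/myscript.py: upper-case every input line."""
    with open(snakemake.input[0]) as src, open(snakemake.output[0], "w") as dst:
        for line in src:
            dst.write(line.upper())


# Stand-in for the object Snakemake would inject (assumption: only
# .input and .output are needed by this script).
workdir = tempfile.mkdtemp()
mock = SimpleNamespace(
    input=[os.path.join(workdir, "sampleA.txt")],
    output=[os.path.join(workdir, "sampleA.result.txt")],
)
with open(mock.input[0], "w") as handle:
    handle.write("acgt\n")

run(mock)
with open(mock.output[0]) as handle:
    result_text = handle.read()
print(result_text, end="")
```

Because the script only talks to `snakemake.input` and `snakemake.output`, the same file can be reused by any rule that provides compatible files.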
rule map_reads:
    input:
        "{sample}.bam"
    output:
        "{sample}.sorted.bam"
    wrapper:
        "0.22.0/bio/samtools/sort"
reusable wrappers from central repository
https://datavzrd.github.io
oncoprint + individual variant calls
differentially expressed genes + expression matrix
Individual tables (tsv, excel) and plots:
Web applications (custom, shiny, ...):
Input:
Output: portable interactive visual presentation
rule datavzrd:
    input:
        config="resources/{sample}.datavzrd.yaml",
        table="data/{sample}.tsv",
    output:
        report(
            directory("results/datavzrd-report/{sample}"),
            htmlindex="index.html",
        ),
    wrapper:
        "v4.6.0/utils/datavzrd"
harmonize
https://vembrane.github.io
pathogenic/risk-factor/drug-response (vembrane filter):
(not {'risk_factor', 'pathogenic', 'drug_response'}.isdisjoint(ANN['CLIN_SIG'])) and
(ANN['IMPACT'] in {'LOW', 'MODERATE', 'HIGH'}) and
(ANN['REVEL'] is NA or ANN['REVEL'] >= 0.5)
relevant VAFs (vembrane filter):
(FORMAT['AF']['tumor'] >= 0.05 and (ANN['gnomADg_AF'] is NA or ANN['gnomADg_AF'] < 0.01))
sort by descending relevance (vembrane sort):
desc(quantize(max(FORMAT['AF'][sample] for sample in SAMPLES), 1 / 3)), \
desc(ANN["REVEL"]), \
desc(FORMAT['AF']['tumor'])
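Since these expressions are ordinary Python, their logic can be tried out on mock records. The sketch below is not vembrane itself. `NA` is a plain sentinel, `quantize` is a hypothetical helper that returns the index of the bin of width `step` containing the value (vembrane's own definition may differ), descending order is emulated by negating the keys, and all records are invented for illustration.

```python
import math

NA = object()  # stand-in for vembrane's NA marker (assumption)
SAMPLES = ["normal", "tumor"]


def quantize(value, step):
    """Hypothetical helper: index of the bin of width `step` that holds value."""
    return math.floor(value / step)


def clinically_relevant(ANN):
    return (
        (not {'risk_factor', 'pathogenic', 'drug_response'}.isdisjoint(ANN['CLIN_SIG'])) and
        (ANN['IMPACT'] in {'LOW', 'MODERATE', 'HIGH'}) and
        (ANN['REVEL'] is NA or ANN['REVEL'] >= 0.5)
    )


def relevant_vaf(FORMAT, ANN):
    return FORMAT['AF']['tumor'] >= 0.05 and (
        ANN['gnomADg_AF'] is NA or ANN['gnomADg_AF'] < 0.01
    )


def sort_key(record):
    FORMAT, ANN = record["FORMAT"], record["ANN"]
    return (
        -quantize(max(FORMAT['AF'][sample] for sample in SAMPLES), 1 / 3),
        -ANN["REVEL"],
        -FORMAT['AF']['tumor'],
    )


def make_record(name, clin_sig, impact, revel, gnomad_af, normal_af, tumor_af):
    return {
        "id": name,
        "ANN": {"CLIN_SIG": clin_sig, "IMPACT": impact,
                "REVEL": revel, "gnomADg_AF": gnomad_af},
        "FORMAT": {"AF": {"normal": normal_af, "tumor": tumor_af}},
    }


records = [
    make_record("v1", ["pathogenic"], "HIGH", 0.95, NA, 0.0, 0.2),
    make_record("v2", ["benign"], "HIGH", NA, 0.001, 0.0, 0.5),       # not clinically relevant
    make_record("v3", ["drug_response"], "MODERATE", 0.8, 0.001, 0.0, 0.7),
    make_record("v4", ["risk_factor", "benign"], "LOW", 0.8, NA, 0.75, 0.4),
    make_record("v5", ["pathogenic"], "HIGH", 0.2, NA, 0.0, 0.6),     # REVEL too low
    make_record("v6", ["pathogenic"], "HIGH", 0.7, 0.2, 0.0, 0.5),    # too common in gnomAD
]

kept = [r for r in records
        if clinically_relevant(r["ANN"]) and relevant_vaf(r["FORMAT"], r["ANN"])]
ranked = sorted(kept, key=sort_key)
print([r["id"] for r in ranked])  # ['v3', 'v4', 'v1']
```

The coarse binning into thirds means that v1, despite the highest REVEL score, ranks last because its maximum allele frequency falls into the lowest bin. Within a bin, REVEL and then the tumor allele frequency break ties.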
Adaptable and unified small-variant/structural-variant/fusion calling for any scenario
reference pipeline project
samples:
  jane:
    sex: female
    somatic-effective-mutation-rate: 1e-10
  tumor:
    inheritance:
      clonal:
        from: jane
    contamination:
      by: jane
      fraction: 0.1
    somatic-effective-mutation-rate: 1e-6
  relapse:
    inheritance:
      clonal:
        from: jane
    contamination:
      by: jane
      fraction: 0.2
    somatic-effective-mutation-rate: 1e-6
expressions:
  somatic_tumor: "jane:0.0 & tumor:]0.0,1.0]"
events:
  germline: "jane:0.5 | jane:1.0"
  somatic: "jane:]0.0,0.5["
  somatic_tumor_no_increase: "$somatic_tumor & l2fc(relapse,tumor) < 1"
  somatic_tumor_increase: "$somatic_tumor & l2fc(relapse,tumor) >= 1"
  somatic_relapse: "jane:0.0 & tumor:0.0 & relapse:]0.0,1.0]"
https://varlociraptor.github.io
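The interval notation in the events can be read as follows: a bracket that points away from a bound excludes it, so `]0.0,0.5[` means 0.0 < VAF < 0.5 and `]0.0,1.0]` means 0.0 < VAF <= 1.0. The sketch below only illustrates this reading on fixed allele frequencies. `parse_interval` and `matches` are hypothetical helpers, only `&` conjunctions are handled (no `|`, `$name` references or `l2fc`), and Varlociraptor itself evaluates events probabilistically rather than by testing point values.

```python
import re


def parse_interval(spec):
    """Parse '0.5' or an interval such as ']0.0,0.5[' into a membership test."""
    m = re.fullmatch(r"([\[\]])\s*([0-9.]+)\s*,\s*([0-9.]+)\s*([\[\]])", spec)
    if m is None:
        point = float(spec)
        return lambda vaf: vaf == point
    left, right = m.group(1), m.group(4)
    low, high = float(m.group(2)), float(m.group(3))

    def contains(vaf):
        above = vaf >= low if left == "[" else vaf > low     # ']' on the left excludes low
        below = vaf <= high if right == "]" else vaf < high  # '[' on the right excludes high
        return above and below

    return contains


def matches(event, vafs):
    """Check a conjunction like 'jane:0.0 & tumor:]0.0,1.0]' against sample VAFs."""
    for term in event.split("&"):
        sample, spec = term.strip().split(":", 1)
        if not parse_interval(spec.strip())(vafs[sample.strip()]):
            return False
    return True


somatic_relapse = "jane:0.0 & tumor:0.0 & relapse:]0.0,1.0]"
print(matches(somatic_relapse, {"jane": 0.0, "tumor": 0.0, "relapse": 0.3}))  # True
print(matches(somatic_relapse, {"jane": 0.0, "tumor": 0.2, "relapse": 0.3}))  # False
```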
Snakemake + Datavzrd
generic, transparent, human-readable, adaptable, portable, and scalable data analysis and visualization
Vembrane
Unified DSL for filtering, sorting and transforming genomic variants/fusions/CNVs
Varlociraptor + Pipeline
unified, adaptable small/structural variant calling for any scenario