Johannes Köster
>1 million downloads since 2015
>3000 citations
>14 citations per week in 2023
affiliated
Mölder, F., Jablonski, K.P., Letcher, B., Hall, M.B., Tomkins-Tinch, C.H., Sochat, V., Forster, J., Lee, S., Twardziok, S.O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., Köster, J. Sustainable data analysis with Snakemake. F1000Res 10, 33 (2021).
https://snakemake.github.io
dataset
results
dataset
dataset
dataset
dataset
dataset
https://snakemake.github.io
https://snakemake.github.io
rule mytask:
input:
"path/to/{dataset}.txt"
output:
"result/{dataset}.txt"
script:
"scripts/myscript.R"
rule myfiltration:
input:
"result/{dataset}.txt"
output:
"result/{dataset}.filtered.txt"
shell:
"mycommand {input} > {output}"
rule aggregate:
input:
"results/dataset1.filtered.txt",
"results/dataset2.filtered.txt"
output:
"plots/myplot.pdf"
script:
"scripts/myplot.R"
https://snakemake.github.io
rule mytask:
input:
"data/{sample}.txt"
output:
"result/{sample}.txt"
params:
someparam=0.1
conda:
"envs/sometool.yaml"
shell:
"some-tool {params.someparam} {input} > {output}"
https://snakemake.github.io
deployment via
conda, containers
rule mytask:
input:
"path/to/{dataset}.txt"
output:
"result/{dataset}.txt"
script:
"scripts/myscript.R"
rule myfiltration:
input:
"result/{dataset}.txt"
output:
"result/{dataset}.filtered.txt"
shell:
"mycommand {input} > {output}"
rule aggregate:
input:
"results/dataset1.filtered.txt",
"results/dataset2.filtered.txt"
output:
"plots/myplot.pdf"
script:
"scripts/myplot.R"
https://snakemake.github.io
workstation
compute server
cluster
grid computing
cloud computing
https://snakemake.github.io
https://snakemake.github.io
https://snakemake.github.io
Automatic documentation
https://snakemake.github.io
https://datavzrd.github.io
oncoprint + individual variant calls
differentially expressed genes + expression matrix
https://datavzrd.github.io
oncoprint + individual variant calls +
differentially expressed genes + expression matrix +
https://datavzrd.github.io
Individual tables (tsv, excel) and plots:
Web applications (custom, shiny, ...):
https://datavzrd.github.io
Input:
Output:
portable interactive visual presentation
https://datavzrd.github.io
datasets:
oscars:
path: "data/oscars.csv"
links:
link to oscar plot:
column: age
view: oscar-plot
link to movie:
column: movie
table-row: movies/Title
movies:
path: "data/movies.csv"
links:
link to oscar entry:
column: Title
table-row: oscars/movie
views:
oscars:
dataset: oscars
desc: |
### All winning oscars beginning in the year 1929.
This table contains *all* winning oscars for best
actress and best actor.
page-size: 25
render-table:
columns:
age:
plot:
ticks:
scale: linear
domain:
- 20
- 100
name:
link-to-url: "https://lmgtfy.app/?q=Is {name} in {movie}?"
movie:
link-to-url: "https://de.wikipedia.org/wiki/{value}"
award:
plot:
heatmap:
scale: ordinal
domain:
- Best actor
- Best actress
range:
- "#add8e6"
- "#ffb6c1"
index(0):
display-mode: hidden
regex('birth_.+'):
display-mode: detail
movies:
dataset: movies
render-table:
columns:
Genre:
ellipsis: 15
imdbID:
link-to-url: "https://www.imdb.com/title/{value}/"
Title:
link-to-url: "https://de.wikipedia.org/wiki/{value}"
imdbRating:
precision: 1
plot:
bars:
scale: linear
domain:
- 1
- 10
Rated:
plot-view-legend: true
plot:
heatmap:
scale: ordinal
color-scheme: accent
oscar-plot:
dataset: oscars
desc: |
## My beautiful oscar scatter plot
*So many great actors and actresses*
render-plot:
spec-path: ".examples/specs/oscars.vl.json"
movies-plot:
dataset: movies
desc: |
All movies with its *runtime* and *ratings* plotted over *time*.
render-plot:
spec-path: ".examples/specs/movies.vl.json"
https://datavzrd.github.io
https://datavzrd.github.io
├── index.html
├── movies
│ ├── index_1.html
│ └── table.js
├── oscars
│ ├── index_1.html
│ └── table.js
├── movies-plot
│ └── index_1.html
├── oscar-plot
│ └── index_1.html
└── static
├── bootstrap.bundle.min.js
├── bootstrap.min.css
├── bootstrap-select.min.css
├── bootstrap-select.min.js
├── bootstrap-table-filter-control.min.js
├── bootstrap-table-fixed-columns.min.css
├── bootstrap-table-fixed-columns.min.js
├── bootstrap-table.min.css
├── bootstrap-table.min.js
├── datavzrd.css
├── jquery.min.js
├── jsonm.min.js
├── lz-string.min.js
├── showdown.min.js
├── vega-embed.min.js
├── vega-lite.min.js
└── vega.min.js
https://datavzrd.github.io
rule datavzrd:
input:
config="resources/{sample}.datavzrd.yaml",
table="data/{sample}.tsv",
output:
report(
directory("results/datavzrd-report/{sample}"),
htmlindex="index.html",
),
wrapper:
"v4.6.0/utils/datavzrd"
https://datavzrd.github.io
https://datavzrd.github.io
Johannes Köster, Louis Dijkstra, Tobias Marschall, Alexander Schönhuth.
Varlociraptor: enhancing sensitivity and controlling false discovery rate in somatic indel discovery. Genome Biology 21, 2020
https://varlociraptor.github.io
Traditional variant calling approach:
detection
genotyping/VAF prediction
Varlociraptor approach:
candidate variant calling
event calling
same tool
any tool
Varlociraptor
Johannes Köster, Louis Dijkstra, Tobias Marschall, Alexander Schönhuth.
Varlociraptor: enhancing sensitivity and controlling false discovery rate in somatic indel discovery. Genome Biology 21, 2020
https://varlociraptor.github.io
germline: "normal:0.5 | normal:1.0"
somatic_normal: "normal:]0.0,0.5["
somatic_tumor: "normal:0.0 & tumor:]0.0,1.0]"
absent: "normal:0.0 & tumor:0.0
https://varlociraptor.github.io
\(\xi_i \sim \text{Bernoulli}(\theta \tau)\)
\(\omega_i \sim Bernoulli(\pi_i)\)
\(Z_i \mid \xi_i, \omega_i=1,\beta,\delta \sim\)
\(\beta, \delta\)
https://varlociraptor.github.io
species:
heterozygosity: 0.001
ploidy:
male:
all: 2
X: 1
Y: 1
female:
all: 2
X: 2
Y: 0
samples:
jane:
sex: female
events:
present: "jane:0.5 | jane:1.0"
https://varlociraptor.github.io
species:
heterozygosity: 0.001
ploidy:
male:
all: 2
X: 1
Y: 1
female:
all: 2
X: 2
Y: 0
samples:
jane:
sex: female
john:
sex: male
james:
sex: male
inheritance:
mendelian:
mother: jane
father: john
events:
john: "john:0.5 | john:1.0"
jane: "jane:0.5 | jane:1.0"
denovo_james: "(james:0.5 | james:1.0) & !$jane & !$john"
https://varlociraptor.github.io
species:
heterozygosity: 0.001
ploidy:
male:
all: 2
X: 1
Y: 1
female:
all: 2
X: 2
Y: 0
samples:
jane:
sex: female
somatic-effective-mutation-rate: 1e-10
tumor:
inheritance:
clonal:
from: jane
contamination:
by: jane
fraction: 0.1
somatic-effective-mutation-rate: 1e-6
events:
germline: "jane:0.5 | jane:1.0"
somatic: "jane:]0.0,0.5["
somatic_tumor_low: "jane:0.0 & tumor:]0.0,0.1["
somatic_tumor_high: "jane:0.0 & tumor:[0.1,1.0]"
https://varlociraptor.github.io
samples:
jane:
sex: female
somatic-effective-mutation-rate: 1e-10
tumor:
inheritance:
clonal:
from: jane
contamination:
by: jane
fraction: 0.1
somatic-effective-mutation-rate: 1e-6
relapse:
inheritance:
clonal:
from: jane
contamination:
by: jane
fraction: 0.2
somatic-effective-mutation-rate: 1e-6
expressions:
somatic_tumor: "jane:0.0 & tumor:]0.0,1.0]"
events:
germline: "jane:0.5 | jane:1.0"
somatic: "jane:]0.0,0.5["
somatic_tumor_no_increase: "$somatic_tumor & l2fc(relapse,tumor) < 1"
somatic_tumor_increase: "$somatic_tumor & l2fc(relapse,tumor) >= 1"
somatic_relapse: "jane:0.0 & tumor:0.0 & relapse:]0.0,1.0]"
https://varlociraptor.github.io
species:
heterozygosity: 0.001
ploidy:
male:
all: 2
X: 1
Y: 1
female:
all: 2
X: 2
Y: 0
samples:
dna_illumina:
sex: female
dna_nanopore:
inheritance:
clonal:
from: dna_illumina
rna_illumina:
universe: [0.0,1.0]
events:
het: "dna_illumina:0.5 & dna_nanopore:0.5 & rna_illumina:]0.0,1.0]"
hom: "dna_illumina:1.0 & dna_nanopore:1.0 & rna_illumina:1.0"
rna_editing: "dna_illumina:0.0 & dna_nanopore:0.0 & rna_illumina:]0.0,1.0]"
https://varlociraptor.github.io
https://varlociraptor.github.io
Discovering nucleotide level and structural variants in cancer genome data from second and third generation sequencing technologies. Till Hartmann, PhD thesis 2024
varlociraptor call variants --testcase-prefix testcase --testcase-locus CHROM:POS generic \
--scenario scenario.yaml --obs tumor=tumor.bcf normal=normal.bcf
Automatic test case generation:
145
public testcases (simulated + real benchmarks)
66
private testcases
(clinical)
https://varlociraptor.github.io
https://varlociraptor.github.io
Haplotype quantification:
utilizing Varlociraptor's posterior allele frequency distributions for virus variant quantification and HLA typing
Methylation calling:
joint consideration of methylation and variation, any sequencing technology
CNV calling:
joint consideration of depth and alignment evidence for arbitray scenarios
Optical mapping evidence:
extension of statistical model to consider optical mapping label positions to determine SV allele likelihoods
Phased variant impact prediction:
leveraging matching between Varlociraptor observations and variants
Fusion calling:
thorough statistical approach for fusion calling on DNA, RNA, short and long reads
Koesterlab
Laura Kühle
David Lähnemann
Felix Mölder
Adrian Prinz
Hamdiye Uzuner
Felix Wiegand
Can Özkan
Andrea Tonk
Till Hartmann (alumn.)
Snakemake Community
Christian Meesters
Michael B. Hall
Filipe G. Vieira
Morten E. Lund
Vanessa Sochat
Alexaner Kanitz
Kim-Phillip Jablonski
Brice Letcher
Michael B. Hall
Chris Tomkins-Tinch
Sven O. Twardziok
Manuel Holtgrewe
+ hundreds of contributors
Others
Sven Rahmann
Alex Schönhuth
Shirley Liu
Myles Brown
Henry Long
Louis Dijkstra
Tobias Marschall
Marcel Martin
Sven Nahnsen
Alex Schramm
Ina Pretzell
Michael Wessolly
Thomas Herold
Martin Schuler
And thanks to all the users and contributors