Johannes Köster University of Duisburg-Essen https://koesterlab.github.io
Figure: many datasets processed into results; the diagram names three requirements: automation/documentation, scalability and portability.
Automation/documentation:
Document and execute all steps, from raw data to final tables and figures, without manual intervention.
Scalability:
Execute for tens to thousands of datasets.
Efficiently use any computing platform.
Portability:
Easily execute the analysis on a different system/platform/architecture through integrated deployment of the required software stack.
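Scaling to tens or thousands of datasets usually means requesting one output file per dataset. The plain-Python sketch below mimics the default behaviour of Snakemake's `expand()` helper (a cartesian product over wildcard values); `expand_pattern` and the sample names are hypothetical, and this is not Snakemake's own implementation.

```python
from itertools import product
from typing import Dict, List


def expand_pattern(pattern: str, **wildcards: List[str]) -> List[str]:
    """Fill {name} placeholders with every combination of the given values.

    Hypothetical helper mimicking the default (cartesian-product) mode of
    Snakemake's expand(); it is not Snakemake's implementation.
    """
    names = list(wildcards)
    result = []
    for values in product(*(wildcards[n] for n in names)):
        assignment: Dict[str, str] = dict(zip(names, values))
        result.append(pattern.format(**assignment))
    return result


# Hypothetical sample names; real workflows often read them from a config file.
samples = [f"S{i:04d}" for i in range(1, 1001)]
targets = expand_pattern("mapped/{sample}.bam", sample=samples)
print(len(targets), targets[0])
```

Each resulting path matches the output pattern `mapped/{sample}.bam` of the `bwa` rule shown later, so requesting all of them yields one job per sample.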
General:
Automatic reports:
rule estimate_spike_proportion:
    input:
        "analysis/all.sce.rds"
    output:
        report("plots/spike-proportion.svg",
               category="Quality control",
               caption="report/spike-proportion.rst")
    script:
        "scripts/plot-spike-proportion.R"
Job groups:
rule bwa:
    input:
        "genome.fa",
        "reads/{sample}.fastq"
    output:
        "mapped/{sample}.bam"
    group: "mapping"
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | "
        "samtools view -Sb - > {output}"
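Jobs that share a group label and are connected in the DAG are submitted together as a single job, for example to avoid sending many small jobs to a cluster scheduler. The plain-Python sketch below models that merging with union-find; the downstream `sort` step, the job names and the `group_jobs` helper are hypothetical, and options such as bundling unconnected group members are ignored.

```python
from typing import Dict, List, Optional, Set, Tuple


def group_jobs(
    groups: Dict[str, Optional[str]], edges: List[Tuple[str, str]]
) -> List[Set[str]]:
    """Merge DAG-connected jobs that carry the same group label.

    Simplified model: each returned set stands for one submitted job;
    ungrouped jobs (label None) stay on their own.
    """
    parent = {job: job for job in groups}

    def find(job: str) -> str:
        while parent[job] != job:
            parent[job] = parent[parent[job]]  # path halving
            job = parent[job]
        return job

    for upstream, downstream in edges:
        label = groups[upstream]
        # Only an edge between two jobs of the same group merges them.
        if label is not None and label == groups[downstream]:
            parent[find(upstream)] = find(downstream)

    components: Dict[str, Set[str]] = {}
    for job in groups:
        components.setdefault(find(job), set()).add(job)
    return list(components.values())


# Hypothetical DAG: bwa feeds a sort step per sample, both in group "mapping".
groups = {
    "bwa:A": "mapping", "sort:A": "mapping",
    "bwa:B": "mapping", "sort:B": "mapping",
    "plot": None,
}
edges = [("bwa:A", "sort:A"), ("bwa:B", "sort:B"),
         ("sort:A", "plot"), ("sort:B", "plot")]
for submission in sorted(group_jobs(groups, edges), key=sorted):
    print(sorted(submission))
```

Each sample yields its own merged submission, because the two samples are not connected to each other inside the group.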
Pipe output:
rule bwa:
    input:
        "genome.fa",
        "reads/{sample}.fastq"
    output:
        pipe("mapped/{sample}.bam")
    threads: 8
    shell:
        "bwa mem -t {threads} {input} | "
        "samtools view -Sb - > {output}"
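`pipe()` marks the output as a named pipe (FIFO) instead of a regular file, so the producing job and the job that reads it have to run at the same time and the data never lands on disk. The standard-library sketch below demonstrates only that FIFO behaviour (POSIX systems); it is not how Snakemake schedules piped jobs, and the pipe name is hypothetical.

```python
import os
import tempfile
import threading
from typing import List


def stream_through_fifo(chunks: List[bytes]) -> bytes:
    """Pass data from a producer to a consumer through a named pipe.

    Both ends must be active simultaneously: opening a FIFO for writing
    blocks until a reader opens it, and vice versa.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        fifo = os.path.join(tmpdir, "mapped.pipe")  # hypothetical name
        os.mkfifo(fifo)

        def producer() -> None:
            with open(fifo, "wb") as out:  # blocks until the reader opens
                for chunk in chunks:
                    out.write(chunk)

        writer = threading.Thread(target=producer)
        writer.start()
        with open(fifo, "rb") as inp:  # blocks until the writer opens
            received = inp.read()  # returns once the writer closes its end
        writer.join()
        return received


print(stream_through_fifo([b"read1\n", b"read2\n"]))
```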
Software deployment with Conda:
Software deployment with Singularity:
container: "docker://continuumio/miniconda3"
rule estimate_spike_proportion:
    input:
        "analysis/all.sce.rds"
    output:
        "plots/spike-proportion.svg"
    conda:
        "envs/r-qc.yaml"
    script:
        "scripts/plot-spike-proportion.R"
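The `conda:` directive points to an environment file that Snakemake creates once and then reuses for every job of the rule; combined with the global `container:` image, the Conda environment is built inside that container. Reuse requires identifying an environment by its definition. The sketch below shows the general content-addressing idea with `hashlib`; `env_key` and the file content are hypothetical, and exactly which inputs Snakemake itself hashes is an assumption not reproduced here.

```python
import hashlib
from typing import Optional


def env_key(env_yaml: str, container_url: Optional[str] = None) -> str:
    """Derive a stable identifier from an environment definition (simplified)."""
    digest = hashlib.md5()
    if container_url is not None:
        # The same env file inside a different image is treated as a different env.
        digest.update(container_url.encode())
    digest.update(env_yaml.encode())
    return digest.hexdigest()


# Hypothetical content for envs/r-qc.yaml; the real file is not shown here.
r_qc = """channels:
  - conda-forge
dependencies:
  - r-base=4.3
  - r-ggplot2
"""
image = "docker://continuumio/miniconda3"
first = env_key(r_qc, image)
same = env_key(r_qc, image)
changed = env_key(r_qc.replace("4.3", "4.4"), image)
print(first == same, first == changed)  # True False
```

Editing the environment file changes the key, so a fresh environment is built instead of silently reusing an outdated one.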
https://snakemake.github.io
A framework for reproducible and transparent data analysis