Challenges and opportunities of open and continuous community benchmarking
DMLS seminar
Almut Lütge, Robinson group
Zürich, 18.10.2022
Benchmarking: systematic comparison of methods/processes to understand their underlying features and/or to find the most suitable procedure for a specific task
[Figure: number of single-cell analysis tools per year, from https://www.scrna-tools.org/]
Luecken et al., 2020
Systematic comparison of metrics to understand their performance and find the most suitable metric to evaluate batch correction
Aim: Test whether metrics scale with (synthetic) batch strength; estimate the lower limit of batch detection
Approach: Spearman correlation of metrics with the batch logFC in a simulation series on the same dataset; minimal batch logFC that the metrics recognize as a batch effect (see the correlation sketch below)
Aim: Negative control; test whether metrics scale with randomness
Approach: Spearman correlation of metrics with the percentage of randomly permuted batch labels
Aim: Test whether metrics reflect batch strength across datasets
Approach: Spearman correlation of metrics with surrogates of batch strength (e.g., percent variance explained by batch (PVE-Batch) and the proportion of DE genes between batches) across datasets
Aim: Test sensitivity of metrics towards imbalanced cell type abundance within the same dataset
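A minimal sketch, not the original analysis code, of how such a metric-vs-batch-strength correlation and a detection limit could be computed; the arrays and the margin threshold are illustrative assumptions:

```python
# Sketch: correlate a batch-correction metric with simulated batch strength.
# Values and variable names are illustrative, one metric score per simulation.
import numpy as np
from scipy.stats import spearmanr

batch_logfc = np.array([0.0, 0.1, 0.25, 0.5, 1.0, 2.0])         # simulated batch strength
metric_scores = np.array([0.02, 0.05, 0.12, 0.30, 0.61, 0.88])  # metric on each simulation

rho, pval = spearmanr(batch_logfc, metric_scores)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3g})")

# Lower limit of batch detection: smallest logFC at which the metric exceeds
# its value on the logFC = 0 (no batch) simulation by some margin (assumed here).
baseline = metric_scores[batch_logfc == 0][0]
margin = 0.05
detected = batch_logfc[metric_scores > baseline + margin]
print("minimal detected batch logFC:", detected.min() if detected.size else None)
```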
Luecken et al., 2020
→ Open, extensible community benchmarks
Meta-analysis of 62 method benchmarks in the field of single cell omics
62 single cell omics method benchmarks
2 reviewers per benchmark
Meta-analysis:
Title
Number of datasets used in evaluations:
Number of methods evaluated:
Degree to which authors are neutral:
...
22. Type of workflow system used:
independent harmonization of responses
summaries
available, extensible, reusable, neutral, community-driven
code, workflows, environments, software versions
static vs. continuous
input data, method results, simulations, performance results
currently part of most benchmarks vs. not part of current standards
Open and continuous community benchmarking
[Diagram: roles (method developer/benchmarker, method user) and benchmark components (methods, datasets, metrics)]
Omnibenchmark
Benchmark stages, each = 1 "module" (a Renku project): standardized datasets → method results → metric results → interactive result exploration
Each module = a GitLab project + Docker container + workflow + datasets: a collection of the method's history, a description of how to run it, its computational environment, and its datasets (sketched below)
Roles: method developers/benchmarkers contribute modules; method users explore results
Omnibenchmark components:
a Python package (on PyPI) for workflow and dataset management with Renku/KG
a CI/CD orchestrator to automatically run and update benchmarks
a triplet store to perform cross-repository queries
a Shiny app to interactively explore results
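A purely illustrative sketch of what one such module bundles; the class and field names below are hypothetical and are not the omnibenchmark package's API:

```python
# Illustrative only: a plain-Python sketch of what an Omnibenchmark "module"
# bundles. Field names and values are hypothetical, not the actual API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class BenchmarkModule:
    gitlab_project: str                                        # repository with code and history
    docker_image: str                                          # pinned computational environment
    workflow: str                                              # entry point describing how to run it
    input_datasets: List[str] = field(default_factory=list)   # standardized inputs
    output_datasets: List[str] = field(default_factory=list)  # results passed downstream

# A method module consuming the output of a standardized dataset module:
method_module = BenchmarkModule(
    gitlab_project="omnibenchmark/some-method",                # hypothetical path
    docker_image="registry.example.org/some-method:1.0",       # hypothetical image
    workflow="run_method.py",
    input_datasets=["standardized-dataset-A"],
    output_datasets=["some-method-results"],
)
```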
[Diagram: triplet generation. Provenance is stored as (subject, predicate, object) triples: within a module, data is used_by code and code generated a result; the result of Module A can in turn be used_by Module B, linking modules across repositories (toy example below).]
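A toy illustration of the (subject, predicate, object) model with rdflib; the namespace and predicate names are made up for illustration and are not Renku's actual knowledge-graph vocabulary:

```python
# Toy example of the triple model with rdflib. The namespace and predicates
# below are illustrative, not the real Renku knowledge-graph schema.
from rdflib import Graph, Namespace

EX = Namespace("https://example.org/benchmark/")
g = Graph()

# Module A: its code used the input data and generated a result.
g.add((EX.dataA, EX.used_by, EX.codeA))
g.add((EX.codeA, EX.generated, EX.resultA))

# Module B consumes Module A's result as its input data.
g.add((EX.resultA, EX.used_by, EX.codeB))
g.add((EX.codeB, EX.generated, EX.resultB))

print(g.serialize(format="turtle"))
```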
https://www.oecdbetterlifeindex.org
Robinson group
Mark Robinson
Anthony Sonrel
Izaskun Mallona
Pierre-Luc Germain
Renku team
Oksana Riba Grognuz
Friedrich Miescher Institute
Charlotte Soneson
A data analysis platform/system built from a set of microservices:
GitLab → version control / CI/CD
Apache Jena → triple store
Jupyter server → interactive sessions
Docker/Kubernetes → software/environment management
Git LFS → file storage
Dataset and workflow management → "renku-python" (see the sketch below)
Knowledge graph tracking → provenance
User interface with free interactive sessions
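A hedged sketch of how a dataset and a processing step might be recorded with the Renku CLI, wrapped in Python only to keep one language across the examples; dataset names, file paths, and the exact commands are assumptions to be checked against the renku-python documentation:

```python
# Sketch only: recording a dataset and a tracked run with the Renku CLI from
# Python. Names and paths are illustrative; verify commands against the docs.
import subprocess

def renku(*args):
    subprocess.run(["renku", *args], check=True)

renku("dataset", "create", "standardized-dataset-A")                  # register a dataset
renku("dataset", "add", "standardized-dataset-A", "data/counts.csv")  # add a file to it
# 'renku run' executes the command and records inputs/outputs as provenance,
# which later ends up as triples in the knowledge graph.
renku("run", "python", "run_method.py", "data/counts.csv", "results/output.csv")
```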
[Diagram: users interact with the Renku client; runs recorded in GitLab automatically generate triples (data used_by code, code generated result), which populate the triplet store ("knowledge graph") and are accessible via KG-endpoint queries (example query below).]
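A sketch of a cross-repository KG-endpoint query using SPARQLWrapper; the endpoint URL and the vocabulary are placeholders, not Renku's real endpoint or schema:

```python
# Sketch: querying a knowledge-graph SPARQL endpoint across repositories.
# Endpoint URL and predicate are placeholders, not Renku's actual schema.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://kg.example.org/sparql")  # hypothetical endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX ex: <https://example.org/benchmark/>
    SELECT ?code ?result
    WHERE { ?code ex:generated ?result . }
    LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["code"]["value"], "generated", row["result"]["value"])
```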
Each benchmark has its own orchestrator (sketch below)
Schedules automatic module updates
"Gate-keeping": controls the addition of new modules