Benchmarking and methods development for single cell data
PhD defense
Almut Lütge
Zürich, 28.06.23
Development
Stem cell
Differentiated cells
Tissue
Disease
Cancer
Immune system
Transcription
Translation
aatgctgcgctaatcgcgcgtatcgggatcatgccctagtggccccatattggcgtcaggtcgaacggatcttcggtgactccatgcattttcaggctcactgtggca
alignment, filtering, counting
filtering, QC, normalization
embedding
clustering
trajectory
marker genes
differentiation
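As an illustrative sketch of two early steps in the pipeline above (QC filtering and normalization), assuming a simple "counts per 10k followed by log1p" scheme; this is one common convention, not the specific pipeline used here, which would typically rely on dedicated tools such as scanpy or scran:

```python
# Sketch of two early preprocessing steps from the pipeline above:
# cell filtering by total counts, then library-size normalization
# ("counts per 10k" + log1p -- one common scheme, for illustration only).
import math

def filter_and_normalize(counts, min_total=5, scale=10_000):
    """counts: list of per-cell gene-count lists (cells x genes)."""
    kept = [c for c in counts if sum(c) >= min_total]   # QC filter
    normalized = []
    for cell in kept:
        total = sum(cell)
        normalized.append([math.log1p(x / total * scale) for x in cell])
    return normalized

# Toy matrix: 3 cells x 4 genes; the last cell fails the QC filter
counts = [
    [5, 0, 3, 2],   # total 10 -> kept
    [1, 4, 4, 1],   # total 10 -> kept
    [1, 0, 0, 0],   # total 1  -> filtered out
]
norm = filter_and_normalize(counts)
print(len(norm))  # 2 cells survive filtering
```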
Differences between data sets [..][that] occur due to uncontrolled variability in experimental factors (Lun, 2019)
[t-SNE plots (tsne1 vs. tsne2): cells from Batch 1 and Batch 2 separate by batch; after integration, both batches mix in a common embedding (dim1 vs. dim2)]
How to quantify batch effects? Comparison of different batch mixing metrics
Aim: Test whether metrics scale with (synthetic) batch strength; Estimate lower limit of batch detection
Spearman correlation of metrics with the batch logFC in simulation series on the same dataset; Minimal batch logFC that is recognized from the metrics as batch effect
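As a hedged illustration of this evaluation step (not the thesis code), the Spearman correlation between a metric's scores and the simulated batch logFC is just the Pearson correlation of the two rank series; the values below are made up:

```python
# Sketch: Spearman correlation between a batch-mixing metric and
# simulated batch strength (logFC). Values are invented for illustration.

def ranks(values):
    """Average ranks (ties shared) for a list of numbers, 1-based."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical simulation series: increasing batch logFC vs. metric score
batch_logfc  = [0.0, 0.1, 0.25, 0.5, 1.0, 2.0]
metric_score = [0.02, 0.05, 0.2, 0.35, 0.7, 0.9]  # e.g. a cms-like score
print(spearman(batch_logfc, metric_score))  # perfectly monotone -> 1.0
```

A metric that tracks batch strength well should give a rho close to 1 across such a simulation series; the minimal logFC at which the metric departs from its null behaviour estimates the lower detection limit.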
Datasets
Methods
Metrics
1. cms_default
2. cms_kmin
3. lisi
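To illustrate the idea behind one of these metrics, here is a simplified, unweighted sketch of LISI (the local inverse Simpson's index): for each cell, take its k nearest neighbours in the embedding and compute the inverse Simpson's index of their batch labels. The published LISI uses Gaussian-weighted neighbourhoods; this toy version only shows the principle:

```python
# Simplified sketch of the LISI idea: per-cell inverse Simpson's index
# of batch labels among the k nearest neighbours. Illustration only;
# the real metric uses Gaussian-weighted neighbourhoods.
from collections import Counter

def lisi_scores(points, batches, k=3):
    scores = []
    for i, p in enumerate(points):
        # k nearest neighbours of cell i (excluding itself)
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neigh = [batches[j] for _, j in dists[:k]]
        props = [c / k for c in Counter(neigh).values()]
        scores.append(1.0 / sum(pr * pr for pr in props))  # inverse Simpson
    return scores

# Two perfectly separated batches: every neighbourhood is pure, so each
# cell scores 1 (the minimum); good mixing would approach 2 batches -> 2.
points  = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
print(lisi_scores(points, batches, k=2))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```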
Norel et al, 2011
Luecken et al., 2021
"benchmarking [..] is comparable to asking how good a baseball player is by testing how quickly he or she hits or runs under very controlled circumstances." Kasper Lage, 2020
"benchmarking [..] is comparable to asking how good a baseball player is by testing how quickly he or she hits or runs under very controlled circumstances." Kasper Lage, 2020
Robert Lewandowski
max. speed: 32.71 km/h
Timo Werner
max. speed: 34.1 km/h
"benchmarking [..] is comparable to asking how good a baseball player is by testing how quickly he or she hits or runs under very controlled circumstances." Kasper Lage, 2020
Robert Lewandowski
max. speed: 32.71 km/h
passes: 765
Timo Werner
max. speed: 34.1 km/h
passes: 660
"benchmarking [..] is comparable to asking how good a baseball player is by testing how quickly he or she hits or runs under very controlled circumstances." Kasper Lage, 2020
Robert Lewandowski
max. speed: 32.71 km/h
passes: 765
goals: 23
Timo Werner
max. speed: 34.1 km/h
passes: 660
goals: 9
Meta-analysis of 62 method benchmarks in the field of single cell omics
2 reviewers per benchmark
Meta-analysis:
Title
Number of datasets used in evaluations:
Number of methods evaluated:
Degree to which authors are neutral:
...
22. Type of workflow system used:
independent harmonization of responses
summaries
Code
available
extensible
reusable
Data
inputs
simulations
results
Reproducibility
versions
environments
workflows
scale
comprehensive
continuous
currently part of most benchmarks
not part of current standards
open, continuous and collaborative benchmarking
Method developer / Benchmarker
Method user
Methods
Datasets
Metrics
Omnibenchmark
Goals:
standardized datasets
method results
metric results
interactive result exploration
Method user
Method developer / Benchmarker
= 1 "module" (renku project)
= 1 "data bundle" (data files + meta data)
Orchestrator
https://www.oecdbetterlifeindex.org
Code
available
extensible
reusable
Data
inputs
simulations
results
Reproducibility
versions
environments
workflows
scale
(comprehensive)
continuous
comparable
part of omnibenchmark
not part of omnibenchmark
easy
flexible
Against the ’one method fits all data sets’ philosophy
--> Dynamic, extensible and explorable benchmarking system
[Diagram: Renku knowledge graph. Data, Code and Result nodes are linked by "generated", "used_by", "has_attribute" and "keyword" triples. User interaction with the renku client triggers automatic triplet generation into a triplet store ("knowledge graph"), which is then accessed via KG-endpoint queries.]
contributor
user
omnibenchmark-python
omniValidator
benchmarker
projects
templates
omb-site
orchestrator
triplestore
omni-sparql
dashboards
GitLab
Docker
Workflow
Module:
Template code
Module code
Data bundle
GitLFS/S3
Input/Output files
Metadata
Aim: Negative control and test whether metrics scale with randomness
Spearman correlation of metrics with the percentage of randomly permuted batch label
Aim: Test whether metrics reflect batch strength across datasets
Spearman correlation of metrics with surrogates of batch strength (e.g., percent variance explained by batch (PVE-Batch) and proportion of DE genes between batches) across datasets
Variance attribution
Batch DE genes
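One simple way to obtain such a surrogate of batch strength, sketched here for a single gene, is the R-squared of a one-way ANOVA on the batch label (between-batch sum of squares over total sum of squares). This is an illustrative simplification, not the exact variance-attribution method used in the thesis; the expression values are made up:

```python
# Sketch: percent variance explained (PVE) by batch for one gene,
# computed as between-batch SS / total SS (one-way ANOVA R^2).
# Illustration only; expression values are invented.

def pve_batch(expr, batches):
    n = len(expr)
    grand = sum(expr) / n
    total_ss = sum((x - grand) ** 2 for x in expr)
    if total_ss == 0:
        return 0.0
    between_ss = 0.0
    for b in set(batches):
        vals = [x for x, lab in zip(expr, batches) if lab == b]
        mean_b = sum(vals) / len(vals)
        between_ss += len(vals) * (mean_b - grand) ** 2
    return between_ss / total_ss  # fraction in [0, 1]

# A gene shifted up in batch 2 -> most of its variance is batch-driven
expr    = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
print(round(pve_batch(expr, batches), 3))  # ~0.989
```

Averaging such per-gene fractions (or the analogous proportion of DE genes between batches) yields a dataset-level surrogate against which the batch metrics can be correlated.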
Aim: Reaction of metrics to imbalanced cell type abundance within the same dataset
Test sensitivity towards imbalance of cell type abundance
spectrometry
imaging
aatgctgcgctaatcgcgcgtatcgggatcatgccctagtggcccgccatattggcgtcaggtcgaatcggatccggtgactccatgcatttcaggctcactgtggcacc
sequencing
Luecken et al., 2021