How to contribute a project?
Zürich,
14. & 16.12.22
[Figure: Omnibenchmark overview — methods, datasets and metrics connect method developers/benchmarkers and method users]
[Figure: Omnibenchmark architecture — benchmarker, contributor and user roles interact with GitLab projects (workflow + Docker container), the orchestrator, datasets, omnibenchmark-python, omniValidator, project templates, the omb-site, the triplestore, omni-sparql and dashboards]
Each project is an independent module that can interact with the rest of the benchmark:
- flexible in language, dependencies, code, ...
- needs some general and benchmark-specific rules
Hackathon namespace?
Attribution!
Public
Template source repo
Available modules for any step!
Load
Branch name
Keyword that will be used by downstream projects to import this project
Important fields:
Should be the same for all components of your benchmark!
This information can always be changed later on
JupyterLab or RStudio session
Wait for the image to build
Make sure you are on the latest commit
project > new session
Always turn on
Configuration of your Docker image, R packages, Python modules, etc.
Metadata to run the project
Scripts to insert your code
To modify:
Only file to run. Will use `config.yaml` and your script.
To run:
Step-by-step guide in the working document
Base image
Install R dependencies
Install Python dependencies
install.R
requirements.txt
install optparse
Install the omnibenchmark Python package
---
data:
    name: "out_dataset"
    title: "Output of an example OmniObject"
    description: "describe module here"
    keywords: ["example_dataset"]
script: "path/to/method/dataset/metric/script.py"
benchmark_name: "omni_celltype"
inputs:
    keywords: ["import_this", "import_that"]
    files: ["count_file", "dim_red_file"]
    prefix:
        count_file: "counts"
        dim_red_file: ["features", "genes"]
outputs:
    files:
        corrected_counts:
            end: ".mtx.gz"
        meta:
            end: ".json"
parameter:
    names: ["param1", "param2"]
    keywords: ["param_dataset"]
    filter:
        param1:
            upper: 50
            lower: 3
            exclude: 12
        file: "path/to/param/combinations/to/exclude.json"
src/config.yaml
General module info
Input types
Output types
### ----------------------------------------------------------------- ###
## ------------------------- Options ------------------------------- ##
### ----------------------------------------------------------------- ###
library(optparse)

option_list <- list(
    make_option(c("--count_file"), type = "character", default = NULL,
                help = "Path to count file (cells as columns, genes as rows)"),
    make_option(c("--meta_file"), type = "character", default = NULL,
                help = "Path to file with meta data infos")
)

opt_parser <- OptionParser(option_list = option_list)
opt <- parse_args(opt_parser)

if (is.null(opt$count_file) || is.null(opt$meta_file)) {
    print_help(opt_parser)
    stop("Please specify a path to the counts and meta data file.",
         call. = FALSE)
}

count_file <- opt$count_file
meta_file <- opt$meta_file
method.R
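When the workflow runs, these options arrive as command-line flags; invoked by hand, the call would look something like this (file paths are hypothetical):

Rscript src/method.R --count_file data/counts.mtx.gz --meta_file data/meta.json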
Input types
Output types
from omnibenchmark import get_omni_object_from_yaml, renku_save
## Load config
omni_obj = get_omni_object_from_yaml('src/config.yaml')
## Check for new/updates of input datasets
omni_obj.update_object()
renku_save()
## Create output dataset
omni_obj.create_dataset()
## Generate and run workflow
omni_obj.run_renku()
renku_save()
## Store results in output dataset
omni_obj.update_result_dataset()
renku_save()
from omnibenchmark.utils.build_omni_object import get_omni_object_from_yaml
from omnibenchmark.renku_commands.general import renku_save
## Load config
omni_obj = get_omni_object_from_yaml('src/config.yaml')
## Update object
omni_obj.update_object()
renku_save()
## Create output dataset
omni_obj.create_dataset()
## Run workflow
omni_obj.run_renku()
renku_save()
## Update Output dataset
omni_obj.update_result_dataset()
renku_save()
run_workflow.py
build object
from omnibenchmark.utils.build_omni_object import get_omni_object_from_yaml
from omnibenchmark.renku_commands.general import renku_save
## Load config
omni_obj = get_omni_object_from_yaml('src/config.yaml')
Warning: Skipped parameter because of missing information. Please specify at least names.
WARNING: No input files in the current project detected.
Run OmniObject.update_object() to update and import input datasets.
Otherwise check the specified prefixes: {'count_file': ['_counts'], 'meta_file': ['_meta']}
WARNING: No outputs/output file mapping in the current project detected.
Run OmniObject.update_object() to update outputs, inputs and parameter.
Currently this object can not be used to generate a workflow.
run_workflow.py
update object
from omnibenchmark.utils.build_omni_object import get_omni_object_from_yaml
from omnibenchmark.renku_commands.general import renku_save
## Load config
omni_obj = get_omni_object_from_yaml('src/config.yaml')
## Update object
omni_obj.update_object()
renku_save()
run_workflow.py
check object
## Check object
omni_obj.__dict__
{'name': 'filter_hvg',
'keyword': ['omni_clustering_filter'],
'title': 'Filtering based on the most variable genes.',
'description': '(Normalized) counts and reduced dimensions of filtered omni_clustering datasets, based on the top 10% most variable genes.',
'command': <omnibenchmark.core.output_classes.OmniCommand at 0x7f3478402820>,
'inputs': <omnibenchmark.core.input_classes.OmniInput at 0x7f3450edcb20>,
'outputs': <omnibenchmark.core.output_classes.OmniOutput at 0x7f3450ef5910>,
'parameter': None,
'script': 'src/filter_hvg.R',
'omni_plan': None,
'orchestrator': 'https://renkulab.io/knowledge-graph/projects/omnibenchmark/omni_clustering/orchestrator-clustering',
'benchmark_name': 'omni_clustering',
'wflow_name': 'filter_hvg',
'dataset_name': 'filter_hvg',
'renku': True,
'kg_url': 'https://renkulab.io/knowledge-graph'}
run_workflow.py
output dataset
## Create output dataset
omni_obj.create_dataset()
Renku dataset with name filter_hvg and the following keywords ['omni_clustering_filter'] was generated.
<Dataset c98a1b7958c6463bbea9fc2e314c6190 filter_hvg>
run_workflow.py
run workflow
## Run workflow
omni_obj.run_renku(all=False)
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
...
run_workflow.py
Stdout & Stderr --> console
update output dataset
## Update Output dataset
omni_obj.update_result_dataset()
renku_save()
run_workflow.py
Add metadata files to output dataset
from omnibenchmark.management.data_commands import update_dataset_files
### Update Output dataset
meta_files = [files["meta_file"] for files in omni_obj.inputs.input_files.values()]
update_dataset_files(urls=meta_files, dataset_name=omni_obj.name)
Added the following files to filter_hvg:
['data/koh/koh_meta.json', 'data/kumar/kumar_meta.json']
run_workflow.py
dirty repository
DirtyRepository: The repository is dirty. Please use the "git" command to clean it.
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: src/filter_hvg.R
no changes added to commit (use "git add" and/or "git commit -a")
Once you have added the untracked files, commit them with "git commit".
Error
renku save
from omnibenchmark.renku_commands.general import renku_save
renku_save()
parameter
parameter:
    names: ["order", "hvg", "k"]
    keywords: ["omni_batch_parameter"]
    default:
        order: "auto"
        hvg: "sig"
        k: 20
    filter:
        k:
            upper: 30
            lower: 5
config.yaml
parameter
parameter:
    names: ["param1", "param2"]
    values:
        param1: [10, 20]
        param2: ["test", "another_test"]
    combinations:
        comb1:
            param1: 10
            param2: "test"
        comb2:
            param1: 20
            param2: "another_test"
config.yaml
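To see what `values` vs. `combinations` mean for the generated runs, here is a plain-Python illustration (not the omnibenchmark API; as I read the config, `values` alone spans the full grid while `combinations` restricts it to the listed pairings):

from itertools import product

values = {"param1": [10, 20], "param2": ["test", "another_test"]}

# Full grid from `values`: 2 x 2 = 4 parameter settings
grid = [dict(zip(values, combo)) for combo in product(*values.values())]

# Explicit `combinations`: only the 2 listed settings are run
combinations = [
    {"param1": 10, "param2": "test"},
    {"param1": 20, "param2": "another_test"},
]

print(len(grid), len(combinations))   # 4 vs. 2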
Open an issue at the corresponding orchestrator
Recipe to run omnibenchmark from the CLI
Get started
Help
Overview
[Figure: Omnibenchmark overview — standardized datasets, method results and metric results each form one "module" (a renku project); method developers/benchmarkers contribute modules, method users explore results interactively]
Workflows and activities
1 module = GitLab project + Docker container + workflow + datasets
= a collection of the method's* history, a description of how to run it, the computational environment, and its datasets (input files and output files)
make --> Makefile / snakemake --> Snakefile
- define the tasks to be executed
- define the order of tasks
- define dependencies between tasks
--> the user writes the "recipe" to a workflow file (see the minimal Snakefile sketch below)
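For the renku run example that follows, the equivalent hand-written recipe in Snakemake might look like this (a sketch; the rule name is illustrative, the file paths mirror the example below):

# Snakefile
rule word_count:
    input:
        "src/config.yaml"     # dependency: the rule reruns only if this changes
    output:
        "wc_config.txt"
    shell:
        "wc < {input} > {output}"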
renku run --name word_count -- wc < src/config.yaml > wc_config.txt
renku tracks a command as a workflow:
- renku writes a recipe of what has been done
- translates the plan into triples
- tracks the plan and how it is used in the form of triples
[Figure: provenance graph — Data and Code nodes are "used_by" an execution that "generated" a Result node]
User interaction with the renku client
--> automatic triple generation
--> triple store ("knowledge graph")
--> KG-endpoint queries
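As a sketch of what a KG-endpoint query could look like from Python (the endpoint URL and the schema.org property paths are assumptions, not guaranteed to match the real knowledge graph):

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical renku knowledge-graph SPARQL endpoint
endpoint = SPARQLWrapper("https://renkulab.io/knowledge-graph/sparql")
endpoint.setReturnFormat(JSON)

# List a few datasets and their names (illustrative query)
endpoint.setQuery("""
    PREFIX schema: <http://schema.org/>
    SELECT ?dataset ?name WHERE {
        ?dataset a schema:Dataset ;
                 schema:name ?name .
    } LIMIT 10
""")

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["dataset"]["value"], row["name"]["value"])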
make/snakemake --> `make` / `snakemake`
- generates all outputs that are missing or not up to date
--> one command to automatically update and generate workflow outputs without unnecessary reruns
renku run --name word_count -- wc < src/config.yaml > wc_config.txt
renku update --all
renku rerun wc_config.txt
renku workflow execute --set input-1="README.md" \
--set output-2="wc_readme.txt" \
word_count
from omnibenchmark import get_omni_object_from_yaml, renku_save
## Load config
omni_obj = get_omni_object_from_yaml('src/config.yaml')
## Update object
omni_obj.update_object()
renku_save()
## Create output dataset
omni_obj.create_dataset()
## Run workflow
omni_obj.run_renku()
renku_save()
## Update Output dataset
omni_obj.update_result_dataset()
renku_save()
Update and News
Science talk
Almut Lütge
30.11.2022
Omnibenchmark website
- presentation of our new site
- setup
- feedback
Benchmark/Hackathon results
- bettr batch correction
- overview
get_data.py (GitLab API + triple store queries) --> datasets.qmd, projects.md, ... --> omnibenchmark.org (datasets.html, project.html, ...)
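A rough sketch of what the data-gathering step might do with the GitLab API (the group name, output file and fields are assumptions; the real script also queries the triple store):

import requests

# Hypothetical: list the projects of the omnibenchmark GitLab group
resp = requests.get(
    "https://gitlab.com/api/v4/groups/omnibenchmark/projects",
    params={"per_page": 100},
)
resp.raise_for_status()

# Write a simple markdown listing for quarto to render
with open("projects.md", "w") as fh:
    fh.write("# Projects\n\n")
    for project in resp.json():
        fh.write(f"- [{project['name']}]({project['web_url']})\n")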
variables:
  HUGO_ENV: production

stages:
  - build
  - update
  - render

image_build:
  stage: build
  image: docker:stable
  services:
    - docker:dind
  script:
    - echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER $CI_REGISTRY --password-stdin
    - docker build -t $CI_REGISTRY_IMAGE .
    - docker push $CI_REGISTRY_IMAGE

update_data:
  stage: update
  image:
    name: $CI_REGISTRY_IMAGE:latest
  before_script:
    - git config --global user.name ${GITLAB_USER_NAME}
    - git config --global user.email ${GITLAB_USER_EMAIL}
    - url_host=$(git remote get-url origin | sed -e "s/https:\/\/gitlab-ci-token:.*@//g")
    - git remote set-url --push origin "https://oauth2:${CI_PUSH_TOKEN}@${url_host}"
  script:
    - export GIT_PYTHON_REFRESH=quiet
    - python3 src/data_gen/get_projects.py
    - quarto render
    - git status && git add . && git commit -m "Build site" && git status
    - git push --follow-tags origin HEAD:$CI_COMMIT_REF_NAME -o ci.skip

test_render:
  stage: render
  image: registry.gitlab.com/pages/hugo/hugo_extended:latest
  before_script:
    - apk add --update --no-cache git go
    - hugo mod init gitlab.com/pages/hugo
  script:
    - hugo
  rules:
    - if: $CI_COMMIT_REF_NAME != $CI_DEFAULT_BRANCH

pages:
  stage: render
  image: registry.gitlab.com/pages/hugo/hugo_extended:latest
  before_script:
    - apk add --update --no-cache git go
    - hugo mod init gitlab.com/pages/hugo
  script:
    - hugo
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH
Summarized metrics - Food for thought
Current work
Almut Lütge
01.03.2023
from scib_metrics.benchmark import Benchmarker

bm = Benchmarker(
    adata,
    batch_key="batch",
    label_key="cell_type",
    embedding_obsm_keys=["Unintegrated", "Scanorama", "LIGER", "Harmony"],
    n_jobs=4,
)
bm.benchmark()
bm.plot_results_table()
bm.plot_results_table(min_max_scale=False)
[Result tables side by side: min-max scaling (default) vs. no scaling]
Larger range in results:
--> higher sensitivity
--> lower overall range
bm.get_results_table()
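To go from the per-metric table to a single summary per method, one option is a plain pandas aggregation; this assumes get_results_table() returns (or can be coerced to) a DataFrame with methods as rows and metric scores as columns, which may need adapting to the actual layout:

import pandas as pd

df = bm.get_results_table()            # assumed layout: methods x metrics
numeric = df.select_dtypes("number")   # keep only numeric metric columns

summary = pd.DataFrame({
    "mean_score": numeric.mean(axis=1),   # unweighted mean across metrics
    "min_score": numeric.min(axis=1),     # worst-case metric per method
})
print(summary.sort_values("mean_score", ascending=False))

Whether an unweighted mean across metric types is appropriate is part of the food for thought below.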
Summaries
To consider