(Re-)design and future plans
Science talk
Zürich, 04.10.23
General design and concepts
GitLab
Docker
Workflow
Module:
Template code
Module code
Current limitations
1. Limited project "cross-talk"
GET /knowledge-graph/entities
Allows finding projects, datasets, workflows, and persons
--> omb is "file-centric" (activity-centric)
GET /knowledge-graph/projects/:namespace/:name
Finds details of the project with the given namespace/name
version
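A minimal sketch of how the two knowledge-graph endpoints above could be addressed from Python. Only the two paths come from the slide. The base URL, the `query` and `type` parameter names, and the example namespace/name are assumptions for illustration and should be checked against the deployed Renku API.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of a Renku deployment; replace with the real one.
BASE_URL = "https://renkulab.io"


def entities_url(query, entity_type=None):
    """Build the URL for GET /knowledge-graph/entities.

    The `query` and `type` parameter names are assumptions.
    """
    params = {"query": query}
    if entity_type is not None:
        params["type"] = entity_type
    return f"{BASE_URL}/knowledge-graph/entities?{urlencode(params)}"


def project_url(namespace, name):
    """Build the URL for GET /knowledge-graph/projects/:namespace/:name."""
    # Namespaces may be nested groups, so "/" is kept unescaped there.
    return (
        f"{BASE_URL}/knowledge-graph/projects/"
        f"{quote(namespace, safe='/')}/{quote(name, safe='')}"
    )


# Hypothetical search term and project coordinates.
print(entities_url("iris_filtered", "dataset"))
print(project_url("omnibenchmark", "iris_example"))
```

The sketch only builds URLs; sending the request and handling authentication are left out on purpose.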
scope and speed of queries
How to implicitly match files from the same origin dataset?
Lineages!
fast and performant
--> renku ontology is not file-centric
ontology (Izaskun)
generate + send triples during benchmark execution (plugin, omni-cli, Ben)
queries (Izaskun)
adapt omnibenchmark-python (Almut)
integrate into orchestrator runs
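As a rough illustration of the "generate + send triples" step, the sketch below serializes triples as N-Triples lines. The namespace, the predicate names (`generatedBy`, `epoch`), and the file and module identifiers are hypothetical placeholders, not the actual omni-ontology. Sending the lines to a triple store is not shown.

```python
def to_ntriple(subject, predicate, obj, literal=False):
    """Serialize one triple as an N-Triples line.

    IRIs are wrapped in angle brackets; literals are quoted, with
    backslashes and double quotes escaped (other escapes are omitted).
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# Hypothetical namespace; the real ontology IRIs are not given in the slides.
OMNI = "https://example.org/omni#"

triples = [
    to_ntriple(
        OMNI + "file/rf_model.rds",
        OMNI + "generatedBy",
        OMNI + "module/iris_random_forest",
    ),
    to_ntriple(OMNI + "file/rf_model.rds", OMNI + "epoch", "1", literal=True),
]
print("\n".join(triples))
```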
2. Missing concept of versions
epoch == 1 (full?) orchestrator run
--> benchmarks are defined by the orchestrator (scope + order + epoch)
--> epoch as part of omni-ontology (cross-project communication)
--> orchestrator as .yaml file (local graph/epochs)
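The epoch idea can be sketched as a small data structure: the orchestrator fixes scope and order, and each full run increments the epoch. The class, its fields, and the module names other than `iris_random_forest` are hypothetical; the eventual .yaml layout is not defined in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical in-memory view of an orchestrator definition."""

    benchmark: str
    # Ordered stages; each stage lists the modules (projects) in scope.
    stages: list = field(default_factory=list)
    epoch: int = 0

    def run(self):
        """Simulate one full run and return (epoch, module) pairs in order."""
        self.epoch += 1  # one full orchestrator run == one epoch
        return [(self.epoch, module) for stage in self.stages for module in stage]


orchestrator = Orchestrator(
    benchmark="iris_example",
    stages=[["iris_dataset"], ["iris_random_forest"], ["iris_accuracy"]],
)
first = orchestrator.run()
second = orchestrator.run()
print(first)
print(second)
```

Tagging every result with the epoch of the run that produced it is one way to make results from different runs distinguishable across projects.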
3. Renku dependency
Renku is a set of microservices:
RENKU SERVICE | RELATIONSHIP STATUS |
---|---|
renku-python | complicated |
renku graph | in separation |
renku GUI | open relationship |
renkuLab | long distance |
standardized datasets
= 1 "module" (renku project)
method results
metric results
interactive result exploration
Method user
Method developer/Benchmarker
1. User module specification
# src/config.yaml
---
data:
  name: "out_dataset"
  title: "Output of an example OmniObject"
  description: "describe module here"
  keywords: ["example_dataset"]
script: "path/to/method/dataset/metric/script.py"
benchmark_name: "omni_celltype"
inputs:
  keywords: ["import_this", "import_that"]
  files: ["count_file", "dim_red_file"]
  prefix:
    count_file: "counts"
    dim_red_file: ["features", "genes"]
outputs:
  files:
    corrected_counts:
      end: ".mtx.gz"
    meta:
      end: ".json"
parameter:
  names: ["param1", "param2"]
  keywords: ["param_dataset"]
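A small consistency check over the specification above, written against a plain-dict mirror of its `inputs` and `outputs` sections. It assumes that `prefix` is nested under `inputs` and that each output carries an `end` key. The function `check_config` is a hypothetical helper and not part of omnibenchmark.

```python
# Plain-dict mirror of the inputs/outputs part of src/config.yaml.
config = {
    "inputs": {
        "keywords": ["import_this", "import_that"],
        "files": ["count_file", "dim_red_file"],
        "prefix": {
            "count_file": "counts",
            "dim_red_file": ["features", "genes"],
        },
    },
    "outputs": {
        "files": {
            "corrected_counts": {"end": ".mtx.gz"},
            "meta": {"end": ".json"},
        }
    },
}


def check_config(cfg):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    inputs = cfg.get("inputs", {})
    prefixes = inputs.get("prefix", {})
    # Every declared input file type needs a prefix to be matched later.
    for file_type in inputs.get("files", []):
        if file_type not in prefixes:
            problems.append(f"input file type '{file_type}' has no prefix")
    # Every output needs a file ending.
    for out_name, spec in cfg.get("outputs", {}).get("files", {}).items():
        if "end" not in spec:
            problems.append(f"output '{out_name}' has no 'end'")
    return problems


print(check_config(config))
```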
# src/run_workflow.py
from omnibenchmark import get_omni_object_from_yaml, renku_save

## Load config
omni_obj = get_omni_object_from_yaml('src/config.yaml')

## Check for new or updated input datasets
omni_obj.update_object()
renku_save()

## Create output dataset
omni_obj.create_dataset()

## Generate and run workflow
omni_obj.run_renku()
renku_save()

## Store results in output dataset
omni_obj.update_result_dataset()
renku_save()
2. Manage input import/updates
data:
  name: "iris_random_forest"
  title: "Random Forest"
  description: "Random forest applied on the iris dataset"
  keywords: ["iris_method"]
script: "src/iris-random-forest.R"
benchmark_name: "iris_example"
inputs:
  keywords: ["iris_filtered"]
  files: ["iris_input"]
  prefix:
    iris_input: "_filt_dataset"
outputs:
  template: "data/${name}/${name}_${unique_values}_${out_name}.${out_end}"
  files:
    rf_model:
      end: ".rds"
parameter:
  keywords: ["iris_parameters"]
  names: ["train_split", "rseed"]
Example: Method module
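To make the `outputs.template` entry concrete, the sketch below fills it with Python's `string.Template`, which uses the same `${...}` placeholder syntax. How omnibenchmark derives `unique_values` is not stated in the slides, so the value here is a hypothetical placeholder. Dropping the leading dot of `end` is also an assumption.

```python
from string import Template

# Output template taken from the example config.
OUT_TEMPLATE = "data/${name}/${name}_${unique_values}_${out_name}.${out_end}"


def output_path(name, unique_values, out_name, out_end):
    """Fill the output template for one output file."""
    return Template(OUT_TEMPLATE).substitute(
        name=name,
        unique_values=unique_values,
        out_name=out_name,
        # Assumption: the leading dot of `end` is dropped, since the
        # template already contains the separating dot.
        out_end=out_end.lstrip("."),
    )


path = output_path(
    name="iris_random_forest",
    unique_values="a1b2c3",  # hypothetical placeholder value
    out_name="rf_model",
    out_end=".rds",
)
print(path)
```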
[Figure: input lookup for the method module by benchmark (iris_example) and keyword (iris_filtered); diagram labels: "endpoints", steps 1 and 2, Orchestrator]
3. Generalize user specifications
Example: Method module
iris-random-forest.R
*._filt_dataset*.
benchmark: iris_example
keyword: iris_filtered
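The matching of input files to the declared file type can be sketched as follows. It assumes that a "prefix" matches when it occurs anywhere in the file name, as the `_filt_dataset` pattern above suggests. The helper `match_inputs` and the file paths are hypothetical.

```python
from pathlib import PurePosixPath


def match_inputs(paths, prefixes):
    """Group file paths by input file type.

    A path matches a file type when its file name contains one of the
    substrings configured under `prefix` for that type.
    """
    matched = {file_type: [] for file_type in prefixes}
    for path in paths:
        file_name = PurePosixPath(path).name
        for file_type, patterns in prefixes.items():
            # A prefix may be a single string or a list of strings.
            if isinstance(patterns, str):
                patterns = [patterns]
            if any(pattern in file_name for pattern in patterns):
                matched[file_type].append(path)
    return matched


# Hypothetical file names for illustration.
available = [
    "data/iris_filtered/iris_filt_dataset.csv",
    "data/iris_filtered/iris_meta.json",
]
print(match_inputs(available, {"iris_input": "_filt_dataset"}))
```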
3. Generalize user specifications
omni-validator
omni-sparql
triple store
omni-cli
omni-utils
GitLFS
File storage
?
imlsomnibenchmark
renkuLab
renku GUI