Omnibenchmark:

(Re-)design and future plans

Science talk

Zürich, 04.10.23

Omnibenchmark:

General design and concepts

GitLab

Docker

Workflow

Module:

Template code

Module code

Omnibenchmark components

Omnibenchmark design

Omnibenchmark:

Current limitations

Design Limitations

1. Limited project "cross-talk"

Module:

project triples (renku ontology)

--> renku-python

Renku REST API

```
GET /knowledge-graph/entities
```

Allows finding projects, datasets, workflows, and persons

--> omb is "file-centric" (activity-centric)

GET /knowledge-graph/projects/:namespace/:name

Finds details of the project with the given namespace/name
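
A minimal sketch of how the two endpoints above could be addressed from Python. The base URL is the one that appears in the version endpoint of this deck; the `query` and `type` parameter names, the helper names, and the project used in the usage note are assumptions for illustration and should be checked against the Renku knowledge-graph API documentation.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL as it appears in the version endpoint of this deck.
KG_BASE = "https://renkulab.io/knowledge-graph"


def entities_url(query, entity_type=None):
    """Build a URL for GET /knowledge-graph/entities.

    The `query` and `type` parameter names are assumptions.
    """
    params = {"query": query}
    if entity_type is not None:
        params["type"] = entity_type
    return f"{KG_BASE}/entities?{urlencode(params)}"


def project_url(namespace, name):
    """Build a URL for GET /knowledge-graph/projects/:namespace/:name."""
    # Namespaces may be nested groups, so "/" is left unescaped.
    return f"{KG_BASE}/projects/{quote(namespace, safe='/')}/{quote(name)}"


def fetch_json(url, timeout=30):
    """Send the GET request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Usage, with a hypothetical namespace and project name: `fetch_json(project_url("omnibenchmark/iris_example", "iris-random-forest"))`. Both endpoints are project-level lookups, which is why per-file questions need many follow-up requests.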

Renku REST API

version

https://renkulab.io/knowledge-graph/version

scope and speed of queries

Why project "cross-talk"?

How to implicitly match files from the same origin dataset?

Lineages!

Lineages

fast and performant

--> renku ontology is not file-centric
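
A minimal sketch of what a file-centric lineage lookup could look like, and how it answers the question of matching files from the same origin dataset. The file paths and the `PARENTS` mapping are hypothetical; in practice the parent links would come from a triple store rather than a dictionary, and the lineage is assumed to be acyclic.

```python
from collections import defaultdict

# Hypothetical file-centric lineage: each file points to its direct inputs.
PARENTS = {
    "data/iris_filtered/iris_filt_dataset.csv": ["data/iris/iris.csv"],
    "data/rf/iris_rf_model.rds": ["data/iris_filtered/iris_filt_dataset.csv"],
    "data/knn/iris_knn_model.rds": ["data/iris_filtered/iris_filt_dataset.csv"],
    "data/rf/wine_rf_model.rds": ["data/wine/wine.csv"],
}


def origins(path, parents=PARENTS):
    """Return the root files (no recorded inputs) that `path` derives from."""
    direct = parents.get(path, [])
    if not direct:
        return {path}
    roots = set()
    for parent in direct:
        roots |= origins(parent, parents)
    return roots


def group_by_origin(paths, parents=PARENTS):
    """Group files that share the same set of origin files."""
    groups = defaultdict(list)
    for path in paths:
        groups[frozenset(origins(path, parents))].append(path)
    return dict(groups)
```

Grouping the three model files by origin puts both iris models in one group and the wine model in another, without any naming convention being involved.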

Lineages

ontology (Izaskun)

generate + send triples during benchmark execution (plugin, omni-cli, Ben)

queries (Izaskun)

adapt omnibenchmark-python (Almut)

integrate into orchestrator runs
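
A minimal sketch of the "generate triples during benchmark execution" step, using only the standard library. `prov:wasDerivedFrom` is a real PROV-O property; the `OMNI` and `FILE_BASE` namespaces and the `epoch` predicate are placeholders, since the actual omni ontology IRIs are not given in this deck.

```python
PROV = "http://www.w3.org/ns/prov#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Hypothetical namespaces; the real omni ontology IRIs are not given here.
OMNI = "https://example.org/omni#"
FILE_BASE = "https://example.org/files/"


def lineage_triples(output_file, input_files, epoch):
    """Return (subject, predicate, object) triples for one generated file."""
    subject = f"<{FILE_BASE}{output_file}>"
    triples = [
        (subject, f"<{PROV}wasDerivedFrom>", f"<{FILE_BASE}{source}>")
        for source in input_files
    ]
    # Typed literal for the epoch; the predicate name is an assumption.
    triples.append((subject, f"<{OMNI}epoch>", f'"{epoch}"^^<{XSD_INTEGER}>'))
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

The resulting text could be sent to a triple store after each module run. File names containing characters that are not valid in IRIs would need escaping first, which this sketch leaves out.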

Design Limitations

1. Limited project "cross-talk"

2. Missing concept of versions

Where do versions help?

Define scope (storage policies etc.)

Reduce search space in queries

Reproducibility / Releases

Prevent race conditions

(Local benchmarks)

Epochs to version benchmarks

epoch == 1 (full?) orchestrator run

--> benchmarks are defined by the orchestrator (scope + order + epoch)

--> epoch as part of omni-ontology (cross-project communication)

--> orchestrator as .yaml file (local graph/epochs)
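
A minimal sketch of the epoch idea: the orchestrator holds scope and order, and every full run increments the epoch. The class, its field names, and all module names except `iris_random_forest` are hypothetical; a real orchestrator definition would be read from the .yaml file mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical in-memory form of an orchestrator definition."""

    benchmark: str
    stages: list  # ordered stages, each a list of module names (scope + order)
    epoch: int = 0
    runs: dict = field(default_factory=dict)  # epoch -> modules executed

    def run(self):
        """Start a new epoch and record every module in stage order."""
        self.epoch += 1
        self.runs[self.epoch] = [m for stage in self.stages for m in stage]
        return self.epoch


def results_for_epoch(results, epoch):
    """Restrict result records to one epoch, which shrinks the search space."""
    return [record for record in results if record["epoch"] == epoch]
```

Usage: `Orchestrator("iris_example", [["iris_dataset"], ["iris_random_forest"], ["iris_accuracy"]]).run()` returns epoch 1; a second call on the same object returns epoch 2, and queries can then be limited to files tagged with that epoch.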

Design Limitations

1. Limited project "cross-talk"

2. Missing concept of versions

3. Renku dependency

Omnibenchmark-Renku Relationship

Renku is a set of microservices:

| Renku service | Relationship status |
| --- | --- |
| renku-python | complicated |
| renku graph | in separation |
| renku GUI | open relationship |
| renkuLab | long distance |

Omnibenchmark-Renku strategy

reduce dependencies

increase interaction/communication

Summary

some OMB base changes in progress

OMB ontology, epoch concept

reduce some Renku dependencies

user-facing changes?

Design: Benchmark modules

Data

standardized datasets

= 1 "module" (Renku project)

Methods

method results

Metrics

metric results

Dashboard

interactive result exploration

Method user

Method developer / Benchmarker

The omnibenchmark ecosystem

omnibenchmark-python

Omnibenchmark Python

1. User module specification

# src/config.yaml
---
data:
    name: "out_dataset"
    title: "Output of an example OmniObject"
    description: "describe module here"
    keywords: ["example_dataset"]
script: "path/to/method/dataset/metric/script.py"
benchmark_name: "omni_celltype"
inputs:
    keywords: ["import_this", "import_that"]
    files: ["count_file", "dim_red_file"]
    prefix:
        count_file: "counts"
        dim_red_file: ["features", "genes"]
outputs:
    files:
        corrected_counts: 
            end: ".mtx.gz"
        meta:
            end: ".json"
parameter:
    names: ["param1", "param2"]
    keywords: ["param_dataset"]

Omnibenchmark Python

1. User module specification

# src/run_workflow.py
from omnibenchmark import get_omni_object_from_yaml, renku_save

## Load config
omni_obj = get_omni_object_from_yaml('src/config.yaml')

## Check for new/updates of input datasets
omni_obj.update_object()
renku_save()

## Create output dataset
omni_obj.create_dataset()

## Generate and run workflow
omni_obj.run_renku()
renku_save()

## Store results in output dataset
omni_obj.update_result_dataset()
renku_save()

Omnibenchmark Python

2. Manage input import/updates

data:
    name: "iris_random_forest"
    title: "Random Forest"
    description: "Random forest applied on the iris dataset"
    keywords: ["iris_method"]
script: "src/iris-random-forest.R"
benchmark_name: "iris_example"
inputs:
    keywords: ["iris_filtered"] 
    files: ["iris_input"]  
    prefix:
        iris_input: "_filt_dataset"
outputs:
    template: "data/${name}/${name}_${unique_values}_${out_name}.${out_end}"  
    files:
        rf_model: 
             end: ".rds"
parameter:
    keywords: ["iris_parameters"] 
    names: ["train_split", "rseed"]

Example: Method module
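
The `outputs.template` entry above uses `${...}` placeholders, which is the same syntax as Python's `string.Template`. A minimal sketch of how such a template could be filled for one output file; whether omnibenchmark-python resolves it this way is an assumption, and the `unique_values` string in the usage note is a made-up placeholder.

```python
from string import Template

# Template string copied from the `outputs.template` entry above.
OUTPUT_TEMPLATE = Template(
    "data/${name}/${name}_${unique_values}_${out_name}.${out_end}"
)


def output_path(name, unique_values, out_name, out_end):
    """Fill the output template for one output file."""
    return OUTPUT_TEMPLATE.substitute(
        name=name,
        unique_values=unique_values,
        out_name=out_name,
        # Assumption: drop the leading dot of `end` so only one dot remains.
        out_end=out_end.lstrip("."),
    )
```

Usage: `output_path("iris_random_forest", "a1b2c3", "rf_model", ".rds")` gives `data/iris_random_forest/iris_random_forest_a1b2c3_rf_model.rds`.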

Omnibenchmark Python

2. Manage input import/updates

Example: Method module

```
benchmark: iris_example
```
```
keyword: iris_filtered
```

"endpoints"

Orchestrator
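
A minimal sketch of the input selection shown on this slide: a module imports every dataset that belongs to its benchmark and carries one of its input keywords. The `DATASETS` catalogue and the function name are hypothetical; in omnibenchmark the candidates come from a knowledge-graph query rather than a local list.

```python
# Hypothetical catalogue of datasets published by other modules.
DATASETS = [
    {"name": "iris_filtered", "benchmark": "iris_example",
     "keywords": ["iris_filtered"]},
    {"name": "iris_params", "benchmark": "iris_example",
     "keywords": ["iris_parameters"]},
    {"name": "other_data", "benchmark": "omni_celltype",
     "keywords": ["iris_filtered"]},
]


def find_inputs(datasets, benchmark, keywords):
    """Return names of datasets in `benchmark` that share an input keyword."""
    wanted = set(keywords)
    return [
        dataset["name"]
        for dataset in datasets
        if dataset["benchmark"] == benchmark
        and wanted & set(dataset["keywords"])
    ]
```

With `benchmark: iris_example` and `keyword: iris_filtered`, only `iris_filtered` is selected; the dataset with the same keyword in another benchmark is ignored.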

Omnibenchmark Python

3. Generalize user specifications

Example: Method module

iris-random-forest.R

*._filt_dataset*.

```
benchmark: iris_example
```
```
keyword: iris_filtered
```
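
A minimal sketch of how the `inputs.prefix` entries could map imported files to script inputs. The pattern is read here as "file name contains the pattern", which is an assumption about the matching rule; the function name and the file paths in the usage note are hypothetical.

```python
from fnmatch import fnmatchcase


def match_input_files(paths, prefixes):
    """Map each input name to the files whose name contains one of its patterns."""
    matched = {}
    for input_name, patterns in prefixes.items():
        # A prefix entry may be a single string or a list of strings.
        if isinstance(patterns, str):
            patterns = [patterns]
        matched[input_name] = [
            path
            for path in paths
            if any(
                fnmatchcase(path.rsplit("/", 1)[-1], f"*{pattern}*")
                for pattern in patterns
            )
        ]
    return matched
```

Usage: with the files `data/iris_filtered/iris_filt_dataset.csv` and `data/iris_filtered/iris_meta.json`, the entry `{"iris_input": "_filt_dataset"}` selects only the first file as `iris_input` for `iris-random-forest.R`.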

The omnibenchmark ecosystem

omni-validator

omni-sparql

triple store

omni-cli

omni-utils

The omnibenchmark Infrastructure

omni ontology triples
project-specific renku ontology triples

GitLFS

File storage

imlsomnibenchmark

Version control (git)
CICD
UI
API

renkuLab

renku GUI

Interactive sessions
Authentication (?)
Queries
Boards(?)

labmeeting041023

By Almut Luetge
