omnibenchmark-python
Science talk - 17.05.2022
Technology
Disclaimer
Aims of this presentation:
Discuss main concepts of the omnibenchmark-python module
Showcase parts of the module
Discuss goals/priorities of the project/module
Discuss bottlenecks/main (remaining) obstacles
Discuss module/package design
# Renku
A platform integrating:
git, Jupyter/RStudio, docker, analysis workflows linked with a knowledge graph
Renku client
- Dataset and workflow management system → “renku-python”
- Knowledge graph tracking → provenance
Renkulab
- User interface with free interactive sessions
- GitLab
# Renku
A data analysis platform/system built from a set of microservices
GitLab --> version control/CI/CD
Apache Jena --> Triple store
Jupyter server --> interactive sessions
Docker/Kubernetes --> software/environment management
Git LFS --> File storage
# Renku
Knowledge graph
[Diagram: Data, Code and Result nodes connected by "used_by" and "generated" edges]
User interaction with renku client
Automatic triple generation
Triple store "Knowledge graph"
KG-endpoint queries
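The triple idea on this slide can be sketched in plain Python: each statement is a (subject, predicate, object) tuple, and a query is a pattern match over those tuples. The file names and the direction of the edges below are assumptions for illustration only; Renku's actual ontology and its triple store are considerably richer.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples; names and edge directions are illustrative assumptions.
TRIPLES: List[Triple] = [
    ("data.csv", "used_by", "script.py"),
    ("script.py", "generated", "result.csv"),
    ("result.csv", "used_by", "plot.py"),
    ("plot.py", "generated", "figure.png"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def upstream(triples: List[Triple], node: str) -> List[str]:
    """Walk edges backwards to collect everything `node` was derived from."""
    seen: List[str] = []
    stack = [node]
    while stack:
        current = stack.pop()
        for source, _, _ in match(triples, obj=current):
            if source not in seen:
                seen.append(source)
                stack.append(source)
    return seen


provenance = upstream(TRIPLES, "figure.png")
```

Walking the edges backwards like this is the kind of provenance question a knowledge graph is meant to answer: which data and code led to a given result.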
# Renku
KG-endpoints
Dataset/Project API
Code: curl requests
How: id/keyword based
Limitations: selected fields, no workflows
SPARQL/GraphQL
Code: SPARQL/GraphQL
How: nodes/edges
Limitations: slow! selected triples (workflows)
Project graph
Code: renku client
How: renku ontology
Limitations: access from project-only!
# omnibenchmark-python
Motivation
renku client
interactive dataset/workflow management
#!/usr/bin/env bash
renku dataset import -y https://renkulab.io/datasets/2114d84f245e46f493bbda944fbe11ab
# omnibenchmark-python
Motivation
#!/usr/bin/env bash
curl -s "https://renkulab.io/knowledge-graph/datasets?query=test_dataset1" | \
jq -r '.[] | select(.keywords[] | ascii_downcase == "test_dataset") | .identifier'
2114d84f245e46f493bbda944fbe11ab
ffa44150b88347c9a0f1f14ad43d204a
b50f0b54765045c8baa7b675c37a0206
?
- Which version is the latest?
- Which project is the "origin" project?
- Does the pipeline run correctly?
- Is the dataset part of omnibenchmark?
- ...
package/module
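The jq filter above can also be written in Python, which is where a package starts to pay off. This is a minimal sketch, assuming each record in the response is a JSON object with `identifier` and `keywords` fields (the only two fields the jq filter relies on). The sample records and the `ids_with_keyword` helper are hypothetical, and no request is sent.

```python
import json

# Hypothetical stand-in for the JSON returned by the dataset query endpoint.
SAMPLE_RESPONSE = json.loads(
    """
    [
      {"identifier": "aaa111", "keywords": ["Test_Dataset", "example"]},
      {"identifier": "bbb222", "keywords": ["other"]},
      {"identifier": "ccc333", "keywords": ["test_dataset"]}
    ]
    """
)


def ids_with_keyword(records, keyword):
    """Return identifiers of records carrying `keyword` (case-insensitive)."""
    wanted = keyword.lower()
    return [
        rec["identifier"]
        for rec in records
        if any(kw.lower() == wanted for kw in rec.get("keywords", []))
    ]


matching = ids_with_keyword(SAMPLE_RESPONSE, "test_dataset")
```

Like the jq version, this returns several identifiers and cannot by itself say which one is the latest version or which project is the origin. Those questions need extra metadata and logic on top.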
# omnibenchmark-python
Structure
config.yaml
---
data:
  name: "out_dataset"
  title: "example omniobject output"
  description: "..."
  keywords: ["example"]
script: "path/to/method/dataset/metric/script.py"
interpreter: "python"
command_line: "python path/to/script.py ..."
inputs:
  keywords: ["import_this"]
  files: ["count_file", "dim_red_file"]
  ...
outputs:
  template: "data/${name}/${name}_${out_name}.${out_end}"
  files:
    ...
parameter:
  ...
import omnibenchmark as omni
ex_bench = omni.utils.get_omni_object_from_yaml("config.yaml")
ex_bench.__class__
<class 'omnibenchmark.core.omni_object.OmniObject'>
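To make the `outputs` template in config.yaml more concrete, here is a small sketch of how its `${...}` placeholders could be filled in. It assumes shell-style substitution as offered by Python's `string.Template`. The helper `expand_output` and the values `counts` and `csv` are hypothetical, and omnibenchmark's own expansion may work differently.

```python
from string import Template

# Template string copied from the `outputs` section of config.yaml.
OUTPUT_TEMPLATE = Template("data/${name}/${name}_${out_name}.${out_end}")


def expand_output(name: str, out_name: str, out_end: str) -> str:
    """Fill the placeholders; raises KeyError if one is left without a value."""
    return OUTPUT_TEMPLATE.substitute(name=name, out_name=out_name, out_end=out_end)


# Hypothetical output name and file ending, for illustration only.
path = expand_output("out_dataset", "counts", "csv")
print(path)
```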
# omnibenchmark-python
Structure
import omnibenchmark as omni
ex_bench = omni.utils.get_omni_object_from_yaml("config.yaml")
ex_bench.__class__
<class 'omnibenchmark.core.omni_object.OmniObject'>
ex_bench.__dict__
{
    'logger': <Logger omnibenchmark.OmniObject (WARNING)>,
    'name': 'out_dataset',
    'keyword': ['example'],
    'title': 'example omniobject output',
    'description': 'This dataset is supposed to store the output files from the example omniobject',
    'command': <omnibenchmark.core.output_classes.OmniCommand object at 0x7fe8b538bf40>,
    'inputs': <omnibenchmark.core.input_classes.OmniInput object at 0x7fe8b54c9100>,
    'outputs': <omnibenchmark.core.output_classes.OmniOutput object at 0x7fe8b538bee0>,
    'parameter': <omnibenchmark.core.input_classes.OmniParameter object at 0x7fe8b538bf10>,
    'script': 'path/to/method/dataset/metric/script.py',
    'omni_plan': None,
    'renku': True,
    'kg_url': 'https://renkulab.io/knowledge-graph'
}
# omnibenchmark-python
Structure
from omnibenchmark.core.omni_object import OmniObject
dir(OmniObject)
['DATA_QUERY_URL', 'DATA_URL', 'GIT_API', 'GIT_URL', 'GRAPHQL_URL', 'KG_URL', 'RENKU_URL',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'create_dataset', 'run_renku', 'run_script', 'update_object']
# omnibenchmark-python
Example
import omnibenchmark as omni
ex_bench = omni.utils.get_omni_object_from_yaml("config.yaml")
ex_bench.__class__
<class 'omnibenchmark.core.omni_object.OmniObject'>
# omnibenchmark-python
Goals/Priorities
omnicli list_datasets --benchmark "test"
NAME SIZE URL
-------------- ------ --------------------------------------------------------------
test_dataset1 57 MB https://renkulab.io/datasets/2114d84f245e46f493bbda944fbe11ab
test_dataset2 120 MB https://renkulab.io/datasets/ffa44150b88347c9a0f1f14ad43d204a
omnicli download_datasets --all/--include URL
omnicli run_method --use_docker=yes --method='xy' --file_input='my_input_test' --benchmark='test'
Automatic dataset import
Parameter filtering
Object updates
CLI
PyPI module
Dashboards
Triple store integration
Short term
Long term
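The CLI on this slide is a goal rather than an existing tool, so the following is only a sketch of how such commands could be parsed with the standard library's `argparse`. The subcommand and option names are taken from the slide. Which options are required, and the default for `--use_docker`, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for two of the proposed omnicli subcommands."""
    parser = argparse.ArgumentParser(prog="omnicli")
    sub = parser.add_subparsers(dest="command", required=True)

    list_cmd = sub.add_parser("list_datasets", help="List datasets of a benchmark")
    list_cmd.add_argument("--benchmark", required=True)

    run_cmd = sub.add_parser("run_method", help="Run a method on an input")
    run_cmd.add_argument("--benchmark", required=True)
    run_cmd.add_argument("--method", required=True)
    run_cmd.add_argument("--file_input", required=True)
    # Assumed default: run without docker unless asked otherwise.
    run_cmd.add_argument("--use_docker", choices=["yes", "no"], default="no")
    return parser


list_args = build_parser().parse_args(["list_datasets", "--benchmark", "test"])
run_args = build_parser().parse_args(
    [
        "run_method",
        "--use_docker=yes",
        "--method=xy",
        "--file_input=my_input_test",
        "--benchmark=test",
    ]
)
```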
# omnibenchmark-python
Bottlenecks
# omnibenchmark-python
Python modules
OOP: Object oriented programming
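As a rough illustration of the object-oriented layout, the sketch below composes one object from smaller input and output objects, loosely following the attribute names visible in the `__dict__` output of `OmniObject`. All class names here (`SketchInput`, `SketchOutput`, `SketchObject`) and the `describe` method are hypothetical and do not reproduce the module's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchInput:
    """Hypothetical stand-in for an input definition."""

    keywords: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)


@dataclass
class SketchOutput:
    """Hypothetical stand-in for an output definition."""

    template: str = ""
    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class SketchObject:
    """Bundles metadata with its input and output objects (composition)."""

    name: str
    keyword: List[str] = field(default_factory=list)
    script: Optional[str] = None
    inputs: Optional[SketchInput] = None
    outputs: Optional[SketchOutput] = None

    def describe(self) -> str:
        n_inputs = len(self.inputs.files) if self.inputs else 0
        return f"{self.name}: script={self.script}, input files={n_inputs}"


obj = SketchObject(
    name="out_dataset",
    script="path/to/script.py",
    inputs=SketchInput(files=["count_file", "dim_red_file"]),
)
summary = obj.describe()
```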
# omnibenchmark-python
Python modules
Pytest/monkeypatch
# Test manage_renku_plan
def test_manage_renku_plan_with_correct_plan(mock_plan, monkeypatch):
    def return_mock_plan(*args, **kwargs):
        return mock_plan

    monkeypatch.setattr(
        wflow,
        "check_plan_exist",
        return_mock_plan,
    )
    mock_omni_plan = OmniPlan(plan=mock_plan)
    out_plan = omni.manage_renku_plan(
        out_files=["any", "random"],
        omni_plan=mock_omni_plan,
        command="not_to_run_command_str",
    )
    assert out_plan == mock_omni_plan
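The test above needs the package and pytest fixtures to run. The same stub-and-restore pattern can be tried in isolation with the standard library's `unittest.mock`, used here in place of pytest's `monkeypatch`. `wflow_sketch` and `manage_plan_sketch` are hypothetical stand-ins, not functions of omnibenchmark.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical module-like object standing in for the workflow helpers.
wflow_sketch = SimpleNamespace(check_plan_exist=lambda name: None)


def manage_plan_sketch(name):
    """Reuse an existing plan if the lookup finds one, else report a new one."""
    existing = wflow_sketch.check_plan_exist(name)
    return existing if existing is not None else f"new plan for {name}"


def test_manage_plan_reuses_existing_plan():
    fake_plan = object()
    # Replace the lookup only for the duration of the with-block.
    with mock.patch.object(
        wflow_sketch, "check_plan_exist", lambda *args, **kwargs: fake_plan
    ):
        assert manage_plan_sketch("any") is fake_plan
    # Outside the block the original lookup is restored.
    assert manage_plan_sketch("any") == "new plan for any"


test_manage_plan_reuses_existing_plan()
```

The point of the pattern is that the code under test never touches the real lookup, so the test stays fast and independent of any renku project state.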
# omnibenchmark-python
Python modules
mypy
class ConfigDict(TypedDict, total=False):
    data: ConfigData
    script: Optional[str]
    interpreter: Optional[str]
    command_line: Optional[str]
    inputs: Optional[ConfigInput]
    outputs: Optional[ConfigOutput]
    parameter: Optional[ConfigParam]


def get_omni_object_from_yaml(yaml_file: PathLike) -> OmniObject:
    with open(yaml_file) as f:
        config = yaml.load(f, Loader=yaml.FullLoader)

    check_type("config", config, ConfigDict)
    return build_omni_object_from_config(config)
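A self-contained sketch of the `TypedDict` idea: with `total=False` every key is optional, a static checker such as mypy can verify the value types, and the declared keys remain available at runtime. `ConfigSketch`, `DataSketch` and `unknown_keys` are hypothetical names. The check below only looks at top-level key names and is far weaker than the `check_type` call on the slide.

```python
from typing import List, Optional, TypedDict


class DataSketch(TypedDict, total=False):
    name: str
    keywords: List[str]


class ConfigSketch(TypedDict, total=False):
    data: DataSketch
    script: Optional[str]
    interpreter: Optional[str]


def unknown_keys(config: dict) -> List[str]:
    """Return top-level keys that ConfigSketch does not declare."""
    allowed = set(ConfigSketch.__annotations__)
    return sorted(set(config) - allowed)


# Accepted by a static checker: all keys are declared and optional.
good: ConfigSketch = {"data": {"name": "out_dataset"}, "script": None}

# A misspelled key is caught by the runtime key check.
typos = unknown_keys({"data": {}, "scirpt": "run.py"})
```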
# Summary
omnibenchmark-python:
- to automate working with renku(-python)
- to flexibly interact with omnibenchmark (and renku-client)
- not to replace renku CLI
To-Dos:
- omni CLI
- integrate triple store
- add to PyPI
Copy of Omnibenchmark-python
By Almut Luetge