omnibenchmark-python

Science talk - 17.05.2022

Technology

Disclaimer

Aims of this presentation:

  • Discuss the main concepts of the omnibenchmark-python module
  • Showcase parts of the module
  • Discuss goals/priorities of the project/module
  • Discuss bottlenecks/main (remaining) obstacles
  • Discuss module/package design

# Renku

A platform integrating:

Git, Jupyter/RStudio, Docker and analysis workflows, linked through a knowledge graph

Renku client

  • Dataset and workflow management system → "renku-python"

  • Knowledge graph tracking → provenance

Renkulab

  • User interface with free interactive sessions

  • GitLab

# Renku

A data analysis platform/system built from a set of microservices

GitLab --> version control/CI/CD

Apache Jena --> Triple store

Jupyter server --> interactive sessions

Docker/Kubernetes --> software/environment management

GitLFS --> File storage

# Renku

Knowledge graph

[Figure: provenance relations such as Data --used_by--> Code --generated--> Result. User interaction with the renku client → automatic triple generation → triple store ("knowledge graph"); the store is then accessed through KG-endpoint queries.]
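The provenance idea in the figure can be sketched with plain Python tuples: each (subject, predicate, object) triple records one relation such as "Code generated Result", and a query walks those edges. This is a toy model with made-up node names (raw_counts.csv, normalize.py, ...); renku itself keeps its triples in an Apache Jena triple store using its own ontology.

```python
from typing import List, Set, Tuple

# A triple is (subject, predicate, object); all node names are made up.
Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("raw_counts.csv", "used_by", "normalize.py"),
    ("normalize.py", "generated", "normalized.csv"),
    ("normalized.csv", "used_by", "cluster.py"),
    ("cluster.py", "generated", "clusters.csv"),
]


def objects(subject: str, predicate: str, store: List[Triple]) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in store if s == subject and p == predicate]


def downstream_results(data: str, store: List[Triple]) -> Set[str]:
    """Follow used_by -> generated edges to find all results derived from `data`."""
    results: Set[str] = set()
    frontier = [data]
    while frontier:
        node = frontier.pop()
        for code in objects(node, "used_by", store):
            for result in objects(code, "generated", store):
                if result not in results:
                    results.add(result)
                    frontier.append(result)
    return results


print(sorted(downstream_results("raw_counts.csv", triples)))  # ['clusters.csv', 'normalized.csv']
```

Because derived results are pushed back onto the frontier, the query also finds indirect descendants, which is the kind of lineage question the knowledge graph is meant to answer.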

# Renku

KG-endpoints

| Endpoint            | Code                   | How              | Limitations                              |
|---------------------|------------------------|------------------|------------------------------------------|
| Dataset/Project API | curl requests          | ID/keyword based | selected fields only, no workflows       |
| SPARQL/GraphQL      | SPARQL/GraphQL queries | nodes/edges      | slow! only selected triples (workflows)  |
| Project graph       | renku client           | renku ontology   | accessible from within the project only! |

# omnibenchmark-python

Motivation

renku client

interactive dataset/workflow management

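#!/usr/bin/env bash
# Import a dataset from renkulab.io into the current renku project (-y skips the download confirmation)
renku dataset import -y https://renkulab.io/datasets/2114d84f245e46f493bbda944fbe11ab
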
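omnibenchmark-python aims to automate calls like this one. A minimal sketch of wrapping the command from Python with `subprocess`; the helper names `build_import_command` and `import_dataset` are hypothetical, and only the `renku dataset import -y <url>` invocation is taken from the slide.

```python
import shlex
import subprocess
from typing import List


def build_import_command(dataset_url: str) -> List[str]:
    """Build the argument list for `renku dataset import -y <url>` (hypothetical helper)."""
    return ["renku", "dataset", "import", "-y", dataset_url]


def import_dataset(dataset_url: str) -> None:
    """Run the import inside a renku project; raises CalledProcessError if renku fails."""
    subprocess.run(build_import_command(dataset_url), check=True)


if __name__ == "__main__":
    url = "https://renkulab.io/datasets/2114d84f245e46f493bbda944fbe11ab"
    # Print the command instead of running it, so the sketch works without renku installed.
    print(shlex.join(build_import_command(url)))
```

Passing an argument list (rather than a shell string) avoids quoting problems with URLs, and `check=True` turns a failed import into a Python exception.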
# omnibenchmark-python

Motivation

#!/usr/bin/env bash
# Search the KG dataset API, keep datasets tagged with the keyword "test_dataset"
# (case-insensitive) and print their identifiers
curl -s "https://renkulab.io/knowledge-graph/datasets?query=test_dataset1" | \
	jq -r '.[] | select(.keywords[] | ascii_downcase == "test_dataset") | .identifier'
# Output:
# 2114d84f245e46f493bbda944fbe11ab
# ffa44150b88347c9a0f1f14ad43d204a
# b50f0b54765045c8baa7b675c37a0206
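The same keyword filter can be written in Python with the standard library. A sketch, assuming the endpoint returns a JSON list of objects with `keywords` and `identifier` fields, as the jq filter implies; unlike the jq version, it lists each identifier only once even if several keywords match.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

KG_DATASETS_URL = "https://renkulab.io/knowledge-graph/datasets"


def search_datasets(query: str) -> List[Dict[str, Any]]:
    """Query the KG dataset API (network access required)."""
    url = f"{KG_DATASETS_URL}?{urllib.parse.urlencode({'query': query})}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def ids_with_keyword(datasets: List[Dict[str, Any]], keyword: str) -> List[str]:
    """Mirror of the jq filter: keep datasets with a case-insensitive keyword match."""
    wanted = keyword.lower()
    return [
        ds["identifier"]
        for ds in datasets
        if any(kw.lower() == wanted for kw in ds.get("keywords", []))
    ]
```

Usage: `ids_with_keyword(search_datasets("test_dataset1"), "test_dataset")`. Like the curl call, this only finds candidate IDs; it cannot answer the questions below.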

Open questions:

  • Which version is the latest?
  • Which project is the "origin" project?
  • Does the pipeline run correctly?
  • Is the dataset part of omnibenchmark?
  • ...

package/module

# omnibenchmark-python

Structure

config.yaml
---
data:
    name: "out_dataset"
    title: "example omniobject output"
    description: "..."
    keywords: ["example"]
script: "path/to/method/dataset/metric/script.py"
interpreter: "python"
command_line: "python path/to/script.py ..."
inputs:
  keywords: ["import_this"]
  files: ["count_file", "dim_red_file"]
  ...
outputs:
  template: "data/${name}/${name}_${out_name}.${out_end}"
  files:
  ...
parameter:
  ...

>>> import omnibenchmark as omni
>>> ex_bench = omni.utils.get_omni_object_from_yaml("config.yaml")
>>> ex_bench.__class__
<class 'omnibenchmark.core.omni_object.OmniObject'>

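The `outputs.template` entry uses `${...}` placeholders, which match Python's `string.Template` syntax. A sketch of how such a template could be expanded into output paths; whether omnibenchmark expands it this way is an assumption, and the `out_name`/`out_end` values are made up.

```python
from string import Template
from typing import Dict, List

OUTPUT_TEMPLATE = "data/${name}/${name}_${out_name}.${out_end}"


def expand_outputs(name: str, outputs: Dict[str, str]) -> List[str]:
    """Fill the template once per output; `outputs` maps out_name -> file ending (hypothetical)."""
    template = Template(OUTPUT_TEMPLATE)
    return [
        template.substitute(name=name, out_name=out_name, out_end=out_end)
        for out_name, out_end in outputs.items()
    ]


print(expand_outputs("out_dataset", {"counts": "mtx", "meta": "json"}))
# ['data/out_dataset/out_dataset_counts.mtx', 'data/out_dataset/out_dataset_meta.json']
```

`substitute` raises `KeyError` when a placeholder has no value, so a misconfigured template fails loudly instead of producing a wrong path.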
# omnibenchmark-python

Structure

>>> import omnibenchmark as omni
>>> ex_bench = omni.utils.get_omni_object_from_yaml("config.yaml")
>>> ex_bench.__class__
<class 'omnibenchmark.core.omni_object.OmniObject'>
>>> ex_bench.__dict__
{
    'logger': <Logger omnibenchmark.OmniObject (WARNING)>,
    'name': 'out_dataset',
    'keyword': ['example'],
    'title': 'example omniobject output',
    'description': 'This dataset is supposed to store the output files from the example omniobject',
    'command': <omnibenchmark.core.output_classes.OmniCommand object at 0x7fe8b538bf40>,
    'inputs': <omnibenchmark.core.input_classes.OmniInput object at 0x7fe8b54c9100>,
    'outputs': <omnibenchmark.core.output_classes.OmniOutput object at 0x7fe8b538bee0>,
    'parameter': <omnibenchmark.core.input_classes.OmniParameter object at 0x7fe8b538bf10>,
    'script': 'path/to/method/dataset/metric/script.py',
    'omni_plan': None,
    'renku': True,
    'kg_url': 'https://renkulab.io/knowledge-graph'
}

# omnibenchmark-python

Structure

>>> from omnibenchmark.core.omni_object import OmniObject
>>> dir(OmniObject)
['DATA_QUERY_URL', 'DATA_URL', 'GIT_API', 'GIT_URL', 'GRAPHQL_URL', 'KG_URL', 'RENKU_URL', 
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', 
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', 
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', 
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 
'create_dataset', 'run_renku', 'run_script', 'update_object']
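Most of the `dir()` output is inherited dunder methods. A small helper (hypothetical, not part of omnibenchmark) that keeps only the public names makes the class interface easier to read; it is shown on a stand-in class so it runs without omnibenchmark installed. Applied to `OmniObject`, it would return the upper-case URL constants and the four methods from the list above.

```python
from typing import List


def public_names(cls: type) -> List[str]:
    """Return the attributes of `cls` that do not start with an underscore."""
    return [name for name in dir(cls) if not name.startswith("_")]


class OmniObjectStandIn:
    """Hypothetical stand-in mimicking part of the OmniObject interface."""

    KG_URL = "https://renkulab.io/knowledge-graph"

    def create_dataset(self): ...
    def run_renku(self): ...
    def update_object(self): ...


print(public_names(OmniObjectStandIn))  # ['KG_URL', 'create_dataset', 'run_renku', 'update_object']
```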
# omnibenchmark-python

Example

>>> import omnibenchmark as omni
>>> ex_bench = omni.utils.get_omni_object_from_yaml("config.yaml")
>>> ex_bench.__class__
<class 'omnibenchmark.core.omni_object.OmniObject'>

# omnibenchmark-python

Goals/Priorities

omnicli list_datasets --benchmark "test"
NAME             SIZE  URL
-------------  ------  -------------------------------------------------------------
test_dataset1   57 MB  https://renkulab.io/datasets/2114d84f245e46f493bbda944fbe11ab
test_dataset2  120 MB  https://renkulab.io/datasets/ffa44150b88347c9a0f1f14ad43d204a

omnicli download_datasets --all/--include URL
omnicli run_method --use_docker=yes --method='xy' --file_input='my_input_test' --benchmark='test'
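The planned `omnicli` interface could be wired up with `argparse` subcommands. A sketch only: the subcommand and option names are copied from the examples above, `--all`/`--include` are modelled as mutually exclusive (an assumption), and the handler just prints what it would do.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="omnicli")
    sub = parser.add_subparsers(dest="command", required=True)

    list_ds = sub.add_parser("list_datasets", help="list datasets of a benchmark")
    list_ds.add_argument("--benchmark", required=True)

    download = sub.add_parser("download_datasets", help="download datasets")
    which = download.add_mutually_exclusive_group(required=True)
    which.add_argument("--all", action="store_true")
    which.add_argument("--include", metavar="URL", action="append")

    run = sub.add_parser("run_method", help="run a method on a benchmark")
    run.add_argument("--use_docker", choices=["yes", "no"], default="no")
    run.add_argument("--method", required=True)
    run.add_argument("--file_input", required=True)
    run.add_argument("--benchmark", required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Placeholder: a real CLI would dispatch to omnibenchmark functions here.
    print(f"would run {args.command} with {vars(args)}")
    return args
```

Subcommands keep each action's options separate, so `--benchmark` can mean the same thing in `list_datasets` and `run_method` without clashing with other flags.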

Roadmap (short term → long term):

  • Automatic dataset import
  • Parameter filtering
  • Object updates
  • CLI
  • PyPI module
  • Dashboards
  • Triple store integration

# omnibenchmark-python

Bottlenecks

# omnibenchmark-python

Python modules

OOP: object-oriented programming
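The `__dict__` output of `ex_bench` shows that an OmniObject is composed of smaller objects (OmniCommand, OmniInput, OmniOutput, OmniParameter). A much-simplified sketch of that composition with dataclasses; the `...Sketch` classes, their fields (loosely based on config.yaml) and the `command_line` method are hypothetical, not the package's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InputSketch:
    keywords: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)


@dataclass
class OutputSketch:
    template: str = ""
    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class OmniObjectSketch:
    name: str
    script: str
    interpreter: str = "python"
    inputs: InputSketch = field(default_factory=InputSketch)
    outputs: OutputSketch = field(default_factory=OutputSketch)

    def command_line(self) -> str:
        """Assemble a command from interpreter and script (ignores inputs/outputs)."""
        return f"{self.interpreter} {self.script}"


ex = OmniObjectSketch(
    name="out_dataset",
    script="path/to/script.py",
    inputs=InputSketch(keywords=["import_this"], files=["count_file", "dim_red_file"]),
)
print(ex.command_line())  # python path/to/script.py
```

`default_factory` gives every object its own lists and dicts, so two sketches never share mutable state by accident.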

# omnibenchmark-python

Python modules

Pytest/monkeypatch

# Test manage_renku_plan
# `mock_plan` is a pytest fixture; `wflow`, `omni` and `OmniPlan` are omnibenchmark
# modules/classes imported elsewhere in the test module.
def test_manage_renku_plan_with_correct_plan(mock_plan, monkeypatch):
    def return_mock_plan(*args, **kwargs):
        return mock_plan

    # Replace the real plan lookup so the test never queries renku
    monkeypatch.setattr(
        wflow,
        "check_plan_exist",
        return_mock_plan,
    )
    mock_omni_plan = OmniPlan(plan=mock_plan)
    out_plan = omni.manage_renku_plan(
        out_files=["any", "random"],
        omni_plan=mock_omni_plan,
        command="not_to_run_command_str",
    )
    assert out_plan == mock_omni_plan

# omnibenchmark-python

Python modules

mypy

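from os import PathLike
from typing import Optional, TypedDict

import yaml
from typeguard import check_type  # assumed: typeguard 2.x, check_type(argname, value, expected_type)

# ConfigData, ConfigInput, ConfigOutput, ConfigParam, OmniObject and
# build_omni_object_from_config are defined elsewhere in omnibenchmark.


class ConfigDict(TypedDict, total=False):
    """Expected structure of config.yaml; mypy checks key access against it statically."""
    data: ConfigData
    script: Optional[str]
    interpreter: Optional[str]
    command_line: Optional[str]
    inputs: Optional[ConfigInput]
    outputs: Optional[ConfigOutput]
    parameter: Optional[ConfigParam]


def get_omni_object_from_yaml(yaml_file: PathLike) -> OmniObject:
    """Load config.yaml, validate it against ConfigDict at runtime and build an OmniObject."""
    with open(yaml_file) as f:
        config = yaml.load(f, Loader=yaml.FullLoader)
    check_type("config", config, ConfigDict)  # raises TypeError if the config does not match
    return build_omni_object_from_config(config)
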
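mypy only checks code that touches the config; a YAML file read at runtime needs a runtime check like `check_type`. To show what such a check involves, here is a stdlib-only sketch that walks a TypedDict's type hints and handles `Optional`, nested TypedDicts and `List[...]`. It is a simplified stand-in, not typeguard's implementation, and the `...Sketch` classes are reduced, hypothetical versions of the config types.

```python
from typing import Any, List, Optional, TypedDict, Union, get_args, get_origin, get_type_hints


class ConfigDataSketch(TypedDict, total=False):
    name: str
    title: str
    keywords: List[str]


class ConfigSketch(TypedDict, total=False):
    data: ConfigDataSketch
    script: Optional[str]
    interpreter: Optional[str]
    command_line: Optional[str]


def check_config(value: Any, expected: Any, path: str = "config") -> None:
    """Raise TypeError if `value` does not match the annotation `expected`."""
    origin = get_origin(expected)
    if origin is Union:  # Optional[X] is Union[X, None]
        for option in get_args(expected):
            try:
                check_config(value, option, path)
                return
            except TypeError:
                continue
        raise TypeError(f"{path}: {value!r} matches none of {get_args(expected)}")
    if expected is type(None):
        if value is not None:
            raise TypeError(f"{path}: expected None, got {type(value).__name__}")
        return
    if isinstance(expected, type) and hasattr(expected, "__total__"):  # a TypedDict
        if not isinstance(value, dict):
            raise TypeError(f"{path}: expected a mapping, got {type(value).__name__}")
        hints = get_type_hints(expected)
        for key, item in value.items():
            if key not in hints:
                raise TypeError(f"{path}: unexpected key {key!r}")
            check_config(item, hints[key], f"{path}.{key}")
        return
    if origin is list:
        if not isinstance(value, list):
            raise TypeError(f"{path}: expected a list, got {type(value).__name__}")
        (item_type,) = get_args(expected)
        for i, item in enumerate(value):
            check_config(item, item_type, f"{path}[{i}]")
        return
    if not isinstance(value, expected):
        raise TypeError(f"{path}: expected {expected.__name__}, got {type(value).__name__}")
```

The `path` argument makes errors point at the offending entry, e.g. `config.data.keywords`, which is the main usability gain over a bare `KeyError` later in the pipeline.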
# Summary

omnibenchmark-python:
  • to automate working with renku(-python)
  • to flexibly interact with omnibenchmark (and the renku client)
  • not to replace the renku CLI

To-Dos:
  • omni CLI
  • integrate the triple store
  • add to PyPI
