Capturing workflow life cycle

with RO-Crate

Stian Soiland-Reyes

eScience lab, The University of Manchester

INDElab, University of Amsterdam

ELIXIR All Hands 2021
Workshop: Workflow Life Cycle
2021-06-11

H2020-INFRAEOSC-2018-2 824087

H2020-INFRAEDI-2018-1 823830

H2020-INFRAIA-2017-1 730976

H2020-INFRADEV-2019-2 871118

H2020-INFRAIA-2018-1 823827

What is RO-Crate?

RO-Crate is a method for describing a dataset as a digital object using a single linked-data metadata document

Credit: Peter Sefton
Adapted from https://arkisto-platform.github.io/standards/ro-crate/

The dataset may contain any kind of
data resource, about anything, in any format
as a file or URL

Credit: Peter Sefton
Adapted from https://arkisto-platform.github.io/standards/ro-crate/

Each resource can have a machine readable description in JSON-LD format

Credit: Peter Sefton
Adapted from https://arkisto-platform.github.io/standards/ro-crate/
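To illustrate the single metadata document and the per-resource descriptions mentioned above, here is a minimal sketch in Python that builds an `ro-crate-metadata.json` structure with only the standard library. The overall shape (metadata descriptor, root dataset, one file entity) follows the RO-Crate 1.1 specification; the file name `data.csv` and all names and descriptions are made up for the example.

```python
import json


def minimal_crate_metadata():
    """Return a minimal RO-Crate 1.1 metadata document as a Python dict."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: says which entity is the root dataset
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself
                "@id": "./",
                "@type": "Dataset",
                "name": "Example dataset",  # hypothetical name
                "hasPart": [{"@id": "data.csv"}],
            },
            {
                # One data resource, described in JSON-LD
                "@id": "data.csv",  # hypothetical file
                "@type": "File",
                "name": "Example measurements",
                "encodingFormat": "text/csv",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(minimal_crate_metadata(), indent=2))
```

Writing the printed JSON to a file named `ro-crate-metadata.json` next to `data.csv` would turn that folder into a (very small) crate.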

A human-readable description/preview can be in an HTML file that lives alongside the metadata

Credit: Peter Sefton
Adapted from https://arkisto-platform.github.io/standards/ro-crate/

Provenance and workflow information can be included
– to assist in re-use of data and research processes

Credit: Peter Sefton
Adapted from https://arkisto-platform.github.io/standards/ro-crate/

RO-Crate Digital Objects may be packaged for distribution, e.g. via Zip, BagIt and OCFL
– or simply be published on the Web

Credit: Peter Sefton
Adapted from https://arkisto-platform.github.io/standards/ro-crate/
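As one possible way to do the Zip packaging mentioned above, the sketch below zips a crate folder with Python's standard `zipfile` module so that `ro-crate-metadata.json` ends up at the archive root. The helper name `zip_crate` and the folder layout are assumptions for illustration, and the metadata written in the demo is only a placeholder, not a complete crate.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def zip_crate(crate_dir, zip_path):
    """Zip a crate folder, keeping paths relative to the crate root."""
    crate_dir = Path(crate_dir)
    if not (crate_dir / "ro-crate-metadata.json").is_file():
        raise FileNotFoundError("ro-crate-metadata.json is missing")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(crate_dir.rglob("*")):
            if path.is_file():
                # Store each file under its path relative to the crate root
                archive.write(path, path.relative_to(crate_dir).as_posix())
    return Path(zip_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        crate = Path(tmp) / "crate"
        crate.mkdir()
        # Placeholder metadata only; a real crate needs a full @graph
        (crate / "ro-crate-metadata.json").write_text(json.dumps({"@graph": []}))
        (crate / "data.csv").write_text("a,b\n1,2\n")
        out = zip_crate(crate, Path(tmp) / "crate.zip")
        with zipfile.ZipFile(out) as archive:
            print(archive.namelist())
```

The archive is written outside the crate folder on purpose, so the zip file does not end up inside itself.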

RO-Crate for
research outputs

RO-Crate for data scientists

Credit: Marco La Rosa, Peter Sefton

RO-Crate for
data management plans

Credit: Tomasz Miksa et al.
https://doi.org/10.4126/FRL01-006423291

Machine-actionable Data Management Plans

Use case #1: From exemplar RO-Crate generate maDMP

Use case #2: From maDMP generate template RO-Crate

RO-Crate for repositories

RO-Crate as an archival format for repositories

Metadata held alongside heterogeneous data

Exchange mechanism (import/export)

Avoid vendor lock-in

RO-Crate for
workflow descriptions

Describing workflows with RO-Crate
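A rough sketch of what a workflow description could look like as crate metadata, loosely following the Workflow RO-Crate profile: the workflow file is typed as `File`, `SoftwareSourceCode` and `ComputationalWorkflow`, and the root dataset points to it with `mainEntity`. The function name, the workflow file name and the local `#galaxy` language identifier are assumptions for illustration; the profile itself recommends specific language identifiers.

```python
import json


def workflow_crate_metadata(workflow_file, language_name):
    """Return crate metadata describing one workflow file (illustrative only)."""
    language_id = "#" + language_name.lower()  # hypothetical local identifier
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                "@id": "./",
                "@type": "Dataset",
                "hasPart": [{"@id": workflow_file}],
                # The main workflow of this crate
                "mainEntity": {"@id": workflow_file},
            },
            {
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": "Example workflow",  # hypothetical name
                "programmingLanguage": {"@id": language_id},
            },
            {
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
            },
        ],
    }


if __name__ == "__main__":
    doc = workflow_crate_metadata("workflow.ga", "Galaxy")
    print(json.dumps(doc, indent=2))
```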

RO-Crate for
workflow test specifications

Workflow Testing RO-Crate

  • Workflow definition (e.g. Galaxy, Snakemake)
  • Test suite: Instances of Test definitions
  • Binds to particular test engines, e.g. Planemo, Jenkins

RO-Crate for
workflow run provenance

RO-Crate minimal provenance: Some software was used
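A minimal sketch of "some software was used", assuming the `CreateAction` pattern from the provenance section of RO-Crate 1.1: an action whose `instrument` is the software, with `object` as input and `result` as output. The helper name, the `#run-1` identifier and the file names are hypothetical; the function only returns the entities that would be appended to an existing `@graph`.

```python
import json


def software_was_used(software_name, input_id, output_id):
    """Return provenance entities saying that software turned input into output."""
    software_id = "#" + software_name.lower().replace(" ", "-")
    return [
        {
            "@id": software_id,
            "@type": "SoftwareApplication",
            "name": software_name,
        },
        {
            "@id": "#run-1",  # hypothetical identifier for this run
            "@type": "CreateAction",
            "name": f"Ran {software_name}",
            "instrument": {"@id": software_id},  # the software that was used
            "object": [{"@id": input_id}],       # what it consumed
            "result": [{"@id": output_id}],      # what it produced
        },
    ]


if __name__ == "__main__":
    entities = software_was_used("Example Tool", "reads.fastq", "aligned.bam")
    print(json.dumps(entities, indent=2))
```

Richer provenance (for example CWLProv, mentioned on a later slide) records far more detail; this is only the smallest useful statement.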

ISO 23494: Biotechnology - Provenance Information Model for Biological Specimen and Data

CWLProv

Credit: Thanasis Vergoulis

https://doi.org/10.5281/zenodo.4671709

Executing Workflow RO-Crates

Credit: José Mª Fernández, ELIXIR All Hands, 2021-06-11

RO-Crate for
computational tools

Making Canonical Workflow Building Blocks interoperable across workflow languages

RO-Crate for
data citations

4-dimensional RO-Crates?

Credit: Oscar Corcho, Carole Goble
https://doi.org/10.5281/zenodo.4913285

RO-Crate profiles

RO-Crate profile for
FAIR Digital Objects

RO-Crate profile for
RO-Crate profiles...?

RO-Crate Community

The RO-Crate Community is open for anyone to join us!
researchobject/ro-crate#1
