Capturing Just Enough Data, Software and Metadata
with RO-Crate


Stian Soiland-Reyes

eScience lab, The University of Manchester

INDElab, University of Amsterdam

Dataverse community meeting 2021
Software Metadata and Containerization

H2020-INFRAEOSC-2018-2 824087

H2020-INFRAEDI-2018-1 823830

H2020-INFRAIA-2017-1 730976

H2020-INFRADEV-2019-2 871118

H2020-INFRAIA-2018-1 823827

What is RO-Crate?

RO-Crate is a method for describing a dataset as a digital object using a single linked-data metadata document

Credit: Peter Sefton
Adapted from

The dataset may contain any kind of data resource, about anything, in any format, as a file or URL

Credit: Peter Sefton
Adapted from

Each resource can have a machine-readable description in JSON-LD format

Credit: Peter Sefton
Adapted from
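To make "a machine-readable description in JSON-LD" concrete, here is a minimal sketch that builds an RO-Crate 1.1 metadata document as a Python dict. It contains a metadata descriptor pointing at the root dataset, the root dataset itself, and one entity per file. The helper name `make_crate` and the example dataset are hypothetical. A fully valid crate also needs further root properties such as `datePublished` and `license`, which are left out here for brevity.

```python
import json

# Context URL and spec identifier published for RO-Crate 1.1.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def make_crate(name, description, files):
    """Return a minimal RO-Crate metadata document as a dict (hypothetical helper).

    `files` maps a relative path to its media type, e.g. {"data.csv": "text/csv"}.
    """
    file_entities = [
        {"@id": path, "@type": "File", "encodingFormat": media_type}
        for path, media_type in files.items()
    ]
    return {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [
            # Metadata descriptor: which spec this document follows and
            # which entity (the root dataset) it is about.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": RO_CRATE_SPEC},
                "about": {"@id": "./"},
            },
            # Root data entity: the dataset as a whole, listing its parts.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "hasPart": [{"@id": e["@id"]} for e in file_entities],
            },
            # One data entity per file in the crate.
            *file_entities,
        ],
    }


crate = make_crate("Rainfall 2020", "Daily rainfall measurements", {"rainfall.csv": "text/csv"})
print(json.dumps(crate, indent=2))
```

Writing `json.dumps(crate)` to a file named `ro-crate-metadata.json` in the dataset's top folder is what turns that folder into a crate.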

A human-readable description/preview can be in an HTML file that lives alongside the metadata

Credit: Peter Sefton
Adapted from
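The human-readable preview is typically generated from the same metadata. This is a deliberately tiny sketch of producing an `ro-crate-preview.html` page from a crate dict. The helper name `render_preview` and the page layout are our own assumptions. Real previews produced by RO-Crate tooling are much richer.

```python
import html


def render_preview(crate):
    """Render a very small ro-crate-preview.html from a crate dict (hypothetical helper)."""
    graph = crate["@graph"]
    # The metadata descriptor's "about" points at the root dataset.
    descriptor = next(e for e in graph if e["@id"] == "ro-crate-metadata.json")
    root_id = descriptor["about"]["@id"]
    root = next(e for e in graph if e["@id"] == root_id)
    # Escape everything taken from metadata so it cannot break the HTML.
    title = html.escape(root.get("name", "Untitled dataset"))
    items = "".join(
        f"<li>{html.escape(part['@id'])}</li>" for part in root.get("hasPart", [])
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n"
        f"<p>{html.escape(root.get('description', ''))}</p>\n"
        f"<ul>{items}</ul></body></html>\n"
    )
```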

Provenance and workflow information can be included
– to assist in re-use of data and research processes

Credit: Peter Sefton
Adapted from

RO-Crate Digital Objects may be packaged for distribution, e.g. via Zip, BagIt and OCFL
– or simply be published on the Web

Credit: Peter Sefton
Adapted from
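Zip is the simplest of these packagings: the archive holds the data files with `ro-crate-metadata.json` at its root. This is a minimal sketch using only Python's standard `zipfile` module. The function name `zip_crate` and its arguments are illustrative, not part of any RO-Crate library.

```python
import io
import json
import zipfile


def zip_crate(metadata, payload):
    """Package a crate as a Zip archive and return the archive bytes.

    `metadata` is the RO-Crate JSON-LD as a dict; `payload` maps relative
    paths to file contents (bytes). Both names are illustrative.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The metadata file sits at the root of the archive, next to the data.
        zf.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for path, content in payload.items():
            zf.writestr(path, content)
    return buffer.getvalue()
```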

RO-Crate for
research outputs

RO-Crate for data scientists

Credit: Marco La Rosa, Peter Sefton

RO-Crate for digital humanities

Capturing cultural heritage records
as RO-Crates

RO-Crate for
data management plans

Credit: Tomasz Miksa et al.

Machine-actionable Data Management Plans

Use case #1: From exemplar RO-Crate generate maDMP

Use case #2: From maDMP generate template RO-Crate
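Use case #1 amounts to a mapping from crate entities to DMP fields. The sketch below shows the shape of such a mapping under loudly stated assumptions. The output keys (`dmp`, `title`, `dataset`, `dataset_id`) follow our reading of the RDA DMP Common Standard. Only these few fields are filled in, so the result is a skeleton and not a complete, valid maDMP. The function name `crate_to_madmp` is hypothetical.

```python
def crate_to_madmp(crate):
    """Derive a skeleton maDMP dict from an exemplar RO-Crate (hypothetical mapping).

    Only a handful of fields are filled in; the output is NOT a complete,
    valid maDMP document.
    """
    graph = {e["@id"]: e for e in crate["@graph"]}
    # Find the root dataset through the metadata descriptor.
    root = graph[graph["ro-crate-metadata.json"]["about"]["@id"]]
    datasets = []
    for ref in root.get("hasPart", []):
        part = graph.get(ref["@id"], {})
        datasets.append({
            "title": part.get("name", ref["@id"]),
            # Assumption: a relative path is not a resolvable identifier,
            # so it is recorded with identifier type "other".
            "dataset_id": {"identifier": ref["@id"], "type": "other"},
        })
    return {"dmp": {"title": f"DMP for {root.get('name', 'dataset')}", "dataset": datasets}}
```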

RO-Crate for repositories

RO-Crate as an archival format for repositories

Metadata held alongside heterogeneous data

Exchange mechanism (import/export)

Avoid vendor lock-in

RO-Crate for
workflow descriptions

Describing workflows with RO-Crate

Credit: Thanasis Vergoulis
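In a workflow crate, the root dataset points at the workflow file via `mainEntity`, and the workflow's language is described as a contextual entity. The sketch below assumes the type names used by the Workflow RO-Crate profile (`ComputationalWorkflow`, `ComputerLanguage`). Check them against the current profile before relying on them. The helper name and the Galaxy example values are illustrative.

```python
def workflow_crate(workflow_path, language_name, language_url):
    """Sketch of a Workflow RO-Crate: the root's mainEntity is the workflow file."""
    lang_id = "#" + language_name.lower()  # hypothetical local identifier
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
             "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
             "about": {"@id": "./"}},
            # Root dataset: mainEntity marks the workflow as the crate's focus.
            {"@id": "./", "@type": "Dataset",
             "mainEntity": {"@id": workflow_path},
             "hasPart": [{"@id": workflow_path}]},
            # The workflow is at once a file, source code and a computational workflow.
            {"@id": workflow_path,
             "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
             "programmingLanguage": {"@id": lang_id}},
            # Contextual entity describing the workflow language.
            {"@id": lang_id, "@type": "ComputerLanguage",
             "name": language_name, "url": {"@id": language_url}},
        ],
    }


crate = workflow_crate("workflow.ga", "Galaxy", "https://galaxyproject.org/")
```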

Executing Workflow RO-Crates

RO-Crate for
workflow test specifications

Workflow Testing RO-Crate

  • Workflow definition (e.g. Galaxy, Snakemake)
  • Test suite: Instances of Test definitions
  • Binds to particular test engines and services, e.g. Planemo, Jenkins
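The three bullets map onto three linked entities: a test suite about the workflow, a test instance saying where it runs, and a test definition saying which engine runs it. The term names below (`TestSuite`, `TestInstance`, `TestDefinition`, `runsOn`, `instance`, `definition`, `JenkinsService`, `PlanemoEngine`) are our recollection of the Workflow Testing RO-Crate vocabulary under `https://w3id.org/ro/terms/test#`, so treat them as assumptions. A real crate must also extend its `@context` with these terms and link the suite from the root dataset, for example via `mentions`. The helper and the fixed `#test1` identifiers are hypothetical.

```python
# Namespace of the Workflow Testing RO-Crate terms (assumed, see lead-in).
TEST_TERMS = "https://w3id.org/ro/terms/test#"


def add_test_suite(graph, workflow_id, definition_path, service_url, resource):
    """Append hypothetical TestSuite/TestInstance/TestDefinition entities to a crate @graph."""
    graph.extend([
        # The suite: which workflow is tested, where and how.
        {"@id": "#test1", "@type": "TestSuite",
         "mainEntity": {"@id": workflow_id},
         "instance": [{"@id": "#test1_1"}],
         "definition": {"@id": definition_path}},
        # Where the tests run: a CI service such as Jenkins.
        {"@id": "#test1_1", "@type": "TestInstance",
         "runsOn": {"@id": TEST_TERMS + "JenkinsService"},
         "url": service_url, "resource": resource},
        # What the tests are: a test file executed by the Planemo engine.
        {"@id": definition_path, "@type": ["File", "TestDefinition"],
         "conformsTo": {"@id": TEST_TERMS + "PlanemoEngine"}},
    ])
    return graph
```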

RO-Crate for
computational tools

Making Canonical Workflow Building Blocks interoperable across workflow languages

RO-Crate for
workflow run provenance

RO-Crate minimal provenance: Some software was used

Credit: José Mª Fernández, ELIXIR All Hands, 2021-06-11
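Minimal provenance in RO-Crate can be expressed with a schema.org `CreateAction`: the `instrument` is the software that was used, `object` lists its inputs, and `result` lists its outputs. This sketch appends such a record to an existing `@graph`. The helper name `record_run` and the `#run-1` identifier are hypothetical, and the inputs and outputs are assumed to be files already described in the crate.

```python
def record_run(graph, tool_name, tool_version, inputs, outputs):
    """Append a minimal 'some software was used' record to a crate @graph.

    inputs/outputs are @ids of files already described in the crate.
    """
    tool_id = "#" + tool_name.lower()
    # The software, as a contextual entity.
    graph.append({"@id": tool_id, "@type": "SoftwareApplication",
                  "name": tool_name, "version": tool_version})
    graph.append({
        "@id": "#run-1",  # hypothetical identifier for this run
        "@type": "CreateAction",
        "instrument": {"@id": tool_id},  # the software that was used
        "object": [{"@id": i} for i in inputs],  # what it consumed
        "result": [{"@id": o} for o in outputs],  # what it produced
    })
    return graph
```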

RO-Crate for enabling a
large number of
data citations


RO-Crate as aggregation:
data citation reliquary
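Using a crate as an aggregation of many citations can be as simple as a long list of references on the root dataset. This sketch uses the schema.org `citation` property and makes each DOI a contextual entity whose `@id` is its `https://doi.org/` URL, skipping duplicates. Typing every cited work as `ScholarlyArticle` is an assumption, since cited works may be datasets or software. The helper name is hypothetical.

```python
def add_citations(crate, dois):
    """Attach many DOI citations to the root dataset (hypothetical helper).

    Each DOI becomes a contextual entity whose @id is its https://doi.org/ URL;
    the root lists them under schema.org "citation", without duplicates.
    """
    graph = crate["@graph"]
    index = {e["@id"]: e for e in graph}
    root = index[index["ro-crate-metadata.json"]["about"]["@id"]]
    citations = root.setdefault("citation", [])
    for doi in dois:
        doi_url = "https://doi.org/" + doi
        if doi_url not in index:
            # Assumption: every cited work is typed as a ScholarlyArticle.
            entity = {"@id": doi_url, "@type": "ScholarlyArticle"}
            graph.append(entity)
            index[doi_url] = entity
        if {"@id": doi_url} not in citations:
            citations.append({"@id": doi_url})
    return crate
```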

4-dimensional RO-Crates?

Credit: Oscar Corcho, Carole Goble

RO-Crate profiles

Credit: Carole Goble

Vocabularies in RO-Crate

FAIR is not just machine-readable!

Techie deep-dive!

Warning: JSON ahead

RO-Crate model
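The model has four parts: the metadata descriptor (`ro-crate-metadata.json`), the root data entity it is `about`, data entities (files and folders), and contextual entities (people, software, places and so on). This sketch splits a parsed crate along those lines. It classifies entities as data entities simply by whether their `@type` includes `File` or `Dataset`, which is a simplification. The function name is hypothetical.

```python
import json


def split_crate(text):
    """Parse crate JSON and return (descriptor, root, data_entities, contextual_entities)."""
    graph = json.loads(text)["@graph"]
    by_id = {e["@id"]: e for e in graph}
    # The descriptor has a fixed @id; its "about" names the root data entity.
    descriptor = by_id["ro-crate-metadata.json"]
    root = by_id[descriptor["about"]["@id"]]
    data, contextual = [], []
    for e in graph:
        if e is descriptor or e is root:
            continue
        types = e.get("@type", [])
        if not isinstance(types, list):  # a single type may be a plain string
            types = [types]
        # Simplification: File/Dataset entities count as data, the rest as context.
        (data if {"File", "Dataset"} & set(types) else contextual).append(e)
    return descriptor, root, data, contextual
```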

RO-Crate in Dataverse?

Distribute metadata with data: JSON-LD from Dataverse

95% RO-Crate

Where did that lovely metadata go..?

Where's the DOI?

How do we know it's from a Dataverse?

What is this dataset called?
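The three questions on this slide can be turned into an automatic check on the root dataset of an exported crate. In this sketch, mapping them to the schema.org properties `name` (title), `identifier` (DOI) and `publisher` (which repository) is our own choice, and the function name is hypothetical.

```python
def missing_root_metadata(crate, required=("name", "identifier", "publisher")):
    """List root-dataset properties that are absent or empty (hypothetical check).

    Defaults stand for: what is it called, where is the DOI,
    and which repository published it.
    """
    by_id = {e["@id"]: e for e in crate["@graph"]}
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    return [prop for prop in required if not root.get(prop)]
```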

Import data with metadata

Avoid re-filling metadata already captured

(e.g. from workflow system or Describo)
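Avoiding re-filling mostly means reading what the crate already says and dereferencing links in its flat `@graph`. Authors, for instance, are usually references to `Person` entities. This sketch pre-fills a plain dict from a crate. The output keys are hypothetical and are not Dataverse's actual metadata block schema.

```python
def prefill_fields(crate):
    """Pull title, description and author names from a crate so they need not be re-typed.

    The output keys are hypothetical, not Dataverse's actual metadata schema.
    """
    by_id = {e["@id"]: e for e in crate["@graph"]}
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    authors = root.get("author", [])
    if isinstance(authors, dict):  # a single author may be given without a list
        authors = [authors]
    names = []
    for ref in authors:
        # Follow the reference to the Person entity if it exists in @graph.
        person = by_id.get(ref.get("@id"), ref)
        names.append(person.get("name", ref.get("@id")))
    return {"title": root.get("name"), "description": root.get("description"), "authors": names}
```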


Move RO-Crate between repositories


Build RO-Crate early and incrementally, not just at Dataverse deposit time
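Building a crate incrementally means each step of a project can add or refine entities in an existing `ro-crate-metadata.json` rather than generating everything at deposit time. This sketch shows one idempotent update: calling it twice for the same file refines that file's metadata without duplicating it in the root's `hasPart`. The helper name `add_file` is hypothetical.

```python
import json
from pathlib import Path


def add_file(crate_dir, rel_path, **properties):
    """Describe one more file in an existing crate directory, updating it in place.

    Safe to call repeatedly: the root's hasPart lists each file at most once.
    """
    meta_path = Path(crate_dir) / "ro-crate-metadata.json"
    crate = json.loads(meta_path.read_text(encoding="utf-8"))
    graph = crate["@graph"]
    by_id = {e["@id"]: e for e in graph}
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    entity = by_id.get(rel_path)
    if entity is None:
        entity = {"@id": rel_path, "@type": "File"}
        graph.append(entity)
    entity.update(properties)  # later calls can add or refine metadata
    parts = root.setdefault("hasPart", [])
    if {"@id": rel_path} not in parts:
        parts.append({"@id": rel_path})
    meta_path.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return crate
```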


Self-publish RO-Crate (e.g. project website)
with Dataverse references & DOI

Store data with metadata

RO-Crate as
metadata storage


BagIt (RFC 8493)

Oxford Common File Layout (OCFL)
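With BagIt, one common arrangement is to place the whole crate, including `ro-crate-metadata.json`, inside the bag's `data/` payload folder. The bag then adds fixity: a `bagit.txt` declaration and a payload manifest with one checksum per file. This minimal sketch writes a BagIt 1.0 bag with a SHA-512 manifest. It does not percent-encode unusual file names or write optional tag files such as `bag-info.txt`, and the function name is hypothetical.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal BagIt 1.0 bag (RFC 8493) whose payload is an RO-Crate.

    `payload` maps paths relative to data/ to bytes; include ro-crate-metadata.json in it.
    """
    bag = Path(bag_dir)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = bag / "data" / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha512(content).hexdigest()
        # Manifest line format: "<checksum> <path relative to the bag>".
        manifest_lines.append(f"{digest} data/{rel_path}\n")
    # Bag declaration, required in every bag.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    # Payload manifest covering every file under data/.
    (bag / "manifest-sha512.txt").write_text("".join(manifest_lines), encoding="utf-8")
    return bag
```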

RO-Crate for integration with workflow systems & registries

RO-Crate for downloadable "data shopping carts"

RO-Crate Community

The RO-Crate Community is open for anyone to join us!

2021-06-17 Capturing Just Enough Data, Software and Metadata with RO-Crate

By Stian Soiland-Reyes
