Facing FAIR challenges
with RO-Crate

Stian Soiland-Reyes

eScience lab, The University of Manchester

INDElab, University of Amsterdam

FAIR Evaluation stakeholder meeting

GO-FAIR
2022-02-10

H2020-INFRAEOSC-2018-2 824087

H2020-INFRAEDI-2018-1 823830

H2020-INFRAIA-2017-1 730976

H2020-INFRADEV-2019-2 871118

H2020-INFRAIA-2018-1 823827

What is RO-Crate?

Suggested new FAIR principle:

A3+ Metadata is not just machine-readable!

One of the grand challenges of data-intensive science, therefore, is to improve knowledge discovery through assisting both humans, and their computational agents, in the discovery of, access to, and integration and analysis of, task-appropriate scientific data and other scholarly digital objects.

https://doi.org/10.1038/sdata.2016.18

Importance of profiles

Brian: Look, you've got it all wrong! You don't need to follow me. You don't need to follow anybody! You've got to think for yourselves! You're all individuals!

Crowd: Yes! We're all individuals!

Brian: You're all different!

Crowd: Yes, we are all different!

Man in crowd: I'm not...

Monty Python's Life of Brian (1979)

Credit: Carole Goble
Dataverse Community Meeting 2021

https://www.slideshare.net/carolegoble/

Do we need to
formalize a profile?

Creating RO-Crates from
FAIR data sources

{
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {
                "@id": "./"
            },
            "conformsTo": {
                "@id": "https://w3id.org/ro/crate/1.1"
            },
            "description": "RO-Crate Metadata File Descriptor (this file)"
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "DataType": "project",
            "author": [
                {
                    "@id": "https://orcid.org/0000-0003-0212-3381"
                },
                {
                    "@id": "#sara_brinquis_fava"
                }
            ],
            "description": "Determine if the work of the agricultural extension services are key in the search for development in the territories of Salvador and Guatemala; and from their learnings, recommendations for their potential replicability.",
            "hasPart": [
                {
                    "@id": "#bibliography"
                },
                {
                    "@id": "#anonymous_interview"
                },
                {
                    "@id": "#interview_questions"
                },
                {
                    "@id": "index-en.html"
                }
            ],
            "name": "Good Agricultural Practices and Damage and Loss Assessment for the Comprehensive Management of Disaster Risk and Climate-Adapted Sustainable Agriculture in the Dry Corridor of Central America."
        },
        {
            "@id": "https://orcid.org/0000-0003-0212-3381",
            "@type": "Person",
            "description": null,
            "name": "MARGARITA RUIZ RAMOS",
            "position": [
                "Universidad Polit\u00e9cnica de Madrid - Technical University of Madrid"
            ]
        },
        {
            "@id": "#sara_brinquis_fava",
            "@type": "Person",
            "description": "2nd year student in the Enabling Master's Degree in Agronomic Engineering at the Polytechnic University of Madrid",
            "name": "Sara Brinquis Fava",
            "position": [
                "Research Assistant at Universidad Polit\u00e9cnica de Madrid"
            ]
        },
        {
            "@id": "#bibliography",
            "@type": "Dataset",
            "description": "Project documentation",
            "distribution": {
                "@id": "https://drive.google.com/drive/folders/1S-JNn_FYfYJHnya_rIsQpCT9j_AinuzV"
            },
            "name": "Bibliography"
        },
        {
            "@id": "#anonymous_interview",
            "@type": "Dataset",
            "description": "Anonymous interview transcript",
            "distribution": {
                "@id": "https://drive.google.com/drive/folders/1LVmRdxz7xzL5xRsk05x14s3ilVR4HGsG"
            },
            "name": "Anonymous interview"
        },
        {
            "@id": "#interview_questions",
            "@type": "Dataset",
            "description": "Interview questions",
            "distribution": {
                "@id": "https://drive.google.com/drive/folders/1I_KPjAKJ1zKPWqj2VVWj4uRLOxH2DCLk"
            },
            "name": "Interview questions"
        },
        {
            "workExample": [
                {
                    "@type": "Demo",
                    "description": "Anonymous interview transcript.",
                    "link": "https://drive.google.com/drive/folders/1xKIWHDgsBjiRalMb5BM-pDHEbslLYJpr",
                    "name": null
                },
                {
                    "@type": "Demo",
                    "description": "Interview questions.",
                    "link": "https://drive.google.com/drive/folders/1I_KPjAKJ1zKPWqj2VVWj4uRLOxH2DCLk",
                    "name": null
                }
            ]
        },
        {
            "@id": "index-en.html",
            "@type": "WebPage",
            "name": "HTML representation of this data."
        }
    ]
}
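A minimal sketch of how a crate like the one above might be generated from a source record. The record layout (`title`, `summary`, `people`, `folders` and their keys) is a hypothetical stand-in for whatever the FAIR data source exposes; only the RO-Crate 1.1 context, the metadata descriptor and the root dataset follow the structure shown in the slide.

```python
import json


def build_crate(record):
    """Map one source record (a plain dict, hypothetical layout) to RO-Crate 1.1 metadata."""
    people = [
        {"@id": p["id"], "@type": "Person", "name": p["name"]}
        for p in record["people"]
    ]
    parts = [
        {
            "@id": "#" + f["slug"],
            "@type": "Dataset",
            "name": f["title"],
            "distribution": {"@id": f["url"]},
        }
        for f in record["folders"]
    ]
    # The metadata descriptor points at the root dataset through "about".
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "about": {"@id": "./"},
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
    }
    # The root dataset only references other entities by @id (flattened JSON-LD).
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["summary"],
        "author": [{"@id": p["@id"]} for p in people],
        "hasPart": [{"@id": d["@id"]} for d in parts],
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root] + people + parts,
    }


# Hypothetical source record, reusing values from the crate above.
record = {
    "title": "Example project",
    "summary": "Placeholder description",
    "people": [{"id": "#sara_brinquis_fava", "name": "Sara Brinquis Fava"}],
    "folders": [
        {
            "slug": "bibliography",
            "title": "Bibliography",
            "url": "https://drive.google.com/drive/folders/1S-JNn_FYfYJHnya_rIsQpCT9j_AinuzV",
        }
    ],
}
crate = build_crate(record)
print(json.dumps(crate, indent=4))
```

Every entity referenced from the root is also described in `@graph` with its own `@id`, so a consumer does not have to leave the file to find a name.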

How should we resolve an
RO-Crate from a DOI?

Accept: text/html

Resolving an RO-Crate with content-negotiation

Accept: text/html
Accept: application/zip
Accept: application/ld+json;
  profile=https://w3id.org/ro/crate
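As a rough illustration of the client side, the sketch below builds (but does not send) a request that asks for the RO-Crate JSON-LD representation. The DOI is the WorkflowHub one that appears in this deck; whether a given resolver honours this `Accept` header is an assumption, not something the code verifies. The profile URI is quoted here because `:` and `/` are not allowed in an unquoted HTTP parameter value.

```python
from urllib.request import Request

# Media types from the slide; the profile parameter is quoted for HTTP.
RO_CRATE_JSONLD = 'application/ld+json;profile="https://w3id.org/ro/crate"'
RO_CRATE_ZIP = "application/zip"
LANDING_PAGE = "text/html"


def negotiate(pid_url, media_type):
    """Build a GET request asking for one representation of a PID."""
    return Request(pid_url, headers={"Accept": media_type}, method="GET")


req = negotiate("https://doi.org/10.48546/workflowhub.workflow.29.2", RO_CRATE_JSONLD)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out so the sketch runs without network access.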

Downside: Indirection to find core metadata and content

Parse JSON, find the right node:
author, @type (ComputationalWorkflow), license, hasPart, isBasedOn
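The indirection can be sketched as follows: look up the metadata descriptor, follow its `about` link to the root dataset, then follow each `author` reference to a contextual entity. The inline crate is a cut-down copy of the example earlier in this deck; the helper names are illustrative only, and references are assumed to be `{"@id": ...}` objects.

```python
def find_root(crate):
    """Return (root entity, index of all entities by @id)."""
    # Entities without an @id cannot be referenced, so they are skipped.
    by_id = {e["@id"]: e for e in crate["@graph"] if "@id" in e}
    descriptor = by_id["ro-crate-metadata.json"]
    return by_id[descriptor["about"]["@id"]], by_id


def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def author_names(crate):
    """Resolve the root dataset's author references to names."""
    root, by_id = find_root(crate)
    names = []
    for ref in as_list(root.get("author")):
        entity = by_id.get(ref["@id"], {})
        # Fall back to the identifier when the entity is not described.
        names.append(entity.get("name", ref["@id"]))
    return names


crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "author": [{"@id": "https://orcid.org/0000-0003-0212-3381"},
                    {"@id": "#sara_brinquis_fava"}]},
        {"@id": "https://orcid.org/0000-0003-0212-3381", "@type": "Person",
         "name": "MARGARITA RUIZ RAMOS"},
        {"@id": "#sara_brinquis_fava", "@type": "Person",
         "name": "Sara Brinquis Fava"},
    ],
}
print(author_names(crate))
```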

HEAD https://workflowhub.eu/workflows/29?version=2

200 OK
Link: <https://doi.org/10.48546/workflowhub.workflow.29.2>;rel=cite-as
Link: <https://workflowhub.eu/workflows/29/ro_crate?version=2>;rel=describedby
Link: <https://orcid.org/0000-0003-0513-0288>;rel=author
…
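A client could read these headers roughly as below. This is a simplified sketch of `Link` header parsing (it handles quoted and unquoted parameters but is not a complete implementation of the Web Linking grammar); the function name and the returned structure are illustrative choices.

```python
import re

# <target> followed by zero or more ;name=value parameters.
LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_links(header_values):
    """Group Link header targets by relation type."""
    links = {}
    for value in header_values:
        for target, params in LINK.findall(value):
            attrs = {}
            for key, quoted, bare in PARAM.findall(params):
                attrs[key.lower()] = quoted or bare
            entry = {"href": target}
            entry.update({k: v for k, v in attrs.items() if k != "rel"})
            # A rel value may list several space-separated relation types.
            for rel in attrs.get("rel", "").split():
                links.setdefault(rel, []).append(entry)
    return links


# Header values copied from the HEAD response above.
headers = [
    "<https://doi.org/10.48546/workflowhub.workflow.29.2>;rel=cite-as",
    "<https://workflowhub.eu/workflows/29/ro_crate?version=2>;rel=describedby",
    "<https://orcid.org/0000-0003-0513-0288>;rel=author",
]
links = parse_links(headers)
for rel, entries in links.items():
    print(rel, [e["href"] for e in entries])
```

With this in hand, the persistent identifier, the metadata and the author are available from one HEAD request, without parsing the landing page.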


 

Resolving an RO-Crate with FAIR Signposting

rel=author
rel=type (ComputationalWorkflow)
rel=license
rel=item
rel=describedby; type="application/ld+json;profile=https://w3id.org/ro/crate"
rel=cite-as
rel=item; type="application/zip"

GET https://workflowhub.eu/workflows/29?version=2

200 OK
Content-Type: text/html

<html>
…
 <script type="application/ld+json">{
  "@context": "https://schema.org",
  "@id": "https://workflowhub.eu/workflows/29?version=2",
  "@type": ["File","SoftwareSourceCode","ComputationalWorkflow"],
  "license": "https://opensource.org/licenses/Apache-2.0",
  "creator": [
    {
      "@type": "Person",
      "@id": "https://workflowhub.eu/people/47",
      "name": "Genís Bayarri"
    }
  ],
  "dct:conformsTo": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE/",

 

Workaround: Convert RO-Crate to Bioschemas

ComputationalWorkflow
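One way the workaround could look: take the workflow entity out of the crate, inline the people it references, and emit self-contained schema.org markup like the snippet embedded in the landing page. This is a sketch under assumptions, not WorkflowHub's implementation: it assumes the root dataset points at the workflow through `mainEntity`, and the file name `workflow.cwl` in the sample crate is hypothetical.

```python
BIOSCHEMAS_PROFILE = "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE/"


def to_bioschemas(crate, page_url):
    """Flatten the crate's main workflow into landing-page JSON-LD."""
    by_id = {e["@id"]: e for e in crate["@graph"] if "@id" in e}
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    workflow = by_id[root["mainEntity"]["@id"]]  # assumption: mainEntity is set

    def inline(ref):
        # Replace an {"@id": ...} reference by the entity's key properties.
        entity = by_id.get(ref["@id"], ref)
        return {k: entity[k] for k in ("@type", "@id", "name") if k in entity}

    creators = workflow.get("creator") or root.get("author") or []
    if not isinstance(creators, list):
        creators = [creators]
    markup = {
        "@context": "https://schema.org",
        "@id": page_url,
        "@type": workflow["@type"],
        "creator": [inline(c) for c in creators],
        "dct:conformsTo": BIOSCHEMAS_PROFILE,
    }
    licence = workflow.get("license") or root.get("license")
    if licence:
        markup["license"] = licence["@id"] if isinstance(licence, dict) else licence
    return markup


# Hypothetical minimal crate; person and licence values are taken from the slide.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "mainEntity": {"@id": "workflow.cwl"}},
        {"@id": "workflow.cwl",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
         "creator": [{"@id": "https://workflowhub.eu/people/47"}],
         "license": {"@id": "https://opensource.org/licenses/Apache-2.0"}},
        {"@id": "https://workflowhub.eu/people/47", "@type": "Person",
         "name": "Genís Bayarri"},
    ],
}
markup = to_bioschemas(crate, "https://workflowhub.eu/workflows/29?version=2")
print(markup["creator"])
```

The result carries the creator's name inline, so a harvester reading the landing page does not need to follow a link to find it.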

Final remarks

  • Documenting conventions as Profiles
    is useful for predicting what information is available
  • Reducing the choice of formats and vocabularies
    increases interoperability and usability
  • FAIR must be achievable by individuals using off-the-shelf tools
    --> Should not be restricted by choices of big platforms
  • PIDs are nice, but resolving them is still tricky for machines
    --> Need FAIR Signposting / FDO conventions
  • Linked Data is nice, but we shouldn't always have to follow the links
    --> Consuming FAIR data to create enriched metadata
  • Structure data "just enough" for known use cases
    --> Future You is probably your most important consumer!

 

RO-Crate Community

 

The RO-Crate Community is open for anyone to join!
https://www.researchobject.org/ro-crate/community.html