Overview of the Trusted Workflow Run Crate profile

Stian Soiland-Reyes,
The University of Manchester

Approaching 5-safe with RO-Crate

RO-Crate

FAIR packaging of data, metadata and methods

Trusted Workflow Run Crate

Safe Data

  • How to identify sensitive source data within the TRE?
    TRE-specific identifiers not necessarily globally unique

  • Is each workflow TRE-customized for accessing a particular data source (e.g. database)?
    Inputs may be implicit; workflow not portable across TREs

  • Do the users know in advance what file paths the source data will have within the TRE?
    Identifiers may be mapped before execution
    May need to inject database connection credentials

  • What restrictions apply before permitting user-provided input data and parameters?
    TREs may review workflow and how it is being called, e.g. inspect "query" input

Questions on Safe Data
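
The note that identifiers may be mapped before execution can be illustrated with a minimal sketch. It assumes the workflow inputs arrive as a dictionary of portable identifiers and that the TRE keeps its own lookup table; the function name `map_inputs`, the `urn:example:` identifiers and the file path are hypothetical and not part of the profile.

```python
def map_inputs(inputs, path_map):
    """Replace portable identifiers with TRE-local paths.

    inputs:   {input name: portable identifier} as submitted by the user
    path_map: {portable identifier: TRE-local path}, maintained by the TRE
    Returns the mapped inputs plus any identifiers the TRE does not know.
    """
    mapped, unknown = {}, []
    for name, identifier in inputs.items():
        if identifier in path_map:
            mapped[name] = path_map[identifier]
        else:
            # Unknown identifiers are reported rather than passed through,
            # so that a reviewer can decide what to do with them.
            unknown.append(identifier)
    return mapped, unknown


# Hypothetical example values
mapped, unknown = map_inputs(
    {"cohort": "urn:example:cohort-1", "extra": "urn:example:missing"},
    {"urn:example:cohort-1": "/data/tre/cohort1.csv"},
)
```

Keeping the lookup table inside the TRE means users never need to know the local file paths in advance.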

Safe People


  • Is it unreasonable to expect a global ORCID identifier?
  • How can TREs link a global identifier to their local user identifiers?
  • Should the crate also include the local user identifier?
  • Are there GDPR concerns with disclosing the researcher names?

Questions on Safe People
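
One possible way to carry both a global and a local identifier is sketched below, assuming the person is described as a schema.org `Person` entity with the ORCID URL as its `@id`, as is common in RO-Crate. The `propertyID` value `tre-local-user`, the `#local-user-` prefix and the function name are hypothetical, and the ORCID shown is the well-known fictional example identifier. Leaving out `name` is one option if disclosing researcher names is a concern.

```python
import json


def person_entities(orcid, name=None, local_id=None):
    """Build JSON-LD entities describing a researcher.

    The ORCID URL is used as the global @id. A TRE-local user identifier,
    if given, is attached as a separate PropertyValue entity (an assumption
    for illustration, not something the profile prescribes).
    """
    person = {"@id": f"https://orcid.org/{orcid}", "@type": "Person"}
    entities = [person]
    if name is not None:
        person["name"] = name
    if local_id is not None:
        local = {
            "@id": f"#local-user-{local_id}",
            "@type": "PropertyValue",
            "propertyID": "tre-local-user",  # hypothetical property name
            "value": local_id,
        }
        person["identifier"] = {"@id": local["@id"]}
        entities.append(local)
    return entities


# Fictional ORCID and local user name, for illustration only
print(json.dumps(person_entities("0000-0002-1825-0097", local_id="u1234"), indent=2))
```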

Safe Projects

  • Is a TRE-specific identifier string sufficient to evaluate safe project?
    TREs need to verify that the submitter is actually a member of said project
  • Should the Agreement Policy be injected/explicit?
    Made public or on Intranet?
  • How can we map projects across multiple TREs?
  • Who provides the grant information?
    A grant is likely larger than a single TRE project;
    need consistent grant identifiers.

Questions on Safe Projects

Safe Settings

  • How do reviewers analyse the workflow?
  • Can a workflow run in more than one TRE?
  • Can a workflow execute outside a TRE (e.g. using synthetic data)?
  • What workflow systems need to be supported?
  • Can the workflow execute without needing user interactions?
  • Can the tools of the workflow run as command line tools from containers?
  • What TRE restrictions may prevent workflow executions?

Questions on Safe Settings

Safe Outputs


  • How do reviewers assess whether the output data can be disclosed?
  • Can some outputs be marked as sensitive and propagated to another TRE?
  • What file formats are used for current data outputs?
  • What are the file sizes involved in output data? Many files or large files?

Questions on Safe Outputs

Review process

Questions on review process:

  • Which phases should be done manually?
  • Are there phases missing? Loops?
  • Which phases might become optional?
  • What UI is needed for the review process?
  • Where to store crates that are "in flight"?
    The current prototype uses queues
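
As a rough illustration of holding "in flight" crates in queues, the sketch below moves crate identifiers through one queue per review phase. It is not the prototype's implementation: the phase names, the `run_review` function and the check callables are all hypothetical, and it uses Python's in-memory `queue.Queue` where a real deployment would more likely need persistent queues.

```python
from queue import Queue

# Hypothetical phase names; the actual review phases are an open question.
PHASES = ["check", "validate", "sign-off"]


def run_review(crate_ids, checks):
    """Pass crates through each phase's queue in order.

    checks: {phase name: callable(crate_id) -> bool}; a callable could wrap
    an automated check or record a manual reviewer decision.
    Returns (approved crate ids, [(rejected crate id, phase), ...]).
    """
    queues = {phase: Queue() for phase in PHASES}
    for crate_id in crate_ids:
        queues[PHASES[0]].put(crate_id)

    approved, rejected = [], []
    for index, phase in enumerate(PHASES):
        current = queues[phase]
        while not current.empty():
            crate_id = current.get()
            if not checks[phase](crate_id):
                rejected.append((crate_id, phase))
            elif index + 1 < len(PHASES):
                queues[PHASES[index + 1]].put(crate_id)
            else:
                approved.append(crate_id)
    return approved, rejected
```

Making a phase optional then amounts to supplying a check that always returns `True`; loops between phases would need a different structure than this linear pass.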