Stian Soiland-Reyes
eScience lab, The University of Manchester
INDElab, University of Amsterdam
International FAIR Convergence Symposium
(FAIR Data Provenance)
2020-02-02
This work is licensed under a
Creative Commons Attribution 4.0 International License.
They ride with what I refer to as the four horsemen of the reproducibility apocalypse:
State of the art in reproducibility
Reproducibility?
Capturing workflow provenance in a research object
CWLProv explained by example:
Transfer: BagIt
Manifest: ORE/RO JSON-LD
Workflow description: wfdesc (Turtle)
Workflow run: PROV + wfprov
Workflow definition: CWL
Tool interoperability: Docker
Data: Content-addressable files
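To make the last layer concrete, here is a minimal sketch of content-addressable storage: each file is stored under a path derived from a checksum of its bytes, so identical content is kept only once and can be verified later. The function name `store_by_content`, the choice of SHA-1, and the `data/<two hex chars>/<digest>` layout are assumptions for illustration, not a statement of the exact CWLProv layout.

```python
import hashlib
import tempfile
from pathlib import Path


def store_by_content(payload: bytes, data_dir: Path) -> Path:
    """Store payload under a path derived from its SHA-1 digest."""
    digest = hashlib.sha1(payload).hexdigest()
    # Assumed layout: <data_dir>/<first two hex chars>/<full digest>
    target = data_dir / digest[:2] / digest
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is written only once
        target.write_bytes(payload)
    return target


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    first = store_by_content(b"hello workflow\n", data_dir)
    second = store_by_content(b"hello workflow\n", data_dir)
    # Both calls resolve to the same path, and only one file exists.
    same_path = first == second
    file_count = len([p for p in data_dir.rglob("*") if p.is_file()])
    relative = first.relative_to(data_dir).as_posix()
    print(same_path, file_count, relative)
```

Because the path is a function of the content, a consumer can re-hash a file and compare it against its own name to detect corruption.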
Semantic Web world vs Real World
Peter Sefton at Open Repositories 2019
https://eresearch.uts.edu.au/2019/07/01/DataCrate-OR2019.htm
16k RO-Crates underneath the hood:
IEEE2791-2020
RO-Crate as an index
ro-crate-metadata.json
{
  "@id": "#DataCapture_wcc02",
  "@type": "CreateAction",
  "agent": {
    "@id": "https://orcid.org/0000-0002-1672-552X"
  },
  "instrument": {
    "@id": "https://confluence.csiro.au/display/ASL/Hovermap"
  },
  "object": {
    "@id": "#victoria_arch"
  },
  "result": [
    {
      "@id": "wcc02_arch.laz"
    },
    {
      "@id": "wcc02_arch_traj.txt"
    }
  ]
},
{
  "@id": "#victoria_arch",
  "@type": "Place",
  "address": "Wombeyan Caves, NSW 2580",
  "name": "Victoria Arch"
}
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "#Photo_Capture_1",
      "@type": "CreateAction",
      "agent": {
        "@id": "https://orcid.org/0000-0002-3545-944X"
      },
      "description": "Photo snapped on a photo walk on a misty day",
      "endTime": "2017-06-11T12:56:14+10:00",
      "instrument": [
        {
          "@id": "#EPL1"
        },
        {
          "@id": "#Panny20mm"
        }
      ],
      "result": {
        "@id": "pics/2017-06-11%2012.56.14.jpg"
      }
    },
    {
      "@id": "#SepiaConversion_1",
      "@type": "CreateAction",
      "name": "Convert dog image to sepia",
      "description": "convert -sepia-tone 80% test_data/sample/pics/2017-06-11\\ 12.56.14.jpg test_data/sample/pics/sepia_fence.jpg",
      "endTime": "2018-09-19T17:01:07+10:00",
      "instrument": {
        "@id": "https://www.imagemagick.org/"
      },
      "object": {
        "@id": "pics/2017-06-11%2012.56.14.jpg"
      },
      "result": {
        "@id": "pics/sepia_fence.jpg"
      }
    },
    {
      "@id": "https://www.imagemagick.org/",
      "@type": "SoftwareApplication",
      "url": "https://www.imagemagick.org/",
      "name": "ImageMagick",
      "version": "ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org"
    }
  ]
}
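The provenance in such a crate can be followed with ordinary JSON tooling. The sketch below loads an abbreviated copy of the example, indexes the entities by `@id`, and finds which `CreateAction` produced a given file and with which tool. The helper names `as_list`, `index_by_id` and `producers_of` are hypothetical, and the embedded crate is a shortened excerpt, not a complete RO-Crate.

```python
import json

# Abbreviated excerpt of the crate shown above (not a complete RO-Crate).
crate_json = """
{"@context": "https://w3id.org/ro/crate/1.1/context",
 "@graph": [
  {"@id": "#SepiaConversion_1",
   "@type": "CreateAction",
   "instrument": {"@id": "https://www.imagemagick.org/"},
   "object": {"@id": "pics/2017-06-11%2012.56.14.jpg"},
   "result": {"@id": "pics/sepia_fence.jpg"}},
  {"@id": "https://www.imagemagick.org/",
   "@type": "SoftwareApplication",
   "name": "ImageMagick"}
 ]}
"""


def as_list(value):
    """JSON-LD values may be a single item or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_by_id(crate):
    """Map each entity's @id to the entity itself."""
    return {entity["@id"]: entity for entity in crate["@graph"]}


def producers_of(crate, file_id):
    """Return the CreateAction entities whose result includes file_id."""
    actions = []
    for entity in crate["@graph"]:
        if "CreateAction" not in as_list(entity.get("@type")):
            continue
        results = as_list(entity.get("result"))
        if any(ref.get("@id") == file_id for ref in results):
            actions.append(entity)
    return actions


crate = json.loads(crate_json)
entities = index_by_id(crate)
action = producers_of(crate, "pics/sepia_fence.jpg")[0]
# Resolve instrument references to the entities described in the crate.
tool_names = [
    entities[ref["@id"]]["name"]
    for ref in as_list(action.get("instrument"))
    if ref["@id"] in entities
]
print(action["@id"], tool_names)
```

Normalising single values to lists matters here because `instrument` and `result` appear both as single objects and as arrays in the examples above.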
{
  "tmpformat": "ro/workflow/test-metadata/0.1",
  "@id": "test-metadata.json",
  "test": [
    {
      "name": "dtests",
      "instance": [
        {
          "name": "dtests",
          "service": {
            "type": "jenkins",
            "url": "http://172.30.10.90:8080/",
            "resource": "job/dtests/"
          }
        }
      ],
      "definition": {
        "test_engine": {
          "type": "planemo",
          "version": ">=0.70"
        },
        "path": "path relative to the directory containing this file"
      }
    }
  ]
}
Workflow language & version
Workflow engine & version (e.g. Toil)
Workflow definition
Input data (or pointers to such)
Parameters? What can be implicit and explicit? (see BCO?)
Tool dependencies to install (mostly implied by CWL/Nextflow/Galaxy, but might need versions/repos)
Container platform requirement (e.g. Docker, Conda)
Operating system requirement
Hardware requirements (memory, CPU, GPU)
Equivalent of AWS cloud instance type sufficient?
Where to run/submit (e.g. usegalaxy.eu)
Explicit/resolved container IDs
Archive containers from Docker Hub (protect against image expiration)
...
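One way to work with a checklist like this is to treat it as a record and report which items a given workflow package has not yet stated. The sketch below is purely illustrative: the class `RunRequirements`, its field names, and the example values (including the "CWL v1.0" version string) are hypothetical placeholders, and it covers only a subset of the items listed above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RunRequirements:
    """Hypothetical record of what is needed to re-run a workflow."""
    workflow_language: Optional[str] = None
    workflow_engine: Optional[str] = None
    workflow_definition: Optional[str] = None
    input_data: Optional[str] = None
    container_platform: Optional[str] = None
    operating_system: Optional[str] = None
    hardware: Optional[str] = None
    container_ids: Optional[str] = None


def missing_items(req: RunRequirements) -> List[str]:
    """List the checklist items that have not been made explicit."""
    return [f.name for f in fields(req) if getattr(req, f.name) is None]


# Placeholder values for illustration only.
example = RunRequirements(
    workflow_language="CWL v1.0",
    workflow_engine="Toil",
    container_platform="Docker",
)
gaps = missing_items(example)
print(gaps)
```

Items left as `None` are exactly the ones that remain implicit, which is the distinction the checklist asks about.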
Join the discussion in the WorkflowHub Club community!
https://about.workflowhub.eu/
--> Separation of concerns
Next call: Thu 7 Jan 2021 20:00 UTC