CWLProv
Towards Reproducible & Interoperable Scientific Workflows
Farah Zaib Khan, Stian Soiland-Reyes, Richard O. Sinnott, Andrew Lonie, Carole Goble, Michael R. Crusoe
This work is licensed under
Workflows
(Esp. scientific workflows)
"The description of a process for accomplishing a scientific
(Ludäscher et al. 2009)
Simple explanation:
https://slides.com/farahzkhan/cwlprov-8
https://slides.com/farahzkhan/cwlprov
Provenance
Information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its
Quality
Reliability
Trustworthiness
Who did this?
When did this happen?
Using what?
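The three questions above can be sketched as a tiny data model. This is only an illustration: the class names borrow the W3C PROV terms Entity, Activity and Agent, but the Python classes, the `describe` helper, and all file and person names are hypothetical and are not the PROV serialization or the CWLProv format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Agent:
    """Who did this? (loosely, a PROV Agent)"""
    name: str


@dataclass(frozen=True)
class Entity:
    """A piece of data or thing (loosely, a PROV Entity)."""
    identifier: str


@dataclass
class Activity:
    """Something that happened over a period of time (loosely, a PROV Activity)."""
    label: str
    started_at: datetime
    ended_at: datetime
    associated_with: Agent
    used: list = field(default_factory=list)       # input entities
    generated: list = field(default_factory=list)  # output entities


def describe(activity):
    """Answer the who / when / using-what questions for one activity."""
    return {
        "who": activity.associated_with.name,
        "when": (activity.started_at.isoformat(), activity.ended_at.isoformat()),
        "using_what": [e.identifier for e in activity.used],
        "produced": [e.identifier for e in activity.generated],
    }


# Hypothetical example run; every name below is made up for illustration.
run = Activity(
    label="alignment step",
    started_at=datetime(2018, 1, 1, 9, 0, tzinfo=timezone.utc),
    ended_at=datetime(2018, 1, 1, 9, 30, tzinfo=timezone.utc),
    associated_with=Agent("hypothetical-researcher"),
    used=[Entity("reads.fastq"), Entity("reference.fa")],
    generated=[Entity("aligned.bam")],
)
summary = describe(run)
```

A record like `summary` is what makes the later assessments (quality, reliability, trustworthiness) possible: each output can be traced back to the inputs, the time window, and the person or tool responsible.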
Provenance Applications
Attribution
Quality Assurance
Verification of Results
Debugging in case of failure/error
Reproducibility
Expectation?
Methods are shared along with published results, so the results can be verified by re-running the workflow anywhere
Reality?
Different computing environments; heterogeneous workflow management systems (WMS); incomplete provenance; limited access to data; proprietary software ...
How are workflows designed and enacted?
Short Answer:
In many, many different ways!
Long Answer:
The full "incomplete" list contains 231 entries.
And then we have many such lists.
What's wrong with having so many approaches?
Lack of interoperability;
different requirements for reproducibility;
vendor lock-in;
provenance information is often incomplete
We define a provenance framework
People will never agree on one system, for many reasons!
Abstraction is the key to interoperability
Provenance using the PROV model and the wfprov and wfdesc ontologies.
Workflow specifications
https://www.commonwl.org/
Why these standards?
Interoperable
Open source
Domain neutral
Community driven
- Common Workflow Language
- Research object
- PROV
- BagIt
- wfdesc, wfprov
CWLProv
Format for the representation of a CWL workflow run and its retrospective provenance
Keeping in view the best practices and defined levels
CWLProv Research Object
Adapted from:
https://doi.org/10.5281/zenodo.1484286#page=8
Mechanism for serialization and transport consistency
effective "sharing" of your analysis within/outside your lab
"publishing" standardized methods along with publication/manuscript submission
Don't care about sharing (you should)? Think about analyzing/re-using your own workflow a few months later.
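The "transport consistency" point above can be illustrated with a BagIt-style checksum manifest. This is a minimal sketch, assuming the usual BagIt layout in which payload files sit under `data/` and each manifest line pairs a checksum with a file path; the helper names `build_manifest` and `verify` and the file names are hypothetical, and the code works on in-memory bytes rather than a real bag on disk.

```python
import hashlib


def build_manifest(payload):
    """Return manifest lines of the form '<sha256-hex>  data/<path>'."""
    lines = []
    for path in sorted(payload):
        digest = hashlib.sha256(payload[path]).hexdigest()
        lines.append(f"{digest}  data/{path}")
    return lines


def verify(payload, manifest_lines):
    """Return the manifest paths whose content is missing or has changed."""
    mismatched = []
    for line in manifest_lines:
        digest, path = line.split(None, 1)
        name = path[len("data/"):]
        content = payload.get(name)
        if content is None or hashlib.sha256(content).hexdigest() != digest:
            mismatched.append(path)
    return mismatched


# Hypothetical payload standing in for the files of a workflow run.
payload = {
    "inputs.txt": b"sample input\n",
    "results.txt": b"sample output\n",
}
manifest = build_manifest(payload)

# Simulate corruption in transit: one file changes after the manifest was written.
received = dict(payload)
received["results.txt"] = b"corrupted output\n"
problems = verify(received, manifest)
```

Whoever receives the package can re-run the verification step and see exactly which files, if any, differ from what the sender recorded.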
Collaborators
Peers
Community
future you
Reviewer
Editor
Reader/end-user
Who is happy?
What can you achieve?
CWLProv-melbourne
By Farah Z Khan