Farah Zaib Khan, Stian Soiland-Reyes, Michael R. Crusoe, Richard O. Sinnott, Andrew Lonie
This work is licensed under
https://slides.com/farahzkhan/cwlprov
"The description of a process for accomplishing a scientific objective, usually expressed in terms of tasks and their dependencies"
(Ludäscher et al. 2009)
Cpipe
Retrospective provenance
Formally: The detailed record of the implementation of a computational task including details of every executed process together with comprehensive information about the execution environment used to derive a specific data product.
Not so boring: All the details associated with a given workflow run
(hang on, next slide has a list of "all the details")
Prospective provenance
The ‘recipes’ used to execute a computational task, e.g. the workflow specification.
Format for the representation of a CWL workflow run and its retrospective provenance
Provenance using the PROV model and the wfprov and wfdesc ontologies.
Workflow specifications
Step one:
Choose a feature-complete reference implementation of CWL
Obvious choice:
cwltool --provenance ROname workflow.cwl job.json
wasGeneratedBy(data:77eecc82607c19910a0f19f55f2c7d5bf7291680, id:da61ed64-6594-4997-981e-9c292366766c,
2018-06-05T15:33:58.343902, [prov:role='ex:main/create-tar/tar'])
"wasGeneratedBy": {
  "_:id6": {
    "prov:entity": "data:77eecc82607c19910a0f19f55f2c7d5bf7291680",
    "prov:activity": "id:da61ed64-6594-4997-981e-9c292366766c",
    "prov:time": "2018-06-05T15:33:58.343902",
    "prov:role": {
      "$": "ex:main/create-tar/tar",
      "type": "prov:QUALIFIED_NAME"
    }
  },
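The two records above are the same statement in two serializations: PROV-N (the one-line form) and PROV-JSON. As a rough sketch of how they relate, the Python below reads the PROV-JSON fragment and prints the matching PROV-N line. The function name `generations_to_provn` is made up for this illustration and is not part of cwltool or CWLProv; the closing braces added to the fragment are also an assumption, there only so that the excerpt parses on its own.

```python
import json

# Fragment from the slide. The final closing braces are NOT in the excerpt;
# they are added here (an assumption) so the text parses as standalone JSON.
PROV_JSON = """
{
  "wasGeneratedBy": {
    "_:id6": {
      "prov:entity": "data:77eecc82607c19910a0f19f55f2c7d5bf7291680",
      "prov:activity": "id:da61ed64-6594-4997-981e-9c292366766c",
      "prov:time": "2018-06-05T15:33:58.343902",
      "prov:role": {
        "$": "ex:main/create-tar/tar",
        "type": "prov:QUALIFIED_NAME"
      }
    }
  }
}
"""


def generations_to_provn(document):
    """Render each wasGeneratedBy record of a PROV-JSON dict as a PROV-N line.

    Hypothetical helper for illustration only.
    """
    lines = []
    for record in document.get("wasGeneratedBy", {}).values():
        role = record.get("prov:role")
        # Typed values are objects holding the literal under "$";
        # a plain string is accepted as well.
        role_value = role["$"] if isinstance(role, dict) else role
        attrs = f", [prov:role='{role_value}']" if role_value else ""
        lines.append(
            "wasGeneratedBy({}, {}, {}{})".format(
                record["prov:entity"],
                record.get("prov:activity", "-"),  # "-" marks an absent optional argument
                record.get("prov:time", "-"),
                attrs,
            )
        )
    return lines


provn_lines = generations_to_provn(json.loads(PROV_JSON))
for line in provn_lines:
    print(line)
```

Read this way, each record answers three questions about a data product: which entity was generated (`prov:entity`), by which activity, that is, which workflow step run (`prov:activity`), and in what role, here the `tar` output of the `create-tar` step.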
effective "sharing" of your analysis within/outside your lab
"publishing" standardized methods along with publication/manuscript submission
Don't care about sharing (you should)? Think about analyzing or re-using your own workflow a few months later.
Collaborators
Peers
Community
Future you
Reviewer
Editor
Reader/end-user
Who is happy?
What can you achieve?
Attribution
Accreditation
Quality Assurance
Verification
Debugging
Reproducibility
Is everything perfect? All problems solved? Unicorns exist?
Sadly, no. Like every project, we are in an iterative process of continuous improvement and updates.
Big shout out to Nextflow:
Questions?
We will be here for the 4 days of the Co-fest. Feel free to come and have a chat.