CWLProv

Towards Reproducible & Interoperable Scientific Workflows 

Farah Zaib Khan, Stian Soiland-Reyes, Richard O. Sinnott, Andrew Lonie, Carole Goble, Michael R. Crusoe

                                         

 

@farahzk03

https://slides.com/farahzkhan/cwlprov-8

 

 

Workflows

(Especially scientific workflows)

"The description of a process for accomplishing a scientific objective, usually expressed in terms of tasks and their dependencies"

(Ludäscher et al. 2009)

 

Simple explanation: 

Automated mechanism to systematise a computational analysis and capture the methods utilised
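To make "tasks and their dependencies" concrete, here is a minimal sketch of a workflow as a dependency graph. The three task names are invented for illustration and do not come from any real pipeline. It relies only on `graphlib` from the Python standard library (3.9+), not on an actual workflow management system.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step analysis: each task maps to the set of
# tasks whose outputs it needs before it can start.
workflow = {
    "align_reads": set(),
    "call_variants": {"align_reads"},
    "annotate_variants": {"call_variants"},
}


def execution_order(tasks):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())


order = execution_order(workflow)
print(order)
```

A real workflow system adds much more (data staging, scheduling, containers), but the declared dependency graph is the part that captures the method.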


Provenance

Information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its

Quality

Reliability

Trustworthiness


Who did this? 

When did this happen?

Using what? 
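The three questions above map loosely onto relations in the W3C PROV data model: association with an agent (who), a start time (when), and usage of entities (using what). The sketch below is a hypothetical in-memory record, not the CWLProv format or any library's API. The class, field names, file names, agent and timestamp are all invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Activity:
    """One executed step, described with PROV-style relations."""
    name: str
    was_associated_with: str                       # who did this?
    started_at_time: datetime                      # when did this happen?
    used: list = field(default_factory=list)       # using what?
    generated: list = field(default_factory=list)  # what was produced?


def answer(activity):
    """Answer the three provenance questions for one activity."""
    return {
        "who": activity.was_associated_with,
        "when": activity.started_at_time.isoformat(),
        "using_what": list(activity.used),
    }


# Illustrative values only; none of these come from a real run.
run = Activity(
    name="call_variants",
    was_associated_with="alice",
    started_at_time=datetime(2018, 11, 1, 9, 30, tzinfo=timezone.utc),
    used=["reads.bam", "reference.fa"],
    generated=["variants.vcf"],
)
print(answer(run))
```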

Provenance Applications

Attribution

Quality Assurance

Verification of Results

Debugging in case of failure/error

Reproducibility


Expectation?

When results are published, the methods are shared too, so anyone can verify the results by reproducing the workflow anywhere

Reality?

Different computing environments; heterogeneous workflow management systems (WMS); incomplete provenance; limited access to data; proprietary software ...


How are workflows designed and enacted?

Short answer:

In many, many different ways!

Long answer:


The full "incomplete" list contains 231 entries.

And there are many such lists.


What's wrong with having so many approaches?

Lack of interoperability

Different requirements for reproducibility

Vendor lock-in

Provenance information is often incomplete


We define a provenance framework


People will never agree on one system, and there are many reasons why not!

 

 

 

 

Abstraction is the key to interoperability

 

 


Provenance using the PROV model and the wfprov and wfdesc ontologies

Workflow specifications


 

https://www.commonwl.org/

Why these standards?

Interoperable

Open source

Domain neutral

Community driven


  • Common Workflow Language
  • Research Object
  • PROV
  • BagIt
  • wfdesc, wfprov

CWLProv

A format for representing a CWL workflow run and its retrospective provenance

Designed with the best practices and defined levels in view

CWLProv Research Object

Adapted from:

https://doi.org/10.5281/zenodo.1484286#page=8
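BagIt, one of the standards this deck builds on, lists each payload file next to its checksum in a manifest file named `manifest-<algorithm>.txt`. That is what lets a recipient check that a research object arrived intact. The sketch below shows such a check using only the Python standard library. The function name, the demo file name and its contents are assumptions made for illustration. This is not the CWLProv tooling, and it covers only the payload manifest, not the rest of the BagIt specification.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_manifest(bag_dir, algorithm="sha256"):
    """Check every file listed in manifest-<algorithm>.txt.

    Returns a dict mapping each listed path to True (checksum matches)
    or False (file missing or checksum differs).
    """
    bag = Path(bag_dir)
    results = {}
    manifest = bag / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        target = bag / rel_path
        if not target.is_file():
            results[rel_path] = False
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        results[rel_path] = actual == expected.lower()
    return results


# Demo on a throwaway bag with one invented payload file.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    payload = bag / "data" / "output.txt"
    payload.write_bytes(b"hello provenance\n")
    digest = hashlib.sha256(payload.read_bytes()).hexdigest()
    (bag / "manifest-sha256.txt").write_text(
        f"{digest}  data/output.txt\n", encoding="utf-8"
    )
    intact = verify_manifest(bag)    # payload unchanged since packaging
    payload.write_bytes(b"tampered\n")
    tampered = verify_manifest(bag)  # payload modified after packaging

print(intact, tampered)
```

The second call reports a mismatch because the payload changed after the manifest was written, which is the kind of transport error a checksum manifest is meant to catch.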

Mechanism for serialization and transport consistency

Effective "sharing" of your analysis within or outside your lab

"Publishing" standardized methods along with a publication or manuscript submission

Don't care about sharing (you should)? Think about analyzing or re-using your own workflow a few months later.

Collaborators

Peers

Community

Future you

Reviewer

Editor

Reader/end-user

Who is happy?

What can you achieve?
