Provenance with the PROV standard

DATA71011 Understanding Data and their Environment

 

Stian Soiland-Reyes

Intended Learning Outcomes

  1. Understanding why and when to use provenance standards
  2. Ability to distinguish PROV concepts
  3. Knowledge of considerations for modelling choices 
  4. Ability to write in a machine-readable metadata language
  5. Skill of modelling processes in a formal language (in lab, assessment)

Motivation

Why use a standard for machine-readable provenance?

Motivation for W3C PROV Standards

PROV standards enhance data interoperability, transparency, and reproducibility across various domains.

Examples in Academia and Industry

  • Research data management in universities.
  • Supply chain tracking in manufacturing.
  • Healthcare data lineage.

Fast Healthcare Interoperability Resources
PROV as part of a standard for health care data exchange

https://hl7.org/fhir/provenance.html

IBM ProvLake

Recording machine learning provenance

https://research.ibm.com/projects/provlake

Consuming PROV Documents in Python

Use Python libraries such as prov and rdflib to parse and manipulate PROV data programmatically.
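To make "consuming" concrete, here is a minimal sketch that reads a small PROV-JSON document and asks two simple lineage questions. It deliberately uses only Python's standard json module so it runs without extra installs; in practice the prov package (for example ProvDocument.deserialize) or rdflib would give richer handling. The document content and every ex: identifier are made up for illustration, and the helper names generated_by and inputs_of are hypothetical.

```python
import json

# Hypothetical PROV-JSON document; all ex: identifiers are invented.
PROV_JSON = """
{
  "prefix": {"ex": "http://example.org/"},
  "entity": {
    "ex:raw.csv": {},
    "ex:clean.csv": {"prov:label": "Cleaned dataset"}
  },
  "activity": {"ex:cleaning": {}},
  "agent": {"ex:alice": {}},
  "used": {
    "_:u1": {"prov:activity": "ex:cleaning", "prov:entity": "ex:raw.csv"}
  },
  "wasGeneratedBy": {
    "_:g1": {"prov:entity": "ex:clean.csv", "prov:activity": "ex:cleaning"}
  },
  "wasAssociatedWith": {
    "_:a1": {"prov:activity": "ex:cleaning", "prov:agent": "ex:alice"}
  }
}
"""


def generated_by(doc):
    """Map each generated entity to the activity that generated it."""
    return {
        rel["prov:entity"]: rel["prov:activity"]
        for rel in doc.get("wasGeneratedBy", {}).values()
    }


def inputs_of(doc, activity):
    """List the entities that the given activity used, sorted by identifier."""
    return sorted(
        rel["prov:entity"]
        for rel in doc.get("used", {}).values()
        if rel["prov:activity"] == activity
    )


doc = json.loads(PROV_JSON)
entities = sorted(doc["entity"])                 # all declared entities
producer = generated_by(doc)["ex:clean.csv"]     # which activity made the clean file?
sources = inputs_of(doc, producer)               # what did that activity read?

print("Entities:", entities)
print("ex:clean.csv was generated by", producer, "using", sources)
```

The same two questions (what generated this entity, and from what inputs) are the starting point for most lineage queries, whichever library does the parsing.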

Combining PROV Traces

Integrate multiple PROV documents to create comprehensive provenance records for complex datasets.
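One way to picture combining traces is as a merge of records keyed by identifier: an entity that appears in two documents under the same identifier becomes a single record with the attributes of both. The sketch below does this for PROV-JSON loaded as plain Python dicts. It is an illustration under stated assumptions rather than a complete implementation: the function name merge_prov_json and all ex: identifiers are hypothetical, bundles are not treated specially, each record is assumed to be a single attribute dict, and on conflicting attribute values the later document wins.

```python
import copy


def merge_prov_json(*docs):
    """Merge PROV-JSON dicts: union the prefixes, merge records by identifier."""
    merged = {}
    for n, doc in enumerate(docs):
        for section, records in doc.items():
            target = merged.setdefault(section, {})
            if section == "prefix":
                for prefix, uri in records.items():
                    # The same prefix must mean the same namespace everywhere.
                    if target.setdefault(prefix, uri) != uri:
                        raise ValueError(
                            f"prefix {prefix!r} maps to different namespaces"
                        )
                continue
            for rec_id, attrs in records.items():
                # Blank-node ids (_:...) are local to one document, so
                # relabel them to avoid accidental clashes across traces.
                if rec_id.startswith("_:"):
                    rec_id = f"_:d{n}_{rec_id[2:]}"
                target.setdefault(rec_id, {}).update(copy.deepcopy(attrs))
    return merged


# Two hypothetical traces that share the entity ex:clean.csv.
trace_a = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {"ex:raw.csv": {}, "ex:clean.csv": {}},
    "activity": {"ex:cleaning": {}},
    "wasGeneratedBy": {
        "_:id1": {"prov:entity": "ex:clean.csv", "prov:activity": "ex:cleaning"}
    },
}
trace_b = {
    "prefix": {"ex": "http://example.org/"},
    "entity": {"ex:clean.csv": {"prov:label": "Cleaned dataset"}, "ex:plot.png": {}},
    "activity": {"ex:plotting": {}},
    "wasGeneratedBy": {
        "_:id1": {"prov:entity": "ex:plot.png", "prov:activity": "ex:plotting"}
    },
}

combined = merge_prov_json(trace_a, trace_b)
print(sorted(combined["entity"]))
print(sorted(combined["wasGeneratedBy"]))
```

Merging by identifier only works when both traces use the same identifier for the same thing, which is one reason stable, shared identifiers matter when recording provenance.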

Benefits of Using PROV

Improves data trustworthiness, enhances collaboration, and supports reproducible research practices.

DATA71011 PROV

By Stian Soiland-Reyes

Provenance with the PROV standard. Lecture in DATA71011 Understanding Data and their Environment at The University of Manchester.
