DATA71011 Understanding Data and their Environment
Stian Soiland-Reyes
This work is licensed under a
Creative Commons Attribution 4.0 International License.
Why use a standard for
machine-readable provenance?
Standards enhance data interoperability, transparency, and reproducibility across various domains.
Existing tooling and guidance can be used directly.
(Meta)data can be moved between systems or combined.
Try to use existing standards!
Examples from industry and academia
<?xml version="1.0" encoding="UTF-8"?>
<order:orderMessage
xmlns:order="urn:gs1:ecom:order:xsd:3"
xmlns:sh="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:gs1:ecom:order:xsd:3 ../Schemas/gs1/ecom/Order.xsd">
<sh:StandardBusinessDocumentHeader>
<sh:HeaderVersion>1.0</sh:HeaderVersion>
<sh:Sender>
<!-- Retailer Information : Ex. SuperStore -->
<sh:Identifier Authority="GS1"/>
</sh:Sender>
<sh:Receiver>
<!-- Shipper information -->
<sh:Identifier Authority="GS1"/>
</sh:Receiver>
<sh:DocumentIdentification>
<sh:Standard>GS1</sh:Standard>
<sh:TypeVersion>3.3</sh:TypeVersion>
<sh:InstanceIdentifier>100002</sh:InstanceIdentifier>
<sh:Type>order</sh:Type>
<sh:MultipleType>false</sh:MultipleType>
<sh:CreationDateAndTime>2011-04-08T14:58:56.591Z</sh:CreationDateAndTime>
</sh:DocumentIdentification>
</sh:StandardBusinessDocumentHeader>
<!-- ********************************************************************************** -->
<!-- NOTE : Comments for a field appear AFTER the field -->
<!-- This is a Purchase Order, submitted from a Retailer to a Supplier.
Currently, the retailer sends a copy (a "shadow") of the PO in this format to the Blockchain.
Every field here, unless indicated as OPTIONAL, is MANDATORY. -->
<!-- ********************************************************************************** -->
<order>
<creationDateTime>2011-04-08T14:58:56.591Z</creationDateTime>
<!-- MANDATORY: Purchase Order CreationDate And Time : Created by Retailer
UTC time (ISO 8601) when the PO was created. -->
<documentStatusCode>ORIGINAL</documentStatusCode>
<!-- Do not change. -->
<orderIdentification>
<entityIdentification>urn:epcglobal:cbv:bt:5412345000037:3352</entityIdentification>
<!--MANDATORY-->
<!-- urn:epcglobal:cbv:bt:ShipToGLN:PONumber -->
<!-- Retailer Purchase Order Number: The format for this is urn:epcglobal:cbv:bt:<gln>:<po-number>,
where <gln> is the "shipTo" GLN (following) AND <po-number> should NOT contain a ":" character.
Using this notation allows a supplier to put in a reference to this PO (using the same format)
from other EPCIS events and Business Txn documents.
Reference: https://www.gs1.org/sites/default/files/docs/epc/CBV-Standard-1-2-1-r-2017-05-05.pdf [Section 8.5.2] -->
<!-- IBM Blockchain Transparent Supply Transaction ID format: urn:ibm:ift:bt:<Company Prefix>.<Location Reference>.<Transaction Id>-->
<!-- where <Company Prefix>.<Location Reference> are for the "shipTo" location-->
</orderIdentification>
<orderTypeCode>220</orderTypeCode>
<!-- Code for buyer to order (220 is default).
For other codes, refer: http://apps.gs1.org/GDD/Pages/clDetails.aspx?semanticURN=urn:gs1:gdd:cl:OrderTypeCode&release=2 -->
<buyer>
<gln>5412345000013</gln>
<!-- MANDATORY: Retailer Corporate Identity GLN -->
</buyer>
<seller>
<gln>4098765000010</gln>
<!-- Seller Corporate Identity gln to be communicated from the seller (shipper) to the buyer (retailer).
MANDATORY for the buyer to provide visibility of the PO to the seller; OPTIONAL otherwise (seller will
not be able to see the PO) -->
</seller>
<!-- NOTE: <seller>, </seller> tags should not be omitted even if <gln> is omitted. -->
<orderLogisticalInformation>
<shipFrom>
<gln>4098765000010</gln>
<!-- OPTIONAL: Shipper Dispatch location GLN (factory). This is mandatory in the associated DA(s) -->
</shipFrom>
<shipTo>
<gln>5412345000037</gln>
<!-- MANDATORY: Retailer Receiving location gln (Distribution Centre) -->
</shipTo>
<orderLogisticalDateInformation>
<requestedDeliveryDateTime>
<date>2011-04-11</date>
<!-- MANDATORY: Requested Delivery date (ISO8601 i.e. yyyy-mm-dd) at Retailer Receiving location when the PO was created. -->
<time>10:32:56.321Z</time>
<!-- OPTIONAL: Requested Delivery time (ISO8601 i.e. hh:mm:ss.sssZ) . GS1 DateOptionalTime Type-->
</requestedDeliveryDateTime>
</orderLogisticalDateInformation>
</orderLogisticalInformation>
<!-- NOTE: We will also reflect the orderLogisticalInformation at a LineItem level for future/other retailers. -->
<referencedOrder>
<entityIdentification>urn:epcglobal:cbv:bt:5412345000037:PO4487</entityIdentification>
<!-- MANDATORY: Top-level referenced purchase order identifier -->
<!-- urn:epcglobal:cbv:bt:<gln>:<po-number> -->
<!-- IBM Blockchain Transparent Supply Transaction ID format: urn:ibm:ift:bt:<Company Prefix>.<Location Reference>.<Transaction Id>-->
<lineItemNumber>2</lineItemNumber>
<!-- OPTIONAL: Related line item number -->
<orderRelationship>RELATED</orderRelationship>
<!-- MANDATORY: Relationship between the purchase orders. -->
<!-- Must be one of code values from http://apps.gs1.org/GDD/Pages/clDetails.aspx?semanticURN=urn:gs1:gdd:cl:OrderRelationshipTypeCode -->
</referencedOrder>
<!-- OPTIONAL: Reference to a related purchase order. -->
<extension>
<isReturnOrder>true</isReturnOrder>
</extension>
<!-- OPTIONAL: "true" indicates the purchase order is a return order.-->
<orderLineItem>
<lineItemNumber>1</lineItemNumber>
<!-- MANDATORY: Numerical Sequential number for items in the PO -->
<requestedQuantity measurementUnitCode="EA">48</requestedQuantity>
<!-- MANDATORY: item requested/ordered Quantity by the Retailer with measurement Unit Attribute-->
<!-- Two or three-character codes from UN/CEFACT Recommendation 20.-->
<!-- Examples: EA (each), LBR (pound), CS (case), KGM (kilogram).-->
<!-- See https://www.unece.org/fileadmin/DAM/cefact/recommendations/rec20/rec20_rev3_Annex2e.pdf and-->
<!-- https://www.unece.org/fileadmin/DAM/cefact/recommendations/rec20/rec20_rev3_Annex3e.pdf.-->
<itemPriceBaseQuantity measurementUnitCode="KGM">48</itemPriceBaseQuantity>
<!-- OPTIONAL: item requested/ordered price base quantity with measurement Unit Attribute. -->
<transactionalTradeItem>
<gtin>40987650000223</gtin>
<!-- MANDATORY: GS1-14 representation of item ordered by the Retailer-->
</transactionalTradeItem>
</orderLineItem>
<orderLineItem>
<lineItemNumber>2</lineItemNumber>
<!-- Numerical Sequential number for next item in the PO -->
<requestedQuantity measurementUnitCode="EA">24</requestedQuantity>
<!-- item requested/ordered Quantity by the Retailer with measurement Unit Attribute-->
<transactionalTradeItem>
<gtin>40987650000346</gtin>
<!-- GS1-14 representation of item ordered by the Retailer-->
</transactionalTradeItem>
<referencedOrder>
<entityIdentification>urn:epcglobal:cbv:bt:5412345000037:PO4488</entityIdentification>
<!-- MANDATORY: Line-level referenced purchase order identifier -->
<!-- urn:epcglobal:cbv:bt:<gln>:<po-number> -->
<!-- IBM Blockchain Transparent Supply Transaction ID format: urn:ibm:ift:bt:<Company Prefix>.<Location Reference>.<Transaction Id>-->
<lineItemNumber>2</lineItemNumber>
<!-- OPTIONAL: Related line item number -->
<orderRelationship>RELATED</orderRelationship>
<!-- MANDATORY: Relationship between the purchase orders. -->
<!-- Must be one of code values from http://apps.gs1.org/GDD/Pages/clDetails.aspx?semanticURN=urn:gs1:gdd:cl:OrderRelationshipTypeCode -->
</referencedOrder>
<!-- OPTIONAL: Reference to a related purchase order. Overrides top-level referencedOrder for line item if it exists. -->
<returnReasonCode>27</returnReasonCode>
<!-- OPTIONAL: The reason code for returning items. -->
<!-- Must be one of code values from http://www.unece.org/fileadmin/DAM/trade/untdid/d18a/tred/tred7007.htm -->
<extension>
<epcList>
<epc>urn:epc:id:sgtin:0614141.107346.2017</epc>
<epc>urn:epc:id:sgtin:0614141.107346.2018</epc>
</epcList>
<!--OPTIONAL: List of instance-level objects (SSCC, SGTIN) expected to be returned.-->
<quantityList>
<quantityElement>
<epcClass>urn:epc:class:lgtin:0614141.107346.101</epcClass>
<!--MANDATORY for quantityElement. Class-level EPCs like LGTINs. -->
<quantity>10</quantity>
<!--OPTIONAL for quantityElement.-->
<!--Meaning: 10 cases of LGTIN '0614141.107346' belonging to lot '101'-->
<uom>CS</uom>
<!--OPTIONAL for quantityElement. Item quantity unit of measurement. "CS" = Case.-->
<!--Two or three-character codes from UN/CEFACT Recommendation 20.-->
<!--Other examples: EA (each), LBR (pound), KGM (kilogram).-->
<!--See https://www.unece.org/fileadmin/DAM/cefact/recommendations/rec20/rec20_rev3_Annex2e.pdf and-->
<!--https://www.unece.org/fileadmin/DAM/cefact/recommendations/rec20/rec20_rev3_Annex3e.pdf.-->
</quantityElement>
<quantityElement>
<epcClass>urn:epc:class:lgtin:0614141.107346.102</epcClass>
<quantity>20</quantity>
<uom>CS</uom>
</quantityElement>
</quantityList>
<!--OPTIONAL: List of class-level objects like LGTINS (GTIN+lot) which are expected to be returned.-->
</extension>
<!-- OPTIONAL: extension for return order details-->
</orderLineItem>
</order>
</order:orderMessage>
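The comments in the purchase order above describe the identifier format urn:epcglobal:cbv:bt:<gln>:<po-number>, where the PO number must not contain a ":" character. A minimal sketch of building and checking such identifiers could look like this; the function and class names are hypothetical, and the only validation beyond the comments is that a GLN is 13 digits.

```python
from typing import NamedTuple

BT_PREFIX = "urn:epcglobal:cbv:bt:"


class BusinessTransactionId(NamedTuple):
    """Hypothetical helper type: the two parts named in the XML comments."""
    gln: str
    po_number: str


def parse_bt_id(identifier: str) -> BusinessTransactionId:
    """Split urn:epcglobal:cbv:bt:<gln>:<po-number> into its parts."""
    if not identifier.startswith(BT_PREFIX):
        raise ValueError(f"not a business transaction URN: {identifier!r}")
    parts = identifier[len(BT_PREFIX):].split(":")
    if len(parts) != 2:
        # More than one ":" means the PO number itself contained a colon
        raise ValueError("expected <gln>:<po-number> with no ':' in the PO number")
    gln, po_number = parts
    if not (gln.isascii() and gln.isdigit() and len(gln) == 13):
        raise ValueError(f"GLN must be 13 digits: {gln!r}")
    if not po_number:
        raise ValueError("PO number must not be empty")
    return BusinessTransactionId(gln, po_number)


def format_bt_id(gln: str, po_number: str) -> str:
    """Build the identifier and validate it by parsing it back."""
    identifier = f"{BT_PREFIX}{gln}:{po_number}"
    parse_bt_id(identifier)
    return identifier


print(parse_bt_id("urn:epcglobal:cbv:bt:5412345000037:3352"))
```

Because buyer and supplier use the same format, either side can reference the same purchase order from other documents without agreeing on anything beyond the standard.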
Using XML as a standard
Syntax is interoperable
...but each XML vocabulary effectively makes its own custom domain model
→ the meaning is not interoperable
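To illustrate the point, the sketch below reads a trimmed excerpt of the purchase order with Python's generic XML parser. The parser needs no knowledge of GS1 to walk the tree, but knowing that "gln" identifies a location or that "requestedQuantity" carries a unit code comes from the domain model, not from XML. The excerpt and the function name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the purchase order shown earlier (comments removed)
ORDER_XML = """
<order:orderMessage xmlns:order="urn:gs1:ecom:order:xsd:3">
  <order>
    <buyer><gln>5412345000013</gln></buyer>
    <seller><gln>4098765000010</gln></seller>
    <orderLineItem>
      <lineItemNumber>1</lineItemNumber>
      <requestedQuantity measurementUnitCode="EA">48</requestedQuantity>
      <transactionalTradeItem><gtin>40987650000223</gtin></transactionalTradeItem>
    </orderLineItem>
  </order>
</order:orderMessage>
"""


def summarise_order(xml_text):
    """Extract a few fields; every path below encodes domain knowledge."""
    root = ET.fromstring(xml_text.strip())
    order = root.find("order")  # unprefixed child element, as in the sample
    lines = []
    for item in order.findall("orderLineItem"):
        quantity = item.find("requestedQuantity")
        lines.append({
            "line": int(item.findtext("lineItemNumber")),
            "quantity": int(quantity.text),
            "unit": quantity.get("measurementUnitCode"),
            "gtin": item.findtext("transactionalTradeItem/gtin"),
        })
    return {
        "root": root.tag,
        "buyer_gln": order.findtext("buyer/gln"),
        "seller_gln": order.findtext("seller/gln"),
        "lines": lines,
    }


summary = summarise_order(ORDER_XML)
print(summary)
```

A different order format would parse just as easily, but every path in summarise_order would have to be rewritten.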
Distributed provenance chains
Wittner et al. (2022) https://doi.org/10.1038/s41597-022-01537-6
Fast Healthcare Interoperability Resources (FHIR)
PROV as part of a standard for
health care data exchange
PROV is a conceptual model with several machine-readable formats (PROV-O, PROV-N, PROV-JSON, etc.)
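import datetime
import prov.model as prov
document = prov.ProvDocument()
# Unprefixed names such as 'e2' resolve against the default namespace
document.set_default_namespace('http://anotherexample.org/')
document.add_namespace('ex', 'http://example.org/')
# Entity: the file, described with attribute-value pairs
e2 = document.entity('e2', (
(prov.PROV_TYPE, "File"),
('ex:path', "/shared/crime.txt"),
('ex:creator', "Alice"),
('ex:content', "There was a lot of crime in London last month"),
))
# Activity: the edit, with a start time and no end time
a1 = document.activity('a1', datetime.datetime.now(), None, {prov.PROV_TYPE: "edit"})
# Relations accept either record objects or their identifiers
document.wasGeneratedBy(e2, a1, time=None, other_attributes={'ex:fct': "save"})
document.wasAssociatedWith('a1', 'ag2', None, None, {prov.PROV_ROLE: "author"})
# Agent: may be declared after the relation that refers to it
document.agent('ag2', {prov.PROV_TYPE: 'prov:Person', 'ex:name': "Bob"})
# Serialise the document as PROV-N
print(document.get_provn())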
document
default <http://anotherexample.org/>
prefix ex <http://example.org/>
entity(e2, [prov:type="File", ex:creator="Alice",
ex:content="There was a lot of crime in London last month",
ex:path="/shared/crime.txt"])
activity(a1, 2014-07-09T16:39:38.795839, -, [prov:type="edit"])
wasGeneratedBy(e2, a1, -, [ex:fct="save"])
wasAssociatedWith(a1, ag2, -, [prov:role="author"])
agent(ag2, [prov:type="prov:Person", ex:name="Bob"])
endDocument
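Since PROV is a conceptual model, the same statements as in the PROV-N output above can be written in another format. The sketch below builds a simplified PROV-JSON-style structure by hand using only the standard library. It is an approximation for illustration: a real PROV-JSON serialisation types some values (such as qualified names and timestamps) more explicitly, and the relation identifiers "_:wGB1" and "_:wAW1" as well as the helper function are made up for this example.

```python
import json

# Simplified PROV-JSON-style rendering of the PROV-N document above
prov_json = {
    "prefix": {
        "default": "http://anotherexample.org/",
        "ex": "http://example.org/",
    },
    "entity": {
        "e2": {
            "prov:type": "File",
            "ex:creator": "Alice",
            "ex:content": "There was a lot of crime in London last month",
            "ex:path": "/shared/crime.txt",
        }
    },
    "activity": {
        "a1": {"prov:startTime": "2014-07-09T16:39:38.795839", "prov:type": "edit"}
    },
    "agent": {"ag2": {"prov:type": "prov:Person", "ex:name": "Bob"}},
    # Relations are keyed by an identifier; these are hypothetical blank nodes
    "wasGeneratedBy": {
        "_:wGB1": {"prov:entity": "e2", "prov:activity": "a1", "ex:fct": "save"}
    },
    "wasAssociatedWith": {
        "_:wAW1": {"prov:activity": "a1", "prov:agent": "ag2", "prov:role": "author"}
    },
}


def generating_activities(doc, entity_id):
    """Return the activities that generated the given entity."""
    return [
        generation["prov:activity"]
        for generation in doc.get("wasGeneratedBy", {}).values()
        if generation["prov:entity"] == entity_id
    ]


print(json.dumps(prov_json, indent=2))
print(generating_activities(prov_json, "e2"))
```

The statements are the same as in PROV-N; only the syntax changes, which is why tools can convert between the formats.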
document
prefix ex <http://example.com/back-to-the-future/>
entity(ex:results)
entity(ex:data)
entity(ex:interviews)
wasDerivedFrom(ex:results, ex:data)
wasDerivedFrom(ex:data, ex:interviews)
wasDerivedFrom(ex:interviews, ex:results)
endDocument
$ provconvert -infile test.provn -outfile test.svg
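The document above is syntactically valid PROV-N, yet its derivations loop back on themselves: results derive from data, data from interviews, and interviews from results. One possible way to spot such loops is a depth-first search over the wasDerivedFrom edges, sketched here with the standard library only. The function name and the edge list representation are assumptions made for this example.

```python
# (derived, source) pairs copied from the wasDerivedFrom statements above
derivations = [
    ("ex:results", "ex:data"),
    ("ex:data", "ex:interviews"),
    ("ex:interviews", "ex:results"),
]


def find_derivation_cycle(edges):
    """Return one cycle as a list of entities, or None if there is none."""
    graph = {}
    for derived, source in edges:
        graph.setdefault(derived, []).append(source)
        graph.setdefault(source, [])

    path = []     # entities on the current depth-first path
    done = set()  # entities fully explored without finding a cycle

    def visit(node):
        if node in done:
            return None
        if node in path:
            # Reached an entity already on the path: close the loop
            return path[path.index(node):] + [node]
        path.append(node)
        for source in graph[node]:
            cycle = visit(source)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for node in graph:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


print(find_derivation_cycle(derivations))
```

Removing any one of the three derivations makes the function return None, which is what a plausible provenance trace should give.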
Responsibility view – who was responsible for what?
Entity → Agent
Activity → Agent
Data flow view – how did the information move from one piece of data to another?
Entity → Entity
Process view – what activity consumed/produced the data?
Activity ↔ Entity
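The three views above can be read as a grouping of PROV relations by the kinds of things they connect. The sketch below sorts a handful of statements into the views; the mapping covers only five common relations, and the statements involving e1 and the attribution of e2 are invented for illustration.

```python
# Which view each relation belongs to, following the three views above
VIEW_OF_RELATION = {
    "wasAttributedTo": "responsibility",    # Entity -> Agent
    "wasAssociatedWith": "responsibility",  # Activity -> Agent
    "wasDerivedFrom": "data flow",          # Entity -> Entity
    "used": "process",                      # Activity -> Entity
    "wasGeneratedBy": "process",            # Entity -> Activity
}


def group_by_view(statements):
    """Sort (relation, subject, object) triples into the three views."""
    views = {"responsibility": [], "data flow": [], "process": []}
    for relation, subject, obj in statements:
        views[VIEW_OF_RELATION[relation]].append((relation, subject, obj))
    return views


statements = [
    ("wasGeneratedBy", "e2", "a1"),
    ("wasAssociatedWith", "a1", "ag2"),
    ("used", "a1", "e1"),             # hypothetical input entity e1
    ("wasDerivedFrom", "e2", "e1"),   # hypothetical derivation
    ("wasAttributedTo", "e2", "ag2"), # hypothetical attribution
]

for view, members in group_by_view(statements).items():
    print(view, members)
```

Each view is a filter over the same provenance graph, so one document can answer all three questions.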
Week 4 To Do List before practical (see Canvas)