Stian Soiland-Reyes, Farah Zaib Khan, Richard O. Sinnott, Andrew Lonie, Michael R Crusoe, Carole Goble


@soilandreyes

https://orcid.org/0000-0001-9842-9718

https://slides.com/soilandreyes/

Workshop for Research Objects (RO2018),
IEEE eScience 2018, Amsterdam
2018-10-29

This work has been done as part of the BioExcel CoE (www.bioexcel.eu), a project funded by the European Union under contract H2020-EINFRA-2015-1-675728.

Capturing interoperable reproducible workflows with Common Workflow Language

 Three broad categories

Cpipe

@farahzk03

https://slides.com/farahzkhan/cwlprov

cwlVersion: v1.0
class: Workflow
inputs:
  inp: File
  ex: string

outputs:
  classout:
    type: File
    outputSource: compile/classfile

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: inp
      extractfile: ex
    out: [example_out]

  compile:
    run: arguments.cwl
    in:
      src: untar/example_out
    out: [classfile]
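
The links between steps can be read directly from the "in" mappings: a source written as step/output means the current step depends on that upstream step. A minimal Python sketch, assuming the workflow has already been parsed into a plain dictionary and uses the map form of "in" shown above; the workflow literal and the step_dependencies helper are illustrative and not part of any CWL library:

```python
def step_dependencies(workflow):
    """Return sorted (upstream, downstream) step pairs implied by 'in' sources."""
    steps = workflow["steps"]
    edges = set()
    for name, step in steps.items():
        inputs = step.get("in") or {}
        if not isinstance(inputs, dict):
            continue  # assumption: only the map form of 'in' is handled here
        for source in inputs.values():
            # A source may be a bare string or a mapping with a 'source' key.
            if isinstance(source, dict):
                source = source.get("source")
            if not isinstance(source, str):
                continue
            upstream, _, output = source.partition("/")
            # 'inp' and 'ex' have no '/', so they are workflow inputs, not step links.
            if output and upstream in steps:
                edges.add((upstream, name))
    return sorted(edges)


# Dictionary mirroring the two-step workflow above.
workflow = {
    "steps": {
        "untar": {
            "run": "tar-param.cwl",
            "in": {"tarfile": "inp", "extractfile": "ex"},
            "out": ["example_out"],
        },
        "compile": {
            "run": "arguments.cwl",
            "in": {"src": "untar/example_out"},
            "out": ["classfile"],
        },
    }
}

print(step_dependencies(workflow))
```

Here the only dependency found is from untar to compile, which is the ordering a workflow engine has to respect.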

Over 5000 CWL Descriptions on GitHub

 

 

Composing a workflow

cwlVersion: v1.0
class: Workflow
label: EMG QC workflow, (paired end version). Benchmarking with MG-RAST expt.

requirements:
 - class: SubworkflowFeatureRequirement
 - class: SchemaDefRequirement
   types: 
    - $import: ../tools/FragGeneScan-model.yaml
    - $import: ../tools/trimmomatic-sliding_window.yaml
    - $import: ../tools/trimmomatic-end_mode.yaml
    - $import: ../tools/trimmomatic-phred.yaml

inputs:
  reads:
    type: File
    format: edam:format_1930  # FASTQ

outputs:
  processed_sequences:
    type: File
    outputSource: clean_fasta_headers/sequences_with_cleaned_headers

steps:
  trim_quality_control:
    doc: |
      Low quality trimming (low quality ends and sequences with quality scores
      less than 15 over a 4 nucleotide wide window are removed)
    run: ../tools/trimmomatic.cwl
    in:
      reads1: reads
      phred: { default: '33' }
      leading: { default: 3 }
      trailing: { default: 3 }
      end_mode: { default: SE }
      minlen: { default: 100 }
      slidingwindow:
        default:
          windowSize: 4
          requiredQuality: 15
    out: [reads1_trimmed]

  convert_trimmed-reads_to_fasta:
    run: ../tools/fastq_to_fasta.cwl
    in:
      fastq: trim_quality_control/reads1_trimmed
    out: [ fasta ]

  clean_fasta_headers:
    run: ../tools/clean_fasta_headers.cwl
    in:
      sequences: convert_trimmed-reads_to_fasta/fasta
    out: [ sequences_with_cleaned_headers ]


$namespaces:
 edam: http://edamontology.org/
 s: http://schema.org/
$schemas:
 - http://edamontology.org/EDAM_1.16.owl
 - https://schema.org/docs/schema_org_rdfa.html

s:license: "https://www.apache.org/licenses/LICENSE-2.0"
s:copyrightHolder: "EMBL - European Bioinformatics Institute"

stain@biggie:~/src/ebi-metagenomics-cwl$ find . -name '*.cwl' | xargs ls
./tools/5S-from-tablehits.cwl		  ./tools/minia.cwl
./tools/biom-convert.cwl		  ./tools/modify_taxonomy_table.cwl
./tools/biom-summarize_table.cwl	  ./tools/nhmmer.cwl
./tools/clean_fasta_headers.cwl		  ./tools/oneLineFasta.cwl
./tools/cmsearch-deoverlap.cwl		  ./tools/orf_stats.cwl
./tools/collate_unique_SSU_headers.cwl	  ./tools/prepend_header.cwl
./tools/concatenate.cwl			  ./tools/pull-5Ss.cwl
./tools/count_fasta.cwl			  ./tools/pull-LSUs.cwl
./tools/count_fastq.cwl			  ./tools/pull-SSUs.cwl
./tools/create_categorisations.cwl	  ./tools/qc-stats.cwl
./tools/discard_short_seqs.cwl		  ./tools/qiime-filter_tree.cwl
./tools/esl-reformat.cwl		  ./tools/qiime-pick_closed_reference_otus.cwl
./tools/esl-sfetch-index.cwl		  ./tools/RevReadSort.cwl
./tools/esl-sfetch-manyseqs.cwl		  ./tools/rRNA_selection.cwl
./tools/esl-sfetch-oneseq.cwl		  ./tools/seqprep.cwl
./tools/extract_coord_lines.cwl		  ./tools/seqprep-merge.cwl
./tools/extract-coords-from-cmsearch.cwl  ./tools/SSU-from-tablehits.cwl
./tools/extract_observations.cwl	  ./tools/summary.cwl
./tools/extract_sig_coords.cwl		  ./tools/trimmomatic.cwl
./tools/faselector.cwl			  ./tools/tRNA_selection.cwl
./tools/fasta_chunker.cwl		  ./tools/update_krona_chart_urls.cwl
./tools/fastq_to_fasta.cwl		  ./tools/write_ipr_summary.cwl
./tools/FragGeneScan1_20.cwl		  ./workflows/16S_taxonomic_analysis.cwl
./tools/go_summary.cwl			  ./workflows/cmsearch-multimodel.cwl
./tools/hmmsearch.cwl			  ./workflows/convert-to-v3-layout.cwl
./tools/infernal-cmscan.cwl		  ./workflows/emg-assembly.cwl
./tools/infernal-cmsearch.cwl		  ./workflows/emg-core-analysis-v4.cwl
./tools/InterProScan5.21-60.cwl		  ./workflows/emg-pipeline-v3.cwl
./tools/ipr_stats.cwl			  ./workflows/emg-pipeline-v3-paired.cwl
./tools/krona.cwl			  ./workflows/emg-pipeline-v4-assembly-metaSPAdes.cwl
./tools/krona_setup.cwl			  ./workflows/emg-pipeline-v4-paired.cwl
./tools/LSU-from-tablehits.cwl		  ./workflows/emg-pipeline-v4-single.cwl
./tools/map_fa_headers.cwl		  ./workflows/emg-qc-paired.cwl
./tools/mapseq2biom.cwl			  ./workflows/emg-qc-single.cwl
./tools/mapseq.cwl			  ./workflows/functional_analysis.cwl
./tools/mask_RNA.cwl			  ./workflows/orf_prediction.cwl
./tools/megahit.cwl			  ./workflows/rna-selector.cwl
./tools/metaspades.cwl			  ./workflows/trim_and_reformat_reads.cwl

Provenance for Workflows?

Prospective provenance

The ‘recipes’ used to execute a computational task, e.g. the workflow specification or workflow template.

 

Retrospective provenance

The execution record of running a workflow instance: details of every executed process and comprehensive information about the execution environment.

 

Workflow evolution

Changes across the workflow specification, its tool definitions and the software and services it depends on.

{
  "@context" : [ "https://w3id.org/bundle/context" ],
  "id" : "/",
  "manifest" : [ "manifest.json" ],
  "createdOn" : "2017-08-24T10:57:46.325Z",
  "createdBy" : {
    "uri" : "https://view.commonwl.org",
    "name" : "Common Workflow Language Viewer"
  },
  "authoredBy" : [ {
    "uri" : "mailto:peter.amstutz@curoverse.com",
    "name" : "Peter Amstutz"
  }, {
    "uri" : "mailto:luka.stojanovic@sbgenomics.com",
    "name" : "Luka Stojanovic"
  }, {
    "uri" : "mailto:crusoe@ucdavis.edu",
    "name" : "Michael R. Crusoe"
  }, {
    "uri" : "mailto:porter@porter.st",
    "name" : "Andrey Kartashov"
  }, {
    "uri" : "mailto:janko.simonovic@sbgenomics.com",
    "name" : "Janko Simonovic"
  } ],
  "retrievedFrom" : "https://github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/",
  "retrievedOn" : "2017-08-24T10:57:46.325Z",
  "retrievedBy" : {
    "uri" : "https://view.commonwl.org",
    "name" : "Common Workflow Language Viewer"
  },
  "history" : [ "http:/git2prov.org/git2prov?giturl=https:/github.com/common-workflow-language/workflows.git&serialization=PROV-JSON" ],
  "aggregates" : [ {
    "uri" : "/workflow/tmp_2.fq",
    "mediatype" : "application/octet-stream",
    "createdOn" : "2017-08-24T10:57:46.923Z",
    "authoredBy" : [ {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/tmp_2.fq",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "bundledAs" : {
      "uri" : "urn:uuid:61579f3e-63e6-49c2-b780-f67b2df461b7",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/lobSTR-demo.json",
    "mediatype" : "application/json",
    "createdOn" : "2017-08-24T10:57:47.216Z",
    "authoredBy" : [ {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-demo.json",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "bundledAs" : {
      "uri" : "urn:uuid:973caa0e-f3bd-45e8-8d29-70123bc8715a",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/models/illumina_v3.pcrfree.stuttermodel",
    "mediatype" : "application/octet-stream",
    "createdOn" : "2017-08-24T10:57:47.239Z",
    "authoredBy" : [ {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/models/illumina_v3.pcrfree.stuttermodel",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "bundledAs" : {
      "uri" : "urn:uuid:62bbcbea-f34f-463f-990d-6148f8ed5e5c",
      "folder" : "/workflow/models/"
    }
  }, {
    "uri" : "/workflow/models/illumina_v3.pcrfree.stepmodel",
    "mediatype" : "application/octet-stream",
    "createdOn" : "2017-08-24T10:57:47.266Z",
    "authoredBy" : [ {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/models/illumina_v3.pcrfree.stepmodel",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "bundledAs" : {
      "uri" : "urn:uuid:03439ae7-cd94-42a3-b5fe-40bfff6882d8",
      "folder" : "/workflow/models/"
    }
  }, {
    "uri" : "/workflow/samtools-sort.cwl",
    "mediatype" : "text/x-yaml",
    "createdOn" : "2017-08-24T10:57:47.269Z",
    "authoredBy" : [ {
      "uri" : "mailto:luka.stojanovic@sbgenomics.com",
      "name" : "Luka Stojanovic"
    }, {
      "uri" : "mailto:crusoe@ucdavis.edu",
      "name" : "Michael R. Crusoe"
    }, {
      "uri" : "mailto:porter@porter.st",
      "name" : "Andrey Kartashov"
    }, {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/samtools-sort.cwl",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "conformsTo" : "https://w3id.org/cwl/v1.0",
    "bundledAs" : {
      "uri" : "urn:uuid:2dc07859-efc2-4945-a95f-ba7815b68d07",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/lobSTR-workflow.cwl",
    "mediatype" : "text/x-yaml",
    "createdOn" : "2017-08-24T10:57:47.42Z",
    "authoredBy" : [ {
      "uri" : "mailto:luka.stojanovic@sbgenomics.com",
      "name" : "Luka Stojanovic"
    }, {
      "uri" : "mailto:crusoe@ucdavis.edu",
      "name" : "Michael R. Crusoe"
    }, {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "conformsTo" : "https://w3id.org/cwl/v1.0",
    "bundledAs" : {
      "uri" : "urn:uuid:58bc1895-3460-46d6-91d7-fa1718d09631",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/lobSTR-arvados-demo.json",
    "mediatype" : "application/json",
    "createdOn" : "2017-08-24T10:57:47.453Z",
    "authoredBy" : [ {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-arvados-demo.json",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "bundledAs" : {
      "uri" : "urn:uuid:30c683bc-69fb-4d93-8dad-65b663783af5",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/samtools-index.cwl",
    "mediatype" : "text/x-yaml",
    "createdOn" : "2017-08-24T10:57:47.458Z",
    "authoredBy" : [ {
      "uri" : "mailto:luka.stojanovic@sbgenomics.com",
      "name" : "Luka Stojanovic"
    }, {
      "uri" : "mailto:crusoe@ucdavis.edu",
      "name" : "Michael R. Crusoe"
    }, {
      "uri" : "mailto:porter@porter.st",
      "name" : "Andrey Kartashov"
    }, {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/samtools-index.cwl",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "conformsTo" : "https://w3id.org/cwl/v1.0",
    "bundledAs" : {
      "uri" : "urn:uuid:8235d3f8-6927-4f73-b160-8521838a1cbb",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/lobSTR-tool.cwl",
    "mediatype" : "text/x-yaml",
    "createdOn" : "2017-08-24T10:57:47.476Z",
    "authoredBy" : [ {
      "uri" : "mailto:luka.stojanovic@sbgenomics.com",
      "name" : "Luka Stojanovic"
    }, {
      "uri" : "mailto:crusoe@ucdavis.edu",
      "name" : "Michael R. Crusoe"
    }, {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-tool.cwl",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "conformsTo" : "https://w3id.org/cwl/v1.0",
    "bundledAs" : {
      "uri" : "urn:uuid:7fa6fbe4-1fc5-4cb5-9c1a-56b96c5f7aaf",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/allelotype.cwl",
    "mediatype" : "text/x-yaml",
    "createdOn" : "2017-08-24T10:57:47.537Z",
    "authoredBy" : [ {
      "uri" : "mailto:luka.stojanovic@sbgenomics.com",
      "name" : "Luka Stojanovic"
    }, {
      "uri" : "mailto:janko.simonovic@sbgenomics.com",
      "name" : "Janko Simonovic"
    }, {
      "uri" : "mailto:crusoe@ucdavis.edu",
      "name" : "Michael R. Crusoe"
    }, {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/allelotype.cwl",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "conformsTo" : "https://w3id.org/cwl/v1.0",
    "bundledAs" : {
      "uri" : "urn:uuid:3706bd2f-e53f-431d-b32a-deb661d9b292",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/README",
    "mediatype" : "application/octet-stream",
    "createdOn" : "2017-08-24T10:57:47.555Z",
    "authoredBy" : [ {
      "uri" : "mailto:crusoe@ucdavis.edu",
      "name" : "Michael R. Crusoe"
    }, {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/README",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "bundledAs" : {
      "uri" : "urn:uuid:ed54c4d6-c585-4dc9-b7bc-0cf299e20b91",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/workflow/tmp_1.fq",
    "mediatype" : "application/octet-stream",
    "createdOn" : "2017-08-24T10:57:47.738Z",
    "authoredBy" : [ {
      "uri" : "mailto:peter.amstutz@curoverse.com",
      "name" : "Peter Amstutz"
    } ],
    "retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/tmp_1.fq",
    "retrievedBy" : {
      "uri" : "https://view.commonwl.org",
      "name" : "Common Workflow Language Viewer"
    },
    "bundledAs" : {
      "uri" : "urn:uuid:5d431f81-ad0b-4acf-903a-9d5aa03b04df",
      "folder" : "/workflow/"
    }
  }, {
    "uri" : "/visualisation.png",
    "mediatype" : "image/png",
    "createdOn" : "2017-08-24T10:57:47.801Z",
    "retrievedFrom" : "https://view.commonwl.org/graph/png/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
    "bundledAs" : {
      "uri" : "urn:uuid:ff9ace37-e76c-49f8-8d36-60f11ff6d257",
      "folder" : "/"
    }
  }, {
    "uri" : "/visualisation.svg",
    "mediatype" : "image/svg+xml",
    "createdOn" : "2017-08-24T10:57:47.821Z",
    "retrievedFrom" : "https://view.commonwl.org/graph/svg/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
    "bundledAs" : {
      "uri" : "urn:uuid:a6cfb437-8818-4ab2-9081-efc74c5109e8",
      "folder" : "/"
    }
  } ],
  "annotations" : [ {
    "uri" : "urn:uuid:9f602fff-b280-41c5-9590-ab95a49c85ad",
    "about" : "/",
    "content" : "annotations/merged.cwl"
  }, {
    "uri" : "urn:uuid:0ce4b727-ff61-4534-9afb-e3d676d2782d",
    "about" : "/",
    "content" : "annotations/workflow.ttl"
  } ]
}
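
A manifest like the one above can be queried with a few lines of code. A minimal Python sketch, assuming the manifest has been loaded into a dictionary (for example with json.load) and that "conformsTo" may be either a single string or a list; the summarise_manifest helper and the shortened sample are illustrative only:

```python
from collections import Counter

CWL_V1_0 = "https://w3id.org/cwl/v1.0"


def _as_list(value):
    """Normalise a missing, single or list-valued property to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarise_manifest(manifest):
    """Return (URIs of CWL v1.0 resources, Counter of media types)."""
    aggregates = manifest.get("aggregates", [])
    cwl_files = [
        a["uri"] for a in aggregates if CWL_V1_0 in _as_list(a.get("conformsTo"))
    ]
    mediatypes = Counter(a.get("mediatype", "unknown") for a in aggregates)
    return cwl_files, mediatypes


# Shortened sample with three of the aggregated resources listed above.
manifest = {
    "aggregates": [
        {"uri": "/workflow/tmp_2.fq", "mediatype": "application/octet-stream"},
        {"uri": "/workflow/samtools-sort.cwl", "mediatype": "text/x-yaml",
         "conformsTo": CWL_V1_0},
        {"uri": "/workflow/lobSTR-workflow.cwl", "mediatype": "text/x-yaml",
         "conformsTo": CWL_V1_0},
    ]
}

cwl_files, mediatypes = summarise_manifest(manifest)
print(cwl_files)
print(mediatypes)
```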
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow

label: "Hello World"
doc: "Outputs a message using echo"

inputs: []

outputs:
  response:
    outputSource: step0/response
    type: File

steps:
  step0:
    run:
      class: CommandLineTool
      inputs:
        message:
          type: string
          doc: "The message to print"
          default: "Hello World"
          inputBinding:
            position: 1
      baseCommand: echo
      stdout: response.txt
      outputs:
        response:
          type: stdout
    in: []
    out: [response]

Permalink URI scheme

https://w3id.org/cwl/view/{scheme}/{commit}/{path}{?part=fragment}
  • https://w3id.org/cwl/view/ - fixed prefix at the permalink service https://w3id.org/
  • {scheme} - source code management protocol; currently only git is supported:
    • {commit} - full git commit SHA-1 id (no branches or short commit ids allowed)
    • {path} - relative path to the .cwl file within a checkout of that git commit
    • {?part=fragment} - optional part within the CWL file, e.g. #main

Git permalinks are resolved using https://view.commonwl.org/git which, if it knows about that particular git commit, will content-negotiate to provide various representations.

Anyone can mint these permalinks for .cwl files at a given commit, in any public or private git repository, provided there are no uncommitted files or git submodules.
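
The URI template is simple enough to assemble by hand. A minimal Python sketch, assuming the part is passed without its leading # and is placed in the ?part= query parameter as the template suggests; the cwl_permalink helper and its argument names are hypothetical and not part of the CWL Viewer:

```python
import re
from urllib.parse import quote, urlencode

PREFIX = "https://w3id.org/cwl/view/"
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def cwl_permalink(commit, path, part=None, scheme="git"):
    """Build a permalink following {scheme}/{commit}/{path}{?part=fragment}."""
    if scheme != "git":
        raise ValueError("only git is currently supported")
    if not FULL_SHA1.fullmatch(commit):
        # Branch names and abbreviated commit ids are not allowed.
        raise ValueError("commit must be a full 40-character SHA-1 id")
    url = f"{PREFIX}{scheme}/{commit}/{quote(path.lstrip('/'))}"
    if part:
        # Assumption: strip a leading '#' so '#main' and 'main' behave alike.
        url += "?" + urlencode({"part": part.lstrip("#")})
    return url


print(cwl_permalink("933bf2a1a1cce32d88f88f136275535da9df0954",
                    "workflows/hello/hello.cwl"))
```

Rejecting anything but a full commit id up front mirrors the rule that branches and short commits cannot be used, since they would not identify a fixed version of the file.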

Farah Zaib Khan et al,
CWLProv – Interoperable retrospective provenance capture and its challenges,
BOSC 2018

https://doi.org/10.7490/f1000research.1115721.1

Provenance using PROV-Model
expanded with wfprov and wfdesc

Workflow specifications

Levels of Provenance

./revsort-run-1/bagit.txt
./revsort-run-1/bag-info.txt

./revsort-run-1/manifest-sha1.txt
./revsort-run-1/tagmanifest-sha1.txt

./revsort-run-1/data
./revsort-run-1/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f
./revsort-run-1/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
./revsort-run-1/data/b9/b9214658cc453331b62c2282b772a5c063dbd284

./revsort-run-1/workflow
./revsort-run-1/workflow/packed.cwl
./revsort-run-1/workflow/primary-output.json
./revsort-run-1/workflow/primary-job.json

./revsort-run-1/snapshot
./revsort-run-1/snapshot/revtool.cwl
./revsort-run-1/snapshot/revsort.cwl
./revsort-run-1/snapshot/sorttool.cwl

./revsort-run-1/metadata/logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt

./revsort-run-1/metadata/manifest.jsonld

./revsort-run-1/metadata/provenance/primary.cwlprov.provn
./revsort-run-1/metadata/provenance/primary.cwlprov.json
./revsort-run-1/metadata/provenance/primary.cwlprov.ttl
./revsort-run-1/metadata/provenance/primary.cwlprov.jsonld
./revsort-run-1/metadata/provenance/primary.cwlprov.xml
./revsort-run-1/metadata/provenance/primary.cwlprov.nt
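
Because the run is stored as a BagIt bag, the payload can be checked against manifest-sha1.txt, where each line holds a checksum followed by a file path relative to the bag root. A minimal Python sketch under that assumption; verify_payload is a hypothetical helper and not a function of cwltool or of a BagIt library:

```python
import hashlib
from pathlib import Path


def verify_payload(bag_dir, algorithm="sha1"):
    """Return the payload paths whose checksum differs from the manifest."""
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, relpath = line.split(maxsplit=1)
        content = (bag / relpath).read_bytes()
        actual = hashlib.new(algorithm, content).hexdigest()
        if actual != expected.lower():
            mismatches.append(relpath)
    return mismatches
```

Calling verify_payload("revsort-run-1") on an intact bag would return an empty list; any listed path points at a data file that changed after the run was captured.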

Completeness

Workflow values

Rerunnable

Reusable

Unstructured log

Relations and identifiers

Structured execution log

Profile

Level 0

Level 1

"Transport level"

document
prefix wfprov <http://purl.org/wf4ever/wfprov#>
prefix prov <http://www.w3.org/ns/prov#>
prefix wfdesc <http://purl.org/wf4ever/wfdesc#>
prefix wf <https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/hello/hello.cwl#>
prefix input <app://579c1b74-b328-4da6-80a8-a2ffef2ac9b5/workflow/input.json#>
prefix run <urn:uuid:>
prefix engine <urn:uuid:>
prefix data <urn:hash:sha256:>

default <app://579c1b74-b328-4da6-80a8-a2ffef2ac9b5/>

// Level 1 provenance of workflow run

activity(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, -, [prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#main"])
    wasStartedBy(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, -, 2017-10-27T14:24:00+01:00)

    // The engine is the SoftwareAgent that is executing our Workflow plan
    wasAssociatedWith(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main)
        agent(engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, [prov:type='prov:SoftwareAgent', prov:type='wfprov:WorkflowEngine', prov:label="cwltool v1.2.5"])
        // prov has no term to relate sub-plans - we'll use wfdesc:hasSubProcess
        entity(wf:main,[prov:type='wfdesc:Workflow', prov:type='prov:Plan', wfdesc:hasSubProcess='wf:main/step1', wfdesc:hasSubProcess='wf:main/step2'])
            alternateOf(wf:main, workflow/packed.cwl)
            entity(wf:main/step1,[prov:type='wfdesc:Process', prov:type='prov:Plan'])
            entity(wf:main/step2,[prov:type='wfdesc:Process', prov:type='prov:Plan'])            

    // First the workflow uses some data; here with a urn:hash:sha256 identifier
    used(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T14:29:00+01:00, [prov:role='wf:main/input1'])
        entity(data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, [prov:type='wfprov:Artifact'])
            // which we have stored a copy of within the research object
            specializationOf(data/58/5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03)

    // Then there was another activity - wfprov:ProcessRun indicating a command line tool
    activity(run:4305467e-6dfb-11e7-885d-0242ac110002, -, -, [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/step1"])
        // started by the mother activity
        wasStartedBy(run:4305467e-6dfb-11e7-885d-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00)
        // same engine using step1 as plan. In a distributed scenario there might be a different engine
        wasAssociatedWith(run:4305467e-6dfb-11e7-885d-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main/step1)
        // This activity also uses the same data, but in a different role (e.g. input parameter)
        used(run:4305467e-6dfb-11e7-885d-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T14:00:00+01:00, [prov:role='wf:main/step1/in1'])

        // And we generate some new data
        wasGeneratedBy(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, run:4305467e-6dfb-11e7-885d-0242ac110002, 2017-10-27T16:00:00+01:00, [prov:role='wf:main/step1/out1'])
            entity(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, [prov:type='wfprov:Artifact'])
                // again stored in the RO
                specializationOf(data/00/00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c)

        // step1 finished
        wasEndedBy(run:4305467e-6dfb-11e7-885d-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:30:00+01:00)

    // the master workflow then "generates" that same value, but now at a different time and in a different role (the resultA master workflow output)
    wasGeneratedBy(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/resultA'])

    // next step activity
    activity(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, -, [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/step2"])
        wasStartedBy(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:00:00+01:00)
        // associated with step2
        wasAssociatedWith(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main/step2)
        
        // Uses two data artifacts; one which came from previous step, other as workflow input
        used(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/step2/valueA'])
        used(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/step2/valueB'])
        
        // and generate two new data artifacts
        wasGeneratedBy(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, run:c42dc36e-6dfd-11e7-bc24-0242ac110002, 2017-10-27T16:34:20+01:00, [prov:role='wf:main/step2/out1'])
            entity(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, [prov:type='wfprov:Artifact'])
                specializationOf(data/95/952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d)

        wasGeneratedBy(data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, run:c42dc36e-6dfd-11e7-bc24-0242ac110002, 2017-10-27T16:34:20+01:00, [prov:role='wf:main/step2/out2'])
            entity(data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, [prov:type='wfprov:Artifact'])
                specializationOf(data/3d/3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0)
        // step2 ends
        wasEndedBy(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:30:00+01:00)

    // only step output out1 captured by mother workflow, sent to resultB workflow output
    wasGeneratedBy(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/resultB'])

    // mother workflow ends
    wasEndedBy(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:34:40+01:00)

endDocument

CWLProv

Which PROV format?

PROV-XML

<prov:wasGeneratedBy>
  <prov:entity prov:ref="ex:ent1"/>
  <prov:activity prov:ref="ex:act1"/>
  <prov:time>2017-10-26T21:32:52Z</prov:time>
  <ex:port>p1</ex:port>
</prov:wasGeneratedBy>

PROV-N

wasGeneratedBy(ent1, act1,
  2017-10-26T21:32:52Z, [ex:port="p1"])

PROV-O Turtle

:ent1
  a prov:Entity;
  prov:wasGeneratedBy :act1;
  prov:generatedAtTime "2017-10-26T21:32:52Z"^^xsd:dateTime ;
  ex:port "p1" .

PROV-JSON

"wasGeneratedBy": {
    "ex:gen1": {
        "prov:entity": "ent1",
        "prov:activity": "act1",
        "prov:time": "2017-10-26T21:32:52Z",
        "ex:port": "p1"
    }
}

PROV-O JSON-LD

{ "@context": { .. },
  "@id": "ent1",
  "@type": "prov:Entity",
  "ex:port": "p1",
  "prov:generatedAtTime":  "2017-10-26T21:32:52Z",
  "prov:wasGeneratedBy": {
    "@id": "act1",
    "@type": "prov:Activity"
  }
}
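
The serialisations carry the same statement, so one record can be rendered into more than one of them. A minimal Python sketch that emits the PROV-N and PROV-JSON forms of a single generation; both helper functions are hypothetical, attribute values are assumed to be plain strings without quotes, and a real application would more likely rely on an existing PROV library:

```python
import json


def generation_provn(entity, activity, time, attributes):
    """Render one wasGeneratedBy statement in PROV-N notation."""
    attrs = ", ".join(f'{key}="{value}"' for key, value in attributes.items())
    return f"wasGeneratedBy({entity}, {activity}, {time}, [{attrs}])"


def generation_provjson(gen_id, entity, activity, time, attributes):
    """Build the PROV-JSON structure for the same statement."""
    record = {"prov:entity": entity, "prov:activity": activity, "prov:time": time}
    record.update(attributes)
    return {"wasGeneratedBy": {gen_id: record}}


when = "2017-10-26T21:32:52Z"
print(generation_provn("ent1", "act1", when, {"ex:port": "p1"}))
print(json.dumps(generation_provjson("ex:gen1", "ent1", "act1", when,
                                     {"ex:port": "p1"}), indent=4))
```

Note that PROV-JSON needs an identifier for the generation itself (ex:gen1 here), while in PROV-N that identifier is optional and omitted.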

Nested workflows

 

 

prefix id <urn:uuid:>
prefix provenance <arcp://uuid,73eab018-7b36-4f84-a845-aca8073bd46c/metadata/provenance/>

agent(id:a606d227-bf10-4479-8d11-823bb932bbac, 
    [prov:type='wfprov:WorkflowEngine', prov:type='prov:SoftwareAgent', 
     prov:label="cwltool 1.0.20180817162414"])

activity(id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.059920, -, 
    [prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#main"])
wasStartedBy(id:73eab018-7b36-4f84-a845-aca8073bd46c, -, id:a606d227-bf10-4479-8d11-823bb932bbac, 2018-08-21T15:20:35.060038)

activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -, 
     [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/compile"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.163189)

activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -, 
     [prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn',
      prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl'
])
{
    "about": "urn:uuid:e79fc8dc-6e40-4236-b22c-41fee22947a9",
    "content": [
        "provenance/workflow_20compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn",
        "provenance/workflow_20compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl"
    ],
    "oa:motivatedBy": {
        "@id": "http://www.w3.org/ns/prov#has_provenance"
    }
}

metadata/provenance/primary.cwlprov.provn

metadata/manifest.json

prefix id <urn:uuid:>
agent(id:a606d227-bf10-4479-8d11-823bb932bbac, 
    [prov:type='wfprov:WorkflowEngine', prov:type='prov:SoftwareAgent', 
     prov:label="cwltool 1.0.20180817162414"])

activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.089187, -, 
    [prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#compile.cwl"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:a606d227-bf10-4479-8d11-823bb932bbac, 2018-08-21T15:20:35.089303)

activity(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, -, 
     [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#compile.cwl/step1"])
wasStartedBy(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.163189)

activity(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, -, 
     [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#compile.cwl/step2"])
wasStartedBy(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.163189)

metadata/provenance/

  primary.cwlprov.provn

metadata/provenance/

  workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn

Level 1

Level 2

Identifying intermediate data

Output 1B file is also Input 2C and Input 3D downstream

Simple filenames -> duplications

./data/step1/outputB.txt
./data/step2/inputC.txt
./data/step3/inputD.txt

 

Content-addressable

SHA-256 hash of bytes as filename:

./data/51/51fb8af0c4ae0422fbe88340d91880ecb9d7537cf57339c1cf1256b7ca58f32d

RFC 6920 URI as global identifier:

nih:sha-256;51fb8af0c4ae0422fbe88340d91880ecb9d7537cf57339c1cf1256b7ca58f32d
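
Both the storage path and the global identifier follow mechanically from the file's bytes. A minimal Python sketch of that derivation, assuming the two-character subfolder is the first two hex digits of the digest as in the path above; content_address is a hypothetical helper name:

```python
import hashlib
from pathlib import PurePosixPath


def content_address(data: bytes):
    """Return (relative storage path, nih: identifier) for some bytes."""
    digest = hashlib.sha256(data).hexdigest()
    # The first two hex digits form the subfolder, the full digest the filename.
    path = PurePosixPath("data") / digest[:2] / digest
    return str(path), f"nih:sha-256;{digest}"


path, identifier = content_address(b"example file content\n")
print(path)
print(identifier)
```

Identical bytes always map to the same path, so a file that is an output of one step and an input of two later steps is stored only once.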

{
    "@context": [
        {
            "@base": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/metadata/"
        },
        "https://w3id.org/bundle/context"
    ],

    "id": "/",
    "conformsTo": "https://w3id.org/cwl/prov/0.6.0",
    "manifest": "manifest.json",
    "createdOn": "2018-10-25T15:46:43.191346",
    "createdBy": {
        "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
        "name": "cwltool 1.0.20181012180214"
    },
    "authoredBy": {
        "orcid": "https://orcid.org/0000-0001-9842-9718",
        "name": "Stian Soiland-Reyes"
    },

    "aggregates": [
        {
            "uri": "urn:hash::sha1:327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
            "bundledAs": {
                "uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
                "folder": "/data/32/",
                "filename": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"
            }
        },
        {
            "uri": "urn:hash::sha1:97fe1b50b4582cebc7d853796ebd62e3e163aa3f",
            "bundledAs": {
                "uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f",
                "folder": "/data/97/",
                "filename": "97fe1b50b4582cebc7d853796ebd62e3e163aa3f"
            }
        },
        {
            "uri": "urn:hash::sha1:b9214658cc453331b62c2282b772a5c063dbd284",
            "bundledAs": {
                "uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/b9/b9214658cc453331b62c2282b772a5c063dbd284",
                "folder": "/data/b9/",
                "filename": "b9214658cc453331b62c2282b772a5c063dbd284"
            }
        },
        {
            "uri": "provenance/primary.cwlprov.xml",
            "mediatype": "application/xml",
            "conformsTo": [
                "http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/",
                "https://w3id.org/cwl/prov/0.6.0"
            ],
            "createdOn": "2018-10-25T15:46:43.191481",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "../snapshot/revtool.cwl",
            "mediatype": "text/x+yaml; charset=\"UTF-8\"",
            "conformsTo": "https://w3id.org/cwl/",
            "createdOn": "2018-06-05T15:19:48.781496"
        },
        {
            "uri": "../snapshot/empty.ttl",
            "mediatype": "text/turtle; charset=\"UTF-8\"",
            "createdOn": "2018-04-04T13:29:55.717707"
        },
        {
            "uri": "provenance/primary.cwlprov.json",
            "mediatype": "application/json",
            "conformsTo": [
                "http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/",
                "https://w3id.org/cwl/prov/0.6.0"
            ],
            "createdOn": "2018-10-25T15:46:43.191686",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "../workflow/packed.cwl",
            "mediatype": "text/x+yaml; charset=\"UTF-8\"",
            "conformsTo": "https://w3id.org/cwl/",
            "createdOn": "2018-10-25T15:46:43.191761",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "provenance/primary.cwlprov.provn",
            "mediatype": "text/provenance-notation; charset=\"UTF-8\"",
            "conformsTo": [
                "http://www.w3.org/TR/2013/REC-prov-n-20130430/",
                "https://w3id.org/cwl/prov/0.6.0"
            ],
            "createdOn": "2018-10-25T15:46:43.191825",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "../snapshot/revsort.cwl",
            "mediatype": "text/x+yaml; charset=\"UTF-8\"",
            "conformsTo": "https://w3id.org/cwl/",
            "createdOn": "2018-10-25T15:40:08.769943"
        },
        {
            "uri": "../workflow/primary-output.json",
            "mediatype": "application/json",
            "createdOn": "2018-10-25T15:46:43.191944",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "provenance/primary.cwlprov.ttl",
            "mediatype": "text/turtle; charset=\"UTF-8\"",
            "conformsTo": [
                "http://www.w3.org/TR/2013/REC-prov-o-20130430/",
                "https://w3id.org/cwl/prov/0.6.0"
            ],
            "createdOn": "2018-10-25T15:46:43.192006",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "provenance/primary.cwlprov.jsonld",
            "mediatype": "application/ld+json",
            "conformsTo": [
                "http://www.w3.org/TR/2013/REC-prov-o-20130430/",
                "https://w3id.org/cwl/prov/0.6.0"
            ],
            "createdOn": "2018-10-25T15:46:43.192069",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "../snapshot/sorttool.cwl",
            "mediatype": "text/x+yaml; charset=\"UTF-8\"",
            "conformsTo": "https://w3id.org/cwl/",
            "createdOn": "2018-06-05T15:19:48.785496"
        },
        {
            "uri": "provenance/primary.cwlprov.nt",
            "mediatype": "application/n-triples",
            "conformsTo": [
                "http://www.w3.org/TR/2013/REC-prov-o-20130430/",
                "https://w3id.org/cwl/prov/0.6.0"
            ],
            "createdOn": "2018-10-25T15:46:43.192188",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt",
            "mediatype": "text/plain; charset=\"UTF-8\"",
            "createdOn": "2018-10-25T15:46:43.192249",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "uri": "../workflow/primary-job.json",
            "mediatype": "application/json",
            "createdOn": "2018-10-25T15:46:43.192312",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            }
        },
        {
            "createdOn": "2018-10-25T15:46:35.303633",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            },
            "uri": "urn:uuid:ed8d007b-a1f3-4bfe-b390-08df074d712d"
        },
        {
            "createdOn": "2018-10-25T15:46:37.067848",
            "createdBy": {
                "uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
                "name": "cwltool 1.0.20181012180214"
            },
            "uri": "urn:uuid:4ab5a3fe-e481-4f7f-98c4-af8e5dfccb93"
        }
    ],

    "annotations": [
        {
            "uri": "urn:uuid:42a4ade6-245b-4746-acf9-e9910780a449",
            "about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
            "content": "/",
            "oa:motivatedBy": {
                "@id": "oa:describing"
            }
        },

        {
            "uri": "urn:uuid:9ed6545d-dfb2-4cab-b4af-102735a2661e",
            "about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
            "content": [
                "provenance/primary.cwlprov.xml",
                "provenance/primary.cwlprov.json",
                "provenance/primary.cwlprov.provn",
                "provenance/primary.cwlprov.ttl",
                "provenance/primary.cwlprov.jsonld",
                "provenance/primary.cwlprov.nt"
            ],
            "oa:motivatedBy": {
                "@id": "http://www.w3.org/ns/prov#has_provenance"
            }
        },

        {
            "uri": "urn:uuid:1c23181c-905c-49aa-a5e3-7194f9a43c29",
            "about": "../workflow/packed.cwl",
            "oa:motivatedBy": {
                "@id": "oa:highlighting"
            }
        },

        {
            "uri": "urn:uuid:4f4132a7-c27d-47d5-a96f-3ad6ca741fe8",
            "about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
            "content": [
                "../workflow/packed.cwl",
                "../workflow/primary-job.json"
            ],
            "oa:motivatedBy": {
                "@id": "oa:linking"
            }
        },

        {
            "uri": "urn:uuid:6cf23e43-cbd1-4b77-af65-d9a32dc913dc",
            "about": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
            "content": [
                "metadata/logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt"
            ],
            "oa:motivatedBy": {
                "@id": "https://w3id.org/cwl/prov#log"
            }
        }
    ]
}

metadata/manifest.json
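
As a rough sketch of how a consumer might read this manifest, the following uses Python's standard `json` module on a shortened excerpt of the listing above. The function name `bundled_paths` is an assumption made for illustration; it is not an API of cwltool or cwlprov.

```python
import json

# Shortened excerpt of metadata/manifest.json from the slide above.
MANIFEST_EXCERPT = """
{
  "id": "/",
  "conformsTo": "https://w3id.org/cwl/prov/0.6.0",
  "aggregates": [
    {
      "uri": "urn:hash::sha1:327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
      "bundledAs": {
        "folder": "/data/32/",
        "filename": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"
      }
    },
    {
      "uri": "provenance/primary.cwlprov.json",
      "mediatype": "application/json"
    }
  ]
}
"""


def bundled_paths(manifest: dict) -> dict:
    """Map each aggregated hash URI to its relative path inside the RO."""
    paths = {}
    for entry in manifest.get("aggregates", []):
        bundled = entry.get("bundledAs")
        if bundled is None:
            continue  # entries without bundledAs (e.g. provenance files) are skipped
        folder = bundled["folder"].strip("/")
        paths[entry["uri"]] = folder + "/" + bundled["filename"]
    return paths


manifest = json.loads(MANIFEST_EXCERPT)
for uri, path in bundled_paths(manifest).items():
    print(uri, "->", path)
```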

Consuming CWLProv ROs

$ cwlprov --help
usage: cwlprov [-h] [--version] [--directory DIRECTORY] [--relative]
            [--absolute] [--output OUTPUT] [--verbose] [--quiet] [--hints]
            [--no-hints]
            {validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
            ...

cwlprov explores Research Objects containing provenance of Common Workflow
Language executions. <https://w3id.org/cwl/prov/>

commands:
{validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
    validate            Validate the CWLProv Research Object
    info                show research object Metadata
    who                 show Who ran the workflow
    prov                export workflow execution Provenance in PROV format
    inputs              list workflow/step Input files/values
    outputs             list workflow/step Output files/values
    run                 show workflow Execution log
    runs                List all workflow executions in RO
    rerun               Rerun a workflow or step
    derived             list what was Derived from a data item, based on
                        activity usage/generation
    runtimes            calculate average step execution Runtimes
stain@biggie:~/src/cwlprov-py/test/revsort-cwlprov-0.4.0$ cwlprov --verbose validate
INFO:cwlprov.tool:Detected /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/bagit.txt
INFO:cwlprov.tool:Opening BagIt /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/b9/b9214658cc453331b62c2282b772a5c063dbd284
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/workflow/packed.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/workflow/primary-job.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.xml
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.provn
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.ttl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.nt
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.jsonld
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/revsort.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/hello.txt
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/revtool.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/empty.ttl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/sorttool.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/manifest.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/bag-info.txt
INFO:cwlprov.ro:Parsing RO manifest /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/manifest.json
Valid CWLProv RO: .

Validating
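
The log above shows the `bagit` library verifying one checksum per file. The idea can be sketched with the standard library alone, assuming a BagIt-style manifest with one `<checksum> <path>` pair per line. The function `verify_manifest` and the throwaway bag built here are hypothetical, and this simplification is not how cwlprov itself is implemented.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_manifest(bag_dir: Path, manifest_name: str = "manifest-sha1.txt") -> list:
    """Return the relative paths whose SHA-1 checksum does not match the manifest."""
    mismatches = []
    for line in (bag_dir / manifest_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha1((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    payload = b"Hello\n"
    digest = hashlib.sha1(payload).hexdigest()
    # Store the payload under data/<first two hex chars>/<digest>
    target = bag / "data" / digest[:2] / digest
    target.parent.mkdir(parents=True)
    target.write_bytes(payload)
    (bag / "manifest-sha1.txt").write_text(f"{digest}  data/{digest[:2]}/{digest}\n")

    print(verify_manifest(bag))  # empty list while all checksums match
    target.write_bytes(b"tampered")
    print(verify_manifest(bag))  # now lists the modified file
```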


(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run
2018-08-08 22:44:06.573330 Flow 39408a40-c1c8-4852-9747-87249425be1e [ Run of workflow/packed.cwl#main 
2018-08-08 22:44:06.691722 Step 4f082fb6-3e4d-4a21-82e3-c685ce3deb58   Run of workflow/packed.cwl#main/create-tar  (0:00:00.010133)
2018-08-08 22:44:06.702976 Step 0cceeaf6-4109-4f08-940b-f06ac959944a * Run of workflow/packed.cwl#main/compile  (unknown duration)
2018-08-08 22:44:12.680097 Flow 39408a40-c1c8-4852-9747-87249425be1e ] Run of workflow/packed.cwl#main  (0:00:06.106767)
Legend:
[ Workflow start
* Nested provenance, use UUID to explore: cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
] Workflow end

(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
2018-08-08 22:44:06.607210 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a [ Run of workflow/packed.cwl#main 
2018-08-08 22:44:06.707070 Step 83752ab4-8227-4d4a-8baa-78376df34aed   Run of workflow/packed.cwl#main/untar  (0:00:00.008149)
2018-08-08 22:44:06.718554 Step f56d8478-a190-4251-84d9-7f69fe0f6f8b   Run of workflow/packed.cwl#main/argument  (0:00:00.532052)
2018-08-08 22:44:07.251588 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a ] Run of workflow/packed.cwl#main  (0:00:00.644378)
Legend:
[ Workflow start
] Workflow end
stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov outputs 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 --format=files
Output tar:
data/c0/c0fd5812fe6d8d91fef7f4f1ba3a462500fce0c5

stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ tar tfv `cwlprov -q outputs 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 --format=files`
-rw-r--r-- stain/stain     115 2018-08-08 23:44 Hello.java

Inspecting step runs
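
The durations in parentheses above are the difference between an activity's start and end timestamps. A small sketch with Python's `datetime` reproduces the workflow-level figure. The helper `run_duration` is hypothetical, and the timestamp format is assumed from the `cwlprov run` output shown.

```python
from datetime import datetime, timedelta

# Timestamp layout as printed by `cwlprov run` in the example above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def run_duration(start: str, end: str) -> timedelta:
    """Return the elapsed time between two timestamps from the run log."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Start and end of the outer workflow run 39408a40-...
print(run_duration("2018-08-08 22:44:06.573330", "2018-08-08 22:44:12.680097"))
# prints 0:00:06.106767
```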

TODO: Best practice for publishing CWLProv ROs

 

>1 GB -> "Big Data"

stain@biggie:~/dropbox/work/cwlprov/newest-swift$ du -hs *
13G	alignment_0.6.0_linux
2.9G	rnaseqwf_0.5.0_mac
80M	somaticwf_0.5.0_ma

What next for CWLProv?

Adding CWLProv support to Toil

<?xml version="1.0" encoding="UTF-8"?>
<ur:UsageRecord xmlns="http://schema.ogf.org/urf/2013/04/urf"
	xmlns:ur="http://schema.ogf.org/urf/2013/04/urf" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://schema.ogf.org/urf/2013/04/urf">
	<ur:RecordIdentityBlock>
		<ur:RecordId>urn:uuid:4350d583-61a5-45e8-a229-957aa81e8014</ur:RecordId>
		<ur:CreateTime>2018-05-09T09:06:52Z</ur:CreateTime>
		<ur:Site>EMBL-EBI</ur:Site>
		<ur:Infrastructure>Embassy</ur:Infrastructure>
	</ur:RecordIdentityBlock>
	<ur:SubjectIdentityBlock>
		<ur:LocalUserId>stain</ur:LocalUserId>
		<ur:LocalGroupId>ELIXIRCWLImplStudy</ur:LocalGroupId>
		<ur:GlobalUserId>https://orcid.org/0000-0001-9842-9718</ur:GlobalUserId>
	</ur:SubjectIdentityBlock>
	<ur:ComputeUsageBlock>
		<ur:CpuDuration>PT3600S</ur:CpuDuration>
		<ur:WallDuration>PT3600S</ur:WallDuration>
		<ur:StartTime>2018-05-31T11:00:00</ur:StartTime>
		<ur:EndTime>2018-05-31T12:00:00</ur:EndTime>
		<ur:ExecutionHost>
			<ur:Hostname>compute-0-1.example.com</ur:Hostname>
			<ur:ProcessId>1042</ur:ProcessId>
			<ur:Benchmark ur:type="si2k">3.14</ur:Benchmark>
		</ur:ExecutionHost>
		<ur:Processors>4</ur:Processors>
		<ur:NodeCount>1</ur:NodeCount>
	</ur:ComputeUsageBlock>
	<ur:JobUsageBlock>
		<ur:GlobalJobId>host.example.org/ab1234</ur:GlobalJobId>
		<ur:LocalJobId>ab1234</ur:LocalJobId>
		<ur:JobName>MetaGenomics1337</ur:JobName>
		<ur:Queue ur:description="execution">"Bigmem"</ur:Queue>
		<ur:TimeInstant ur:type="Ctime">2018-05-31T10:30:00</ur:TimeInstant>
		<ur:TimeInstant ur:type="Qtime">2018-05-31T10:31:00</ur:TimeInstant>
		<ur:TimeInstant ur:type="Etime">2018-05-31T10:59:42</ur:TimeInstant>
	</ur:JobUsageBlock>
	<ur:MemoryUsageBlock>
		<ur:MemoryClass>"RAM"</ur:MemoryClass>
		<ur:MemoryResourceCapacityUsed>14728</ur:MemoryResourceCapacityUsed>
		<ur:MemoryResourceCapacityAllocated>56437</ur:MemoryResourceCapacityAllocated>
		<ur:MemoryResourceCapacityRequested>42000</ur:MemoryResourceCapacityRequested>
		<ur:StartTime>2018-05-31T11:00:00</ur:StartTime>
		<ur:EndTime>2018-05-31T12:00:00</ur:EndTime>
	</ur:MemoryUsageBlock>
	<ur:StorageUsageBlock>
		<ur:StorageShare>pool-003</ur:StorageShare>
		<ur:StorageMedia>disk</ur:StorageMedia>
		<ur:StorageClass>replicated</ur:StorageClass>
		<ur:DirectoryPath>/projectA</ur:DirectoryPath>
		<ur:FileCount>42</ur:FileCount>
		<ur:StorageResourceCapacityUsed>14728</ur:StorageResourceCapacityUsed>
		<ur:StorageLogicalCapacityUsed>13617</ur:StorageLogicalCapacityUsed>
		<ur:StorageResourceCapacityAllocated>14624
		</ur:StorageResourceCapacityAllocated>
		<ur:StartTime>2018-05-07T09:31:40Z</ur:StartTime>
		<ur:EndTime>2018-05-08T09:29:42Z</ur:EndTime>
		<ur:Host>host.example.org</ur:Host>
	</ur:StorageUsageBlock>
	<ur:CloudUsageBlock>
		<ur:LocalVirtualMachineId>ab1234</ur:LocalVirtualMachineId>
		<ur:GlobalVirtualMachineId>
			host.example.org/ab1234/2018-05-09T09:06:52Z
		</ur:GlobalVirtualMachineId>
		<ur:Status>started</ur:Status>
		<ur:SuspendDuration>PT3600S</ur:SuspendDuration>
		<ur:ImageId>UbuntuImage2013</ur:ImageId>
		<ur:MachineName>cloud.example.org</ur:MachineName>
		<ur:SubmitHost>
			cloud-name=cloud.example.org,Mds-Vo-name=local,o=cloud
		</ur:SubmitHost>
		<ur:TimeInstant ur:type="Ctime">2018-05-31T10:30:00</ur:TimeInstant>
		<ur:TimeInstant ur:type="Qtime">2018-05-31T10:31:00</ur:TimeInstant>
		<ur:TimeInstant ur:type="Etime">2018-05-31T10:59:42</ur:TimeInstant>
		<ur:ServiceLevel>Premium</ur:ServiceLevel>
	</ur:CloudUsageBlock>
	<ur:NetworkUsageBlock>
		<ur:NetworkClass ur:NetworkResourceBandwidth="100000000">"Ethernet"</ur:NetworkClass>
		<ur:NetworkInboundUsed ur:SourceAddress="192.168.1.12">14728</ur:NetworkInboundUsed>
		<ur:NetworkOutboundUsed ur:DestinationAddress="192.168.1.21">14728</ur:NetworkOutboundUsed>
	</ur:NetworkUsageBlock>
</ur:UsageRecord>

Collect detailed usage statistics from compute environment
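
As an illustration of how such a usage record could be consumed, the sketch below reads the compute block from a cut-down copy of the record with Python's `xml.etree.ElementTree`. The helper `seconds` handles only the `PT<n>S` duration form seen in the example and is an assumption for illustration, not a full ISO 8601 parser or part of any CWLProv tool.

```python
import re
import xml.etree.ElementTree as ET

NS = {"ur": "http://schema.ogf.org/urf/2013/04/urf"}

# Cut-down copy of the usage record shown above.
RECORD = """<ur:UsageRecord xmlns:ur="http://schema.ogf.org/urf/2013/04/urf">
  <ur:ComputeUsageBlock>
    <ur:CpuDuration>PT3600S</ur:CpuDuration>
    <ur:WallDuration>PT3600S</ur:WallDuration>
    <ur:Processors>4</ur:Processors>
  </ur:ComputeUsageBlock>
</ur:UsageRecord>"""


def seconds(duration: str) -> int:
    """Parse durations of the form PT<n>S only (assumption: no other units)."""
    match = re.fullmatch(r"PT(\d+)S", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1))


root = ET.fromstring(RECORD)
block = root.find("ur:ComputeUsageBlock", NS)
summary = {
    "cpu_seconds": seconds(block.findtext("ur:CpuDuration", namespaces=NS)),
    "wall_seconds": seconds(block.findtext("ur:WallDuration", namespaces=NS)),
    "processors": int(block.findtext("ur:Processors", namespaces=NS)),
}
print(summary)
```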

Pau Ruiz Safont

Level 3: Domain-specific workflow annotations

outputs:
  sequence:
    type: stdout
    format: edam:format_1929 # FASTA
    cwlprov:concept: ebi_metagenomics:assembly_statistics
    cwlprov:relationships:
      prov:wasDerivedFrom: [ inputs.second_input, outputs.second_output ]

        {
            "uri": "urn:hash::sha1:b9214658cc453331b62c2282b772a5c063dbd284",
            "bundledAs": {
                "uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/b9/b9214658cc453331b62c2282b772a5c063dbd284",
                "folder": "/data/b9/",
                "filename": "b9214658cc453331b62c2282b772a5c063dbd284"
            },
            "format": "http://edamontology.org/format_1929",
            "classifying":  "ebi_metagenomics:assembly_statistics",
            "prov:wasDerivedFrom": [ 
               "urn:hash::sha1:f572d396fae9206628714fb2ce00f72e94f2258f",
               "urn:hash::sha1:2258ff572d396fae9206628714fb2ce00f72e94f"
            ]
        },

postprocessing of Research Object

outputs:
  sequence:
    type: stdout
    biocompute:error_domain: [
      "frequency_cutoff > 0.05"
    ]
    biocompute:input_domain: {
      "genomic_reference": "WAP_RAT"
    }

Deep Validation

{ manifest } 
{ prov } 
{ cwl } 

Adam Cowdy

CWLProv
without CWL..?

2018-10-29 CWLProv

By Stian Soiland-Reyes

Presented at Workshop on Research Objects (RO2018) at IEEE eScience 2018, Amsterdam, Netherlands (29 October 2018).
