Stian Soiland-Reyes, Farah Zaib Khan, Richard O. Sinnott, Andrew Lonie, Michael R Crusoe, Carole Goble
Workshop for Research Objects (RO2018),
IEEE eScience 2018, Amsterdam
2018-10-29
This work is licensed under a
Creative Commons Attribution 4.0 International License.
This work has been done as part of the BioExcel CoE (www.bioexcel.eu), a project funded by the European Union contract H2020-EINFRA-2015-1-675728.
Capturing interoperable reproducible workflows with Common Workflow Language
Three broad categories
Cpipe
https://slides.com/farahzkhan/cwlprov
cwlVersion: v1.0
class: Workflow
inputs:
  inp: File
  ex: string
outputs:
  classout:
    type: File
    outputSource: compile/classfile
steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: inp
      extractfile: ex
    out: [example_out]
  compile:
    run: arguments.cwl
    in:
      src: untar/example_out
    out: [classfile]
Composing a workflow
cwlVersion: v1.0
class: Workflow
label: EMG QC workflow, (paired end version). Benchmarking with MG-RAST expt.
requirements:
  - class: SubworkflowFeatureRequirement
  - class: SchemaDefRequirement
    types:
      - $import: ../tools/FragGeneScan-model.yaml
      - $import: ../tools/trimmomatic-sliding_window.yaml
      - $import: ../tools/trimmomatic-end_mode.yaml
      - $import: ../tools/trimmomatic-phred.yaml
inputs:
  reads:
    type: File
    format: edam:format_1930  # FASTQ
outputs:
  processed_sequences:
    type: File
    outputSource: clean_fasta_headers/sequences_with_cleaned_headers
steps:
  trim_quality_control:
    doc: |
      Low quality trimming (low quality ends and sequences with quality scores
      less than 15 over a 4 nucleotide wide window are removed)
    run: ../tools/trimmomatic.cwl
    in:
      reads1: reads
      phred: { default: '33' }
      leading: { default: 3 }
      trailing: { default: 3 }
      end_mode: { default: SE }
      minlen: { default: 100 }
      slidingwindow:
        default:
          windowSize: 4
          requiredQuality: 15
    out: [reads1_trimmed]
  convert_trimmed-reads_to_fasta:
    run: ../tools/fastq_to_fasta.cwl
    in:
      fastq: trim_quality_control/reads1_trimmed
    out: [ fasta ]
  clean_fasta_headers:
    run: ../tools/clean_fasta_headers.cwl
    in:
      sequences: convert_trimmed-reads_to_fasta/fasta
    out: [ sequences_with_cleaned_headers ]
$namespaces:
  edam: http://edamontology.org/
  s: http://schema.org/
$schemas:
  - http://edamontology.org/EDAM_1.16.owl
  - https://schema.org/docs/schema_org_rdfa.html
s:license: "https://www.apache.org/licenses/LICENSE-2.0"
s:copyrightHolder: "EMBL - European Bioinformatics Institute"
stain@biggie:~/src/ebi-metagenomics-cwl$ find . -name '*cwl' | xargs ls
./tools/5S-from-tablehits.cwl ./tools/minia.cwl
./tools/biom-convert.cwl ./tools/modify_taxonomy_table.cwl
./tools/biom-summarize_table.cwl ./tools/nhmmer.cwl
./tools/clean_fasta_headers.cwl ./tools/oneLineFasta.cwl
./tools/cmsearch-deoverlap.cwl ./tools/orf_stats.cwl
./tools/collate_unique_SSU_headers.cwl ./tools/prepend_header.cwl
./tools/concatenate.cwl ./tools/pull-5Ss.cwl
./tools/count_fasta.cwl ./tools/pull-LSUs.cwl
./tools/count_fastq.cwl ./tools/pull-SSUs.cwl
./tools/create_categorisations.cwl ./tools/qc-stats.cwl
./tools/discard_short_seqs.cwl ./tools/qiime-filter_tree.cwl
./tools/esl-reformat.cwl ./tools/qiime-pick_closed_reference_otus.cwl
./tools/esl-sfetch-index.cwl ./tools/RevReadSort.cwl
./tools/esl-sfetch-manyseqs.cwl ./tools/rRNA_selection.cwl
./tools/esl-sfetch-oneseq.cwl ./tools/seqprep.cwl
./tools/extract_coord_lines.cwl ./tools/seqprep-merge.cwl
./tools/extract-coords-from-cmsearch.cwl ./tools/SSU-from-tablehits.cwl
./tools/extract_observations.cwl ./tools/summary.cwl
./tools/extract_sig_coords.cwl ./tools/trimmomatic.cwl
./tools/faselector.cwl ./tools/tRNA_selection.cwl
./tools/fasta_chunker.cwl ./tools/update_krona_chart_urls.cwl
./tools/fastq_to_fasta.cwl ./tools/write_ipr_summary.cwl
./tools/FragGeneScan1_20.cwl ./workflows/16S_taxonomic_analysis.cwl
./tools/go_summary.cwl ./workflows/cmsearch-multimodel.cwl
./tools/hmmsearch.cwl ./workflows/convert-to-v3-layout.cwl
./tools/infernal-cmscan.cwl ./workflows/emg-assembly.cwl
./tools/infernal-cmsearch.cwl ./workflows/emg-core-analysis-v4.cwl
./tools/InterProScan5.21-60.cwl ./workflows/emg-pipeline-v3.cwl
./tools/ipr_stats.cwl ./workflows/emg-pipeline-v3-paired.cwl
./tools/krona.cwl ./workflows/emg-pipeline-v4-assembly-metaSPAdes.cwl
./tools/krona_setup.cwl ./workflows/emg-pipeline-v4-paired.cwl
./tools/LSU-from-tablehits.cwl ./workflows/emg-pipeline-v4-single.cwl
./tools/map_fa_headers.cwl ./workflows/emg-qc-paired.cwl
./tools/mapseq2biom.cwl ./workflows/emg-qc-single.cwl
./tools/mapseq.cwl ./workflows/functional_analysis.cwl
./tools/mask_RNA.cwl ./workflows/orf_prediction.cwl
./tools/megahit.cwl ./workflows/rna-selector.cwl
./tools/metaspades.cwl ./workflows/trim_and_reformat_reads.cwl
Provenance for Workflows?
Prospective provenance
The ‘recipes’ used to execute a computational task, e.g. the workflow specification or workflow template.
Retrospective provenance
The execution record of a running workflow instance: details of every executed process and comprehensive information on the execution environment.
Workflow evolution
Changes across the workflow specification, its tool definitions and the software and services it depends on.
{
"@context" : [ "https://w3id.org/bundle/context" ],
"id" : "/",
"manifest" : [ "manifest.json" ],
"createdOn" : "2017-08-24T10:57:46.325Z",
"createdBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
}, {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:porter@porter.st",
"name" : "Andrey Kartashov"
}, {
"uri" : "mailto:janko.simonovic@sbgenomics.com",
"name" : "Janko Simonovic"
} ],
"retrievedFrom" : "https://github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/",
"retrievedOn" : "2017-08-24T10:57:46.325Z",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"history" : [ "http:/git2prov.org/git2prov?giturl=https:/github.com/common-workflow-language/workflows.git&serialization=PROV-JSON" ],
"aggregates" : [ {
"uri" : "/workflow/tmp_2.fq",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:46.923Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/tmp_2.fq",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:61579f3e-63e6-49c2-b780-f67b2df461b7",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-demo.json",
"mediatype" : "application/json",
"createdOn" : "2017-08-24T10:57:47.216Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-demo.json",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:973caa0e-f3bd-45e8-8d29-70123bc8715a",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/models/illumina_v3.pcrfree.stuttermodel",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.239Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/models/illumina_v3.pcrfree.stuttermodel",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:62bbcbea-f34f-463f-990d-6148f8ed5e5c",
"folder" : "/workflow/models/"
}
}, {
"uri" : "/workflow/models/illumina_v3.pcrfree.stepmodel",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.266Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/models/illumina_v3.pcrfree.stepmodel",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:03439ae7-cd94-42a3-b5fe-40bfff6882d8",
"folder" : "/workflow/models/"
}
}, {
"uri" : "/workflow/samtools-sort.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.269Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:porter@porter.st",
"name" : "Andrey Kartashov"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/samtools-sort.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:2dc07859-efc2-4945-a95f-ba7815b68d07",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-workflow.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.42Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:58bc1895-3460-46d6-91d7-fa1718d09631",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-arvados-demo.json",
"mediatype" : "application/json",
"createdOn" : "2017-08-24T10:57:47.453Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-arvados-demo.json",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:30c683bc-69fb-4d93-8dad-65b663783af5",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/samtools-index.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.458Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:porter@porter.st",
"name" : "Andrey Kartashov"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/samtools-index.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:8235d3f8-6927-4f73-b160-8521838a1cbb",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-tool.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.476Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-tool.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:7fa6fbe4-1fc5-4cb5-9c1a-56b96c5f7aaf",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/allelotype.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.537Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:janko.simonovic@sbgenomics.com",
"name" : "Janko Simonovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/allelotype.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:3706bd2f-e53f-431d-b32a-deb661d9b292",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/README",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.555Z",
"authoredBy" : [ {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/README",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:ed54c4d6-c585-4dc9-b7bc-0cf299e20b91",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/tmp_1.fq",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.738Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/tmp_1.fq",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:5d431f81-ad0b-4acf-903a-9d5aa03b04df",
"folder" : "/workflow/"
}
}, {
"uri" : "/visualisation.png",
"mediatype" : "image/png",
"createdOn" : "2017-08-24T10:57:47.801Z",
"retrievedFrom" : "https://view.commonwl.org/graph/png/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
"bundledAs" : {
"uri" : "urn:uuid:ff9ace37-e76c-49f8-8d36-60f11ff6d257",
"folder" : "/"
}
}, {
"uri" : "/visualisation.svg",
"mediatype" : "image/svg+xml",
"createdOn" : "2017-08-24T10:57:47.821Z",
"retrievedFrom" : "https://view.commonwl.org/graph/svg/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
"bundledAs" : {
"uri" : "urn:uuid:a6cfb437-8818-4ab2-9081-efc74c5109e8",
"folder" : "/"
}
} ],
"annotations" : [ {
"uri" : "urn:uuid:9f602fff-b280-41c5-9590-ab95a49c85ad",
"about" : "/",
"content" : "annotations/merged.cwl"
}, {
"uri" : "urn:uuid:0ce4b727-ff61-4534-9afb-e3d676d2782d",
"about" : "/",
"content" : "annotations/workflow.ttl"
} ]
}
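A manifest like the one above can be queried once it has been loaded as JSON. The following Python sketch lists the aggregated resources that declare conformance to CWL v1.0. The function name `cwl_resources` and its `profile` parameter are hypothetical helpers for illustration, not part of the CWL Viewer or of any Research Object library; the only assumption about the data is the `aggregates` / `conformsTo` / `uri` layout shown in the manifest.

```python
import json

CWL_V1_0 = "https://w3id.org/cwl/v1.0"


def cwl_resources(manifest, profile=CWL_V1_0):
    """Return the 'uri' of every aggregated resource conforming to `profile`.

    `manifest` is the parsed JSON manifest (a dict). 'conformsTo' may be
    a single string or a list of strings, so both shapes are handled.
    """
    found = []
    for resource in manifest.get("aggregates", []):
        conforms = resource.get("conformsTo", [])
        if isinstance(conforms, str):
            conforms = [conforms]
        if profile in conforms:
            found.append(resource["uri"])
    return found


def load_manifest(path):
    """Read a manifest file from disk (hypothetical convenience wrapper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Applied to the manifest above, this would select the `.cwl` files and skip the test data, models and visualisations, since only the former carry a `conformsTo` entry.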
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
label: "Hello World"
doc: "Outputs a message using echo"
inputs: []
outputs:
  response:
    outputSource: step0/response
    type: File
steps:
  step0:
    run:
      class: CommandLineTool
      inputs:
        message:
          type: string
          doc: "The message to print"
          default: "Hello World"
          inputBinding:
            position: 1
      baseCommand: echo
      stdout: response.txt
      outputs:
        response:
          type: stdout
    in: []
    out: [response]
Permalink URI scheme
https://w3id.org/cwl/view/{scheme}/{commit}/{path}{?part=fragment}
- https://w3id.org/cwl/view/ - fixed prefix at the permalink service https://w3id.org/
- {scheme} - source code management protocol; currently only git is supported
- {commit} - full git commit sha1 id (no branches or short commits allowed)
- {path} - relative path to the .cwl file within a checkout of that git commit
- {?part=fragment} - optional part within the CWL file, e.g. #main
Any git permalinks are resolved using https://view.commonwl.org/git which - if it knows about that particular git commit - will content-negotiate to provide various representations.
Anyone can mint these permalinks for .cwl files for a given commit, in any public or private git repository, provided there are no uncommitted files or git submodules.
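As a rough illustration of the template, the following Python sketch assembles a permalink from its parts. The function name `cwl_permalink` and its validation rules are hypothetical and not part of the CWL Viewer. In particular, passing the fragment without its leading `#` as the value of the `part` query parameter is an assumption about how `{?part=fragment}` expands.

```python
import re
from urllib.parse import quote

PREFIX = "https://w3id.org/cwl/view/"
FULL_SHA1 = re.compile(r"^[0-9a-f]{40}$")


def cwl_permalink(commit, path, part=None, scheme="git"):
    """Build a permalink following {scheme}/{commit}/{path}{?part=fragment}."""
    if scheme != "git":
        raise ValueError("only the git scheme is currently supported")
    if not FULL_SHA1.match(commit):
        # Branch names and abbreviated commit ids are not allowed.
        raise ValueError("commit must be a full 40-character sha1 id")
    if path.startswith("/") or not path.endswith(".cwl"):
        raise ValueError("path must be a relative path to a .cwl file")
    url = PREFIX + scheme + "/" + commit + "/" + quote(path)
    if part:
        # Assumption: the fragment is sent without its leading '#'.
        url += "?part=" + quote(part.lstrip("#"), safe="")
    return url
```

For example, `cwl_permalink("933bf2a1a1cce32d88f88f136275535da9df0954", "workflows/hello/hello.cwl")` yields a URI under `https://w3id.org/cwl/view/git/`, while a branch name such as `master` is rejected because it is not a full commit id.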
Farah Zaib Khan et al,
CWLProv – Interoperable retrospective provenance capture and its challenges,
BOSC 2018
Provenance using PROV-Model
expanded with wfprov and wfdesc
Workflow specifications
Levels of Provenance
./revsort-run-1/bagit.txt
./revsort-run-1/bag-info.txt
./revsort-run-1/manifest-sha1.txt
./revsort-run-1/tagmanifest-sha1.txt
./revsort-run-1/data
./revsort-run-1/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f
./revsort-run-1/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
./revsort-run-1/data/b9/b9214658cc453331b62c2282b772a5c063dbd284
./revsort-run-1/workflow
./revsort-run-1/workflow/packed.cwl
./revsort-run-1/workflow/primary-output.json
./revsort-run-1/workflow/primary-job.json
./revsort-run-1/snapshot
./revsort-run-1/snapshot/revtool.cwl
./revsort-run-1/snapshot/revsort.cwl
./revsort-run-1/snapshot/sorttool.cwl
./revsort-run-1/metadata/logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt
./revsort-run-1/metadata/manifest.jsonld
./revsort-run-1/metadata/provenance/primary.cwlprov.provn
./revsort-run-1/metadata/provenance/primary.cwlprov.json
./revsort-run-1/metadata/provenance/primary.cwlprov.ttl
./revsort-run-1/metadata/provenance/primary.cwlprov.jsonld
./revsort-run-1/metadata/provenance/primary.cwlprov.xml
./revsort-run-1/metadata/provenance/primary.cwlprov.nt
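The listing above follows the BagIt layout, where `manifest-sha1.txt` pairs each payload file under `data/` with its checksum. The Python sketch below checks those checksums using only the standard library. The function name `verify_bag` is hypothetical, and the sketch assumes the usual manifest line format of a hexadecimal checksum, whitespace, then a path relative to the bag root; it is not a full BagIt validator.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir, manifest_name="manifest-sha1.txt"):
    """Return the payload paths whose SHA-1 differs from the manifest.

    An empty list means every file listed in the manifest was found
    and matched its recorded checksum.
    """
    bag = Path(bag_dir)
    mismatches = []
    manifest = (bag / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        payload = bag / rel_path
        if not payload.is_file():
            mismatches.append(rel_path)
            continue
        actual = hashlib.sha1(payload.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

Calling `verify_bag("revsort-run-1")` on an intact research object would be expected to return an empty list; the tag files under `metadata/` are covered by `tagmanifest-sha1.txt`, which can be checked by passing that name as `manifest_name`.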
[Figure: levels of provenance. Labels: Completeness; Workflow values; Rerunnable; Reusable; Unstructured log; Relations and identifiers; Structured execution log; Profile; Level 0; Level 1; "Transport level"]
document
prefix wfprov <http://purl.org/wf4ever/wfprov#>
prefix prov <http://www.w3.org/ns/prov#>
prefix wfdesc <http://purl.org/wf4ever/wfdesc#>
prefix wf <https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/hello/hello.cwl#>
prefix input <app://579c1b74-b328-4da6-80a8-a2ffef2ac9b5/workflow/input.json#>
prefix run <urn:uuid:>
prefix engine <urn:uuid:>
prefix data <urn:hash:sha256:>
default <app://579c1b74-b328-4da6-80a8-a2ffef2ac9b5/>
// Level 1 provenance of workflow run
activity(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, -, [prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#main"])
wasStartedBy(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, -, 2017-10-27T14:24:00+01:00)
// The engine is the SoftwareAgent that is executing our Workflow plan
wasAssociatedWith(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main)
agent(engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, [prov:type='prov:SoftwareAgent', prov:type='wfprov:WorkflowEngine', prov:label="cwltool v1.2.5"])
// prov has no term to relate sub-plans - we'll use wfdesc:hasSubProcess
entity(wf:main,[prov:type='wfdesc:Workflow', prov:type='prov:Plan', wfdesc:hasSubProcess='wf:main/step1', wfdesc:hasSubProcess='wf:main/step2'])
alternateOf(wf:main, workflow/packed.cwl)
entity(wf:main/step1,[prov:type='wfdesc:Process', prov:type='prov:Plan'])
entity(wf:main/step2,[prov:type='wfdesc:Process', prov:type='prov:Plan'])
// First the workflow uses some data; here with a urn:hash:sha256 identifier
used(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T14:29:00+01:00, [prov:role='wf:main/input1'])
entity(data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, [prov:type='wfprov:Artifact'])
// which we have stored a copy of within the research object
specializationOf(data/58/5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03)
// Then there was another activity - wfprov:ProcessRun indicating a command line tool
activity(run:4305467e-6dfb-11e7-885d-0242ac110002, -, -, [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/step1"])
// started by the mother activity
wasStartedBy(run:4305467e-6dfb-11e7-885d-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00)
// same engine using step1 as plan. In a distributed scenario there might be a different engine
wasAssociatedWith(run:4305467e-6dfb-11e7-885d-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main/step1)
// This activity also uses the same data, but in a different role (e.g. input parameter)
used(run:4305467e-6dfb-11e7-885d-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T14:00:00+01:00, [prov:role='wf:main/step1/in1'])
// And we generate some new data
wasGeneratedBy(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, run:4305467e-6dfb-11e7-885d-0242ac110002, 2017-10-27T16:00:00+01:00, [prov:role='wf:main/step1/out1'])
entity(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, [prov:type='wfprov:Artifact'])
// again stored in the RO
specializationOf(data/00/00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c)
// step1 finished
wasEndedBy(run:4305467e-6dfb-11e7-885d-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:30:00+01:00)
// the master workflow then "generate" that same value, but now at a different time and role (the resultA master workflow output)
wasGeneratedBy(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/resultA'])
// next step activity
activity(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, -, [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/step2"])
wasStartedBy(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:00:00+01:00)
// associated with step2
wasAssociatedWith(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main/step2)
// Uses two data artifacts; one which came from previous step, other as workflow input
used(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/step2/valueA'])
used(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/step2/valueB'])
// and generate two new data artifacts
wasGeneratedBy(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, run:c42dc36e-6dfd-11e7-bc24-0242ac110002, 2017-10-27T16:34:20+01:00, [prov:role='wf:main/step2/out1'])
entity(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, [prov:type='wfprov:Artifact'])
specializationOf(data/95/952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d)
wasGeneratedBy(data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, run:c42dc36e-6dfd-11e7-bc24-0242ac110002, 2017-10-27T16:34:20+01:00, [prov:role='wf:main/step2/out2'])
entity(data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, [prov:type='wfprov:Artifact'])
specializationOf(data/3d/3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0)
// step2 ends
wasEndedBy(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:30:00+01:00)
// only step output out1 captured by mother workflow, sent to resultB workflow output
wasGeneratedBy(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/resultB'])
// mother workflow ends
wasEndedBy(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:34:40+01:00)
endDocument
CWLProv
Which PROV format?
PROV-XML
<prov:wasGeneratedBy>
  <prov:entity prov:ref="ex:ent1"/>
  <prov:activity prov:ref="ex:act1"/>
  <prov:time>2017-10-26T21:32:52Z</prov:time>
  <ex:port>p1</ex:port>
</prov:wasGeneratedBy>
PROV-N
wasGeneratedBy(ent1, act1, 2017-10-26T21:32:52Z, [ex:port="p1"])
PROV-O Turtle
:ent1
  a prov:Entity;
  prov:wasGeneratedBy :act1;
  prov:generatedAtTime "2017-10-26T21:32:52Z"^^xsd:dateTime ;
  ex:port "p1" .
PROV-JSON
"wasGeneratedBy": {
  "ex:gen1": {
    "prov:entity": "ent1",
    "prov:activity": "act1",
    "prov:time": "2017-10-26T21:32:52Z",
    "ex:port": "p1"
  }
},
PROV-O JSON-LD
{ "@context": { .. },
  "@id": "ent1",
  "@type": "prov:Entity",
  "ex:port": "p1",
  "prov:generatedAtTime": "2017-10-26T21:32:52Z",
  "prov:wasGeneratedBy": {
    "@id": "act1",
    "@type": "prov:Activity"
  }
}
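To make the comparison concrete, the Python sketch below renders one generation record in two of these serialisations, PROV-N and PROV-JSON, from the same values. The function names `to_provn` and `to_prov_json` are hypothetical and cover only this single statement type with string-valued attributes; a real application would more likely rely on a dedicated PROV library.

```python
import json


def to_provn(entity, activity, time, attrs=None):
    """Format a wasGeneratedBy statement in PROV-N notation."""
    statement = f"wasGeneratedBy({entity}, {activity}, {time}"
    if attrs:
        # Optional attribute-value pairs go in square brackets.
        pairs = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
        statement += f", [{pairs}]"
    return statement + ")"


def to_prov_json(gen_id, entity, activity, time, attrs=None):
    """Build the PROV-JSON structure for the same statement."""
    record = {
        "prov:entity": entity,
        "prov:activity": activity,
        "prov:time": time,
    }
    record.update(attrs or {})
    # PROV-JSON keys each statement by an identifier, here gen_id.
    return {"wasGeneratedBy": {gen_id: record}}


if __name__ == "__main__":
    args = ("ent1", "act1", "2017-10-26T21:32:52Z", {"ex:port": "p1"})
    print(to_provn(*args))
    print(json.dumps(to_prov_json("ex:gen1", *args), indent=2))
```

The PROV-JSON form needs an explicit statement identifier (`ex:gen1` in the slide), whereas in PROV-N the identifier is optional and omitted here.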
Nested workflows
prefix id <urn:uuid:>
prefix provenance <arcp://uuid,73eab018-7b36-4f84-a845-aca8073bd46c/metadata/provenance/>
agent(id:a606d227-bf10-4479-8d11-823bb932bbac,
[prov:type='wfprov:WorkflowEngine', prov:type='prov:SoftwareAgent',
prov:label="cwltool 1.0.20180817162414"])
activity(id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.059920, -,
[prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#main"])
wasStartedBy(id:73eab018-7b36-4f84-a845-aca8073bd46c, -, id:a606d227-bf10-4479-8d11-823bb932bbac, 2018-08-21T15:20:35.060038)
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/compile"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.163189)
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn',
prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl'
])
{
  "about": "urn:uuid:e79fc8dc-6e40-4236-b22c-41fee22947a9",
  "content": [
    "provenance/workflow_20compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn",
    "provenance/workflow_20compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl"
  ],
  "oa:motivatedBy": {
    "@id": "http://www.w3.org/ns/prov#has_provenance"
  }
}
metadata/provenance/primary.cwlprov.provn
metadata/manifest.json
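A consumer can follow the link from a step activity to its nested provenance files by scanning the manifest annotations. The Python sketch below assumes annotations shaped like the one shown (an `about` URI, a `content` string or list, and an `oa:motivatedBy` object); the function name `provenance_files` is hypothetical and not part of cwltool.

```python
PROV_HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"


def provenance_files(annotations, run_uuid):
    """Return the provenance files annotated for the run with this UUID.

    `annotations` is the list of annotation dicts from the manifest and
    `run_uuid` is the bare UUID of the activity, without 'urn:uuid:'.
    """
    about = "urn:uuid:" + run_uuid
    files = []
    for annotation in annotations:
        motivation = annotation.get("oa:motivatedBy", {})
        if annotation.get("about") != about:
            continue
        if motivation.get("@id") != PROV_HAS_PROVENANCE:
            continue
        content = annotation.get("content", [])
        if isinstance(content, str):
            # A single content path may be given as a plain string.
            content = [content]
        files.extend(content)
    return files
```

For the nested workflow run `e79fc8dc-6e40-4236-b22c-41fee22947a9`, this would return the `.provn` and `.ttl` paths listed in the annotation, which can then be parsed to descend one level further.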
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/compile"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.163189)
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn',
prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl'
])
prefix id <urn:uuid:>
agent(id:a606d227-bf10-4479-8d11-823bb932bbac,
[prov:type='wfprov:WorkflowEngine', prov:type='prov:SoftwareAgent',
prov:label="cwltool 1.0.20180817162414"])
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.089187, -,
[prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#compile.cwl"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:a606d227-bf10-4479-8d11-823bb932bbac, 2018-08-21T15:20:35.089303)
activity(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#compile.cwl/step1"])
wasStartedBy(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.163189)
activity(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#compile.cwl/step2"])
wasStartedBy(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.163189)
metadata/provenance/primary.cwlprov.provn
metadata/provenance/workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn
Level 1
Level 2
Identifying intermediate data
Output 1B file is also Input 2C and Input 3D downstream
Simple filenames -> duplications
./data/step1/outputB.txt
./data/step2/inputC.txt
./data/step3/inputD.txt
Content-addressable
SHA-256 hash of bytes as filename:
./data/51/51fb8af0c4ae0422fbe88340d91880ecb9d7537cf57339c1cf1256b7ca58f32d
RFC6920 URI as global identifier:
nih:sha-256;51fb8af0c4ae0422fbe88340d91880ecb9d7537cf57339c1cf1256b7ca58f32d
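The content-addressable scheme can be sketched in a few lines of Python: hash the bytes, use the first two hex characters as a subdirectory, and derive the identifier from the same digest. The function name `content_address` is hypothetical, and the sketch covers only the plain `nih:` form without the optional check digit.

```python
import hashlib


def content_address(data: bytes):
    """Return (relative path, nih URI) for a blob of bytes.

    Identical bytes always map to the same path and identifier, so a
    value that is an output of one step and an input of later steps is
    stored only once.
    """
    digest = hashlib.sha256(data).hexdigest()
    rel_path = f"data/{digest[:2]}/{digest}"
    nih_uri = f"nih:sha-256;{digest}"
    return rel_path, nih_uri


if __name__ == "__main__":
    output_1b = b"intermediate result\n"
    input_2c = b"intermediate result\n"  # same bytes, different role
    # Both roles resolve to one stored file and one identifier.
    print(content_address(output_1b) == content_address(input_2c))
```

The role of a value (Output 1B, Input 2C, Input 3D) is then recorded in the provenance statements rather than in the filename.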
{
"@context": [
{
"@base": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/metadata/"
},
"https://w3id.org/bundle/context"
],
"id": "/",
"conformsTo": "https://w3id.org/cwl/prov/0.6.0",
"manifest": "manifest.json",
"createdOn": "2018-10-25T15:46:43.191346",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
},
"authoredBy": {
"orcid": "https://orcid.org/0000-0001-9842-9718",
"name": "Stian Soiland-Reyes"
},
"aggregates": [
{
"uri": "urn:hash::sha1:327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
"folder": "/data/32/",
"filename": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"
}
},
{
"uri": "urn:hash::sha1:97fe1b50b4582cebc7d853796ebd62e3e163aa3f",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f",
"folder": "/data/97/",
"filename": "97fe1b50b4582cebc7d853796ebd62e3e163aa3f"
}
},
{
"uri": "urn:hash::sha1:b9214658cc453331b62c2282b772a5c063dbd284",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/b9/b9214658cc453331b62c2282b772a5c063dbd284",
"folder": "/data/b9/",
"filename": "b9214658cc453331b62c2282b772a5c063dbd284"
}
},
{
"uri": "provenance/primary.cwlprov.xml",
"mediatype": "application/xml",
"conformsTo": [
"http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.191481",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../snapshot/revtool.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-06-05T15:19:48.781496"
},
{
"uri": "../snapshot/empty.ttl",
"mediatype": "text/turtle; charset=\"UTF-8\"",
"createdOn": "2018-04-04T13:29:55.717707"
},
{
"uri": "provenance/primary.cwlprov.json",
"mediatype": "application/json",
"conformsTo": [
"http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.191686",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../workflow/packed.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-10-25T15:46:43.191761",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "provenance/primary.cwlprov.provn",
"mediatype": "text/provenance-notation; charset=\"UTF-8\"",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-n-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.191825",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../snapshot/revsort.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-10-25T15:40:08.769943"
},
{
"uri": "../workflow/primary-output.json",
"mediatype": "application/json",
"createdOn": "2018-10-25T15:46:43.191944",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "provenance/primary.cwlprov.ttl",
"mediatype": "text/turtle; charset=\"UTF-8\"",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-o-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.192006",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "provenance/primary.cwlprov.jsonld",
"mediatype": "application/ld+json",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-o-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.192069",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../snapshot/sorttool.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-06-05T15:19:48.785496"
},
{
"uri": "provenance/primary.cwlprov.nt",
"mediatype": "application/n-triples",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-o-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.192188",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt",
"mediatype": "text/plain; charset=\"UTF-8\"",
"createdOn": "2018-10-25T15:46:43.192249",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../workflow/primary-job.json",
"mediatype": "application/json",
"createdOn": "2018-10-25T15:46:43.192312",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"createdOn": "2018-10-25T15:46:35.303633",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
},
"uri": "urn:uuid:ed8d007b-a1f3-4bfe-b390-08df074d712d"
},
{
"createdOn": "2018-10-25T15:46:37.067848",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
},
"uri": "urn:uuid:4ab5a3fe-e481-4f7f-98c4-af8e5dfccb93"
}
],
"annotations": [
{
"uri": "urn:uuid:42a4ade6-245b-4746-acf9-e9910780a449",
"about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
"content": "/",
"oa:motivatedBy": {
"@id": "oa:describing"
}
},
{
"uri": "urn:uuid:9ed6545d-dfb2-4cab-b4af-102735a2661e",
"about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
"content": [
"provenance/primary.cwlprov.xml",
"provenance/primary.cwlprov.json",
"provenance/primary.cwlprov.provn",
"provenance/primary.cwlprov.ttl",
"provenance/primary.cwlprov.jsonld",
"provenance/primary.cwlprov.nt"
],
"oa:motivatedBy": {
"@id": "http://www.w3.org/ns/prov#has_provenance"
}
},
{
"uri": "urn:uuid:1c23181c-905c-49aa-a5e3-7194f9a43c29",
"about": "../workflow/packed.cwl",
"oa:motivatedBy": {
"@id": "oa:highlighting"
}
},
{
"uri": "urn:uuid:4f4132a7-c27d-47d5-a96f-3ad6ca741fe8",
"about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
"content": [
"../workflow/packed.cwl",
"../workflow/primary-job.json"
],
"oa:motivatedBy": {
"@id": "oa:linking"
}
},
{
"uri": "urn:uuid:6cf23e43-cbd1-4b77-af65-d9a32dc913dc",
"about": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"content": [
"metadata/logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt"
],
"oa:motivatedBy": {
"@id": "https://w3id.org/cwl/prov#log"
}
}
]
}
metadata/manifest.json
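Because the manifest is plain JSON, it can be inspected with standard tooling. The following is a minimal sketch, not part of cwlprov-py: it assumes the resource entries shown above sit under the top-level `aggregates` key defined by the Research Object Bundle format (that key is not visible in this excerpt), and the helper name `provenance_files` and the trimmed `SAMPLE` manifest are made up for illustration.

```python
import json

# Prefix shared by the CWLProv profile URIs seen in the manifest's conformsTo.
CWLPROV_PREFIX = "https://w3id.org/cwl/prov/"


def provenance_files(manifest):
    """Return URIs of aggregated resources that declare a CWLProv profile."""
    found = []
    for entry in manifest.get("aggregates", []):
        conforms = entry.get("conformsTo", [])
        # conformsTo appears both as a single string and as a list of strings.
        if isinstance(conforms, str):
            conforms = [conforms]
        if any(c.startswith(CWLPROV_PREFIX) for c in conforms):
            found.append(entry["uri"])
    return found


# Hypothetical, trimmed-down manifest modelled on the entries above.
SAMPLE = json.loads("""
{
  "aggregates": [
    {"uri": "provenance/primary.cwlprov.xml",
     "conformsTo": ["http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/",
                    "https://w3id.org/cwl/prov/0.6.0"]},
    {"uri": "../snapshot/revtool.cwl",
     "conformsTo": "https://w3id.org/cwl/"},
    {"uri": "../snapshot/empty.ttl"}
  ]
}
""")

print(provenance_files(SAMPLE))
```

Only the first entry is reported: the CWL snapshot conforms to `https://w3id.org/cwl/`, which is not a CWLProv profile, and the Turtle file declares no profile at all.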
Consuming CWLProv ROs
$ cwlprov --help
usage: cwlprov [-h] [--version] [--directory DIRECTORY] [--relative]
[--absolute] [--output OUTPUT] [--verbose] [--quiet] [--hints]
[--no-hints]
{validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
...
cwlprov explores Research Objects containing provenance of Common Workflow
Language executions. <https://w3id.org/cwl/prov/>
commands:
{validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
validate Validate the CWLProv Research Object
info show research object Metadata
who show Who ran the workflow
prov export workflow execution Provenance in PROV format
inputs list workflow/step Input files/values
outputs list workflow/step Output files/values
run show workflow Execution log
runs List all workflow executions in RO
rerun Rerun a workflow or step
derived list what was Derived from a data item, based on
activity usage/generation
runtimes calculate average step execution Runtimes
stain@biggie:~/src/cwlprov-py/test/revsort-cwlprov-0.4.0$ cwlprov --verbose validate
INFO:cwlprov.tool:Detected /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/bagit.txt
INFO:cwlprov.tool:Opening BagIt /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/b9/b9214658cc453331b62c2282b772a5c063dbd284
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/workflow/packed.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/workflow/primary-job.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.xml
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.provn
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.ttl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.nt
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.jsonld
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/revsort.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/hello.txt
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/revtool.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/empty.ttl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/sorttool.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/manifest.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/bag-info.txt
INFO:cwlprov.ro:Parsing RO manifest /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/manifest.json
Valid CWLProv RO: .
Validating
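The log above shows payload files stored as `data/<first two hex digits>/<40 hex digits>`, and the manifest identifies the same files as `urn:hash::sha1:` URIs. Assuming the file name is the SHA-1 of the file content, a complementary check can be sketched in a few lines. This is an illustration only, not how `cwlprov validate` or the BagIt library is implemented; the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of(path):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_data_files(ro_root):
    """List files under data/ whose name is not the SHA-1 of their content."""
    bad = []
    for path in sorted(Path(ro_root, "data").glob("*/*")):
        if path.is_file() and sha1_of(path) != path.name:
            bad.append(path.name)
    return bad


# Demo on a throwaway directory: one well-named file, one tampered file.
with tempfile.TemporaryDirectory() as tmp:
    content = b"hello\n"
    name = hashlib.sha1(content).hexdigest()
    good = Path(tmp, "data", name[:2], name)
    good.parent.mkdir(parents=True, exist_ok=True)
    good.write_bytes(content)

    tampered = Path(tmp, "data", "00", "0" * 40)
    tampered.parent.mkdir(parents=True, exist_ok=True)
    tampered.write_bytes(b"tampered")

    DEMO_RESULT = mismatched_data_files(tmp)

print(DEMO_RESULT)
```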
(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run
2018-08-08 22:44:06.573330 Flow 39408a40-c1c8-4852-9747-87249425be1e [ Run of workflow/packed.cwl#main
2018-08-08 22:44:06.691722 Step 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 Run of workflow/packed.cwl#main/create-tar (0:00:00.010133)
2018-08-08 22:44:06.702976 Step 0cceeaf6-4109-4f08-940b-f06ac959944a * Run of workflow/packed.cwl#main/compile (unknown duration)
2018-08-08 22:44:12.680097 Flow 39408a40-c1c8-4852-9747-87249425be1e ] Run of workflow/packed.cwl#main (0:00:06.106767)
Legend:
[ Workflow start
* Nested provenance, use UUID to explore: cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
] Workflow end
(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
2018-08-08 22:44:06.607210 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a [ Run of workflow/packed.cwl#main
2018-08-08 22:44:06.707070 Step 83752ab4-8227-4d4a-8baa-78376df34aed Run of workflow/packed.cwl#main/untar (0:00:00.008149)
2018-08-08 22:44:06.718554 Step f56d8478-a190-4251-84d9-7f69fe0f6f8b Run of workflow/packed.cwl#main/argument (0:00:00.532052)
2018-08-08 22:44:07.251588 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a ] Run of workflow/packed.cwl#main (0:00:00.644378)
Legend:
[ Workflow start
] Workflow end
stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov outputs 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 --format=files
Output tar:
data/c0/c0fd5812fe6d8d91fef7f4f1ba3a462500fce0c5
stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ tar tfv `cwlprov -q outputs 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 --format=files`
-rw-r--r-- stain/stain 115 2018-08-08 23:44 Hello.java
Inspecting step runs
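The durations printed in parentheses by `cwlprov run` are consistent with the difference between the start and end timestamps of each workflow run. The sketch below recomputes them from the timestamps in the listings above; how cwlprov itself derives the value is not shown in these slides, so treat this as a plausibility check rather than its implementation.

```python
from datetime import datetime

# Timestamp layout used in the `cwlprov run` listings above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed(start, end):
    """Return the timedelta between two timestamps in the listing format."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Outer workflow run 39408a40-... (start "[" line and end "]" line).
outer = elapsed("2018-08-08 22:44:06.573330", "2018-08-08 22:44:12.680097")
# Nested workflow run 0cceeaf6-...
nested = elapsed("2018-08-08 22:44:06.607210", "2018-08-08 22:44:07.251588")

print(outer)
print(nested)
```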
TODO: Best practice for publishing CWLProv ROs
>1 GB -> "Big Data"
stain@biggie:~/dropbox/work/cwlprov/newest-swift$ du -hs *
13G alignment_0.6.0_linux
2.9G rnaseqwf_0.5.0_mac
80M somaticwf_0.5.0_ma
What next for CWLProv?
Adding CWLProv support to Toil
<?xml version="1.0" encoding="UTF-8"?>
<ur:UsageRecord xmlns="http://schema.ogf.org/urf/2013/04/urf"
xmlns:ur="http://schema.ogf.org/urf/2013/04/urf" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schema.ogf.org/urf/2013/04/urf">
<ur:RecordIdentityBlock>
<ur:RecordId>urn:uuid:4350d583-61a5-45e8-a229-957aa81e8014</ur:RecordId>
<ur:CreateTime>2018-05-09T09:06:52Z</ur:CreateTime>
<ur:Site>EMBL-EBI</ur:Site>
<ur:Infrastructure>Embassy</ur:Infrastructure>
</ur:RecordIdentityBlock>
<ur:SubjectIdentityBlock>
<ur:LocalUserId>stain</ur:LocalUserId>
<ur:LocalGroupId>ELIXIRCWLImplStudy</ur:LocalGroupId>
<ur:GlobalUserId>https://orcid.org/0000-0001-9842-9718</ur:GlobalUserId>
</ur:SubjectIdentityBlock>
<ur:ComputeUsageBlock>
<ur:CpuDuration>PT3600S</ur:CpuDuration>
<ur:WallDuration>PT3600S</ur:WallDuration>
<ur:StartTime>2018-05-31T11:00:00</ur:StartTime>
<ur:EndTime>2018-05-31T12:00:00</ur:EndTime>
<ur:ExecutionHost>
<ur:Hostname>compute-0-1.example.com</ur:Hostname>
<ur:ProcessId>1042</ur:ProcessId>
<ur:Benchmark ur:type="si2k">3.14</ur:Benchmark>
</ur:ExecutionHost>
<ur:Processors>4</ur:Processors>
<ur:NodeCount>1</ur:NodeCount>
</ur:ComputeUsageBlock>
<ur:JobUsageBlock>
<ur:GlobalJobId>host.example.org/ab1234</ur:GlobalJobId>
<ur:LocalJobId>ab1234</ur:LocalJobId>
<ur:JobName>MetaGenomics1337</ur:JobName>
<ur:Queue ur:description="execution">"Bigmem"</ur:Queue>
<ur:TimeInstant ur:type="Ctime">2018-05-31T10:30:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Qtime">2018-05-31T10:31:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Etime">2018-05-31T10:59:42</ur:TimeInstant>
</ur:JobUsageBlock>
<ur:MemoryUsageBlock>
<ur:MemoryClass>"RAM"</ur:MemoryClass>
<ur:MemoryResourceCapacityUsed>14728</ur:MemoryResourceCapacityUsed>
<ur:MemoryResourceCapacityAllocated>56437</ur:MemoryResourceCapacityAllocated>
<ur:MemoryResourceCapacityRequested>42000</ur:MemoryResourceCapacityRequested>
<ur:StartTime>2018-05-31T11:00:00</ur:StartTime>
<ur:EndTime>2018-05-31T12:00:00</ur:EndTime>
</ur:MemoryUsageBlock>
<ur:StorageUsageBlock>
<ur:StorageShare>pool-003</ur:StorageShare>
<ur:StorageMedia>disk</ur:StorageMedia>
<ur:StorageClass>replicated</ur:StorageClass>
<ur:DirectoryPath>/projectA</ur:DirectoryPath>
<ur:FileCount>42</ur:FileCount>
<ur:StorageResourceCapacityUsed>14728</ur:StorageResourceCapacityUsed>
<ur:StorageLogicalCapacityUsed>13617</ur:StorageLogicalCapacityUsed>
<ur:StorageResourceCapacityAllocated>14624
</ur:StorageResourceCapacityAllocated>
<ur:StartTime>2018-05-07T09:31:40Z</ur:StartTime>
<ur:EndTime>2018-05-08T09:29:42Z</ur:EndTime>
<ur:Host>host.example.org</ur:Host>
</ur:StorageUsageBlock>
<ur:CloudUsageBlock>
<ur:LocalVirtualMachineId>ab1234</ur:LocalVirtualMachineId>
<ur:GlobalVirtualMachineId>
host.example.org/ab1234/2018-05-09T09:06:52Z
</ur:GlobalVirtualMachineId>
<ur:Status>started</ur:Status>
<ur:SuspendDuration>PT3600S</ur:SuspendDuration>
<ur:ImageId>UbuntuImage2013</ur:ImageId>
<ur:MachineName>cloud.example.org</ur:MachineName>
<ur:SubmitHost>
cloud-name=cloud.example.org,Mds-Vo-name=local,o=cloud
</ur:SubmitHost>
<ur:TimeInstant ur:type="Ctime">2018-05-31T10:30:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Qtime">2018-05-31T10:31:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Etime">2018-05-31T10:59:42</ur:TimeInstant>
<ur:ServiceLevel>Premium</ur:ServiceLevel>
</ur:CloudUsageBlock>
<ur:NetworkUsageBlock>
<ur:NetworkClass ur:NetworkResourceBandwidth="100000000">"Ethernet"</ur:NetworkClass>
<ur:NetworkInboundUsed ur:SourceAddress="192.168.1.12">14728</ur:NetworkInboundUsed>
<ur:NetworkOutboundUsed ur:DestinationAddress="192.168.1.21">14728</ur:NetworkOutboundUsed>
</ur:NetworkUsageBlock>
</ur:UsageRecord>
Collect detailed usage statistics from compute environment
Pau Ruiz Safont
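Once a usage record like the one above is available, its figures can be pulled out with the standard library. The following is a minimal sketch that reads only the compute block; the trimmed `RECORD`, the helper names, and the restriction to the simple `PT<n>S` duration form are assumptions for illustration, and nothing here is claimed to be part of cwltool or Toil.

```python
import re
import xml.etree.ElementTree as ET

# Namespace taken from the usage record above.
NS = {"ur": "http://schema.ogf.org/urf/2013/04/urf"}

# Trimmed record containing only the fields read below.
RECORD = """
<ur:UsageRecord xmlns:ur="http://schema.ogf.org/urf/2013/04/urf">
  <ur:ComputeUsageBlock>
    <ur:CpuDuration>PT3600S</ur:CpuDuration>
    <ur:WallDuration>PT3600S</ur:WallDuration>
    <ur:Processors>4</ur:Processors>
  </ur:ComputeUsageBlock>
</ur:UsageRecord>
"""


def seconds(duration):
    """Parse the simple PT<n>S duration form used in the record above."""
    match = re.fullmatch(r"PT(\d+)S", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1))


def compute_usage(xml_text):
    """Extract CPU time, wall time and processor count from a usage record."""
    root = ET.fromstring(xml_text.strip())
    block = root.find("ur:ComputeUsageBlock", NS)
    return {
        "cpu_seconds": seconds(block.findtext("ur:CpuDuration", namespaces=NS)),
        "wall_seconds": seconds(block.findtext("ur:WallDuration", namespaces=NS)),
        "processors": int(block.findtext("ur:Processors", namespaces=NS)),
    }


print(compute_usage(RECORD))
```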
Level 3: Domain-specific workflow annotations
outputs:
  sequence:
    type: stdout
    format: edam:format_1929 # FASTA
    cwlprov:concept: ebi_metagenomics:assembly_statistics
    cwlprov:relationships:
      prov:wasDerivedFrom: [ inputs.second_input, outputs.second_output ]
{
"uri": "urn:hash::sha1:b9214658cc453331b62c2282b772a5c063dbd284",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/b9/b9214658cc453331b62c2282b772a5c063dbd284",
"folder": "/data/b9/",
"filename": "b9214658cc453331b62c2282b772a5c063dbd284"
},
"format": "http://edamontology.org/format_1929",
"classifying": "ebi_metagenomics:assembly_statistics",
"prov:wasDerivedFrom": [
"urn:hash::sha1:f572d396fae9206628714fb2ce00f72e94f2258f",
"urn:hash::sha1:2258ff572d396fae9206628714fb2ce00f72e94f"
]
},
postprocessing of Research Object
outputs:
  sequence:
    type: stdout
    biocompute:error_domain: [ "frequency_cutoff > 0.05" ]
    biocompute:input_domain: { "genomic_reference": "WAP_RAT" }
Deep Validation
{ manifest }
{ prov }
{ cwl }
Adam Cowdy
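The slide does not spell out what deep validation covers beyond cross-checking the manifest, the PROV files and the CWL. As one plausible ingredient, the sketch below checks that every `bundledAs` entry in the manifest points at a file that actually exists in the RO. The `aggregates` key follows the Research Object Bundle format; the function name and demo data are hypothetical, and only existence is tested (checksums are already covered by BagIt).

```python
import tempfile
from pathlib import Path


def missing_bundled_files(manifest, ro_root):
    """Return URIs of aggregates whose bundledAs location is not on disk."""
    missing = []
    for entry in manifest.get("aggregates", []):
        bundled = entry.get("bundledAs")
        if not bundled:
            continue
        # "folder" is RO-absolute (e.g. "/data/b9/"), so drop the leading slash.
        relative = bundled["folder"].lstrip("/") + bundled["filename"]
        if not Path(ro_root, relative).is_file():
            missing.append(entry["uri"])
    return missing


# Demo: one bundled file is present, the other is referenced but absent.
PRESENT = "b9214658cc453331b62c2282b772a5c063dbd284"
ABSENT = "0" * 40

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "data", "b9", PRESENT)
    target.parent.mkdir(parents=True)
    target.write_bytes(b"")  # content is irrelevant for an existence check

    demo_manifest = {
        "aggregates": [
            {"uri": "urn:hash::sha1:" + PRESENT,
             "bundledAs": {"folder": "/data/b9/", "filename": PRESENT}},
            {"uri": "urn:hash::sha1:" + ABSENT,
             "bundledAs": {"folder": "/data/00/", "filename": ABSENT}},
            {"uri": "../workflow/packed.cwl"},
        ]
    }
    MISSING = missing_bundled_files(demo_manifest, tmp)

print(MISSING)
```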
CWLProv without CWL..?
2018-10-29 CWLProv
By Farah Z Khan
Presented at Workshop on Research Objects (RO2018) at IEEE eScience 2018, Amsterdam, Netherlands (29 October 2018).