Stian Soiland-Reyes, Farah Zaib Khan, Richard O. Sinnott, Andrew Lonie, Michael R Crusoe, Carole Goble
Workshop for Research Objects (RO2018),
IEEE eScience 2008, Amsterdam
2018-10-29
This work is licensed under a
Creative Commons Attribution 4.0 International License.
This work has been done as part of the BioExcel CoE (www.bioexcel.eu), a project funded by the European Union contract H2020-EINFRA-2015-1-675728.
Cpipe
https://slides.com/farahzkhan/cwlprov
cwlVersion: v1.0
class: Workflow
inputs:
inp: File
ex: string
outputs:
classout:
type: File
outputSource: compile/classfile
steps:
untar:
run: tar-param.cwl
in:
tarfile: inp
extractfile: ex
out: [example_out]
compile:
run: arguments.cwl
in:
src: untar/example_out
out: [classfile]
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
inputs:
inp: File
ex: string
outputs:
classout:
type: File
outputSource: compile/classfile
steps:
untar:
run: tar-param.cwl
in:
tarfile: inp
extractfile: ex
out: [example_out]
compile:
run: arguments.cwl
in:
src: untar/example_out
out: [classfile]
cwlVersion: v1.0
class: Workflow
label: EMG QC workflow, (paired end version). Benchmarking with MG-RAST expt.
requirements:
- class: SubworkflowFeatureRequirement
- class: SchemaDefRequirement
types:
- $import: ../tools/FragGeneScan-model.yaml
- $import: ../tools/trimmomatic-sliding_window.yaml
- $import: ../tools/trimmomatic-end_mode.yaml
- $import: ../tools/trimmomatic-phred.yaml
inputs:
reads:
type: File
format: edam:format_1930 # FASTQ
outputs:
processed_sequences:
type: File
outputSource: clean_fasta_headers/sequences_with_cleaned_headers
steps:
trim_quality_control:
doc: |
Low quality trimming (low quality ends and sequences with < quality scores
less than 15 over a 4 nucleotide wide window are removed)
run: ../tools/trimmomatic.cwl
in:
reads1: reads
phred: { default: '33' }
leading: { default: 3 }
trailing: { default: 3 }
end_mode: { default: SE }
minlen: { default: 100 }
slidingwindow:
default:
windowSize: 4
requiredQuality: 15
out: [reads1_trimmed]
convert_trimmed-reads_to_fasta:
run: ../tools/fastq_to_fasta.cwl
in:
fastq: trim_quality_control/reads1_trimmed
out: [ fasta ]
clean_fasta_headers:
run: ../tools/clean_fasta_headers.cwl
in:
sequences: convert_trimmed-reads_to_fasta/fasta
out: [ sequences_with_cleaned_headers ]
$namespaces:
edam: http://edamontology.org/
s: http://schema.org/
$schemas:
- http://edamontology.org/EDAM_1.16.owl
- https://schema.org/docs/schema_org_rdfa.html
s:license: "https://www.apache.org/licenses/LICENSE-2.0"
s:copyrightHolder: "EMBL - European Bioinformatics Institute"
stain@biggie:~/src/ebi-metagenomics-cwl$ find . -name *cwl | xargs ls
./tools/5S-from-tablehits.cwl ./tools/minia.cwl
./tools/biom-convert.cwl ./tools/modify_taxonomy_table.cwl
./tools/biom-summarize_table.cwl ./tools/nhmmer.cwl
./tools/clean_fasta_headers.cwl ./tools/oneLineFasta.cwl
./tools/cmsearch-deoverlap.cwl ./tools/orf_stats.cwl
./tools/collate_unique_SSU_headers.cwl ./tools/prepend_header.cwl
./tools/concatenate.cwl ./tools/pull-5Ss.cwl
./tools/count_fasta.cwl ./tools/pull-LSUs.cwl
./tools/count_fastq.cwl ./tools/pull-SSUs.cwl
./tools/create_categorisations.cwl ./tools/qc-stats.cwl
./tools/discard_short_seqs.cwl ./tools/qiime-filter_tree.cwl
./tools/esl-reformat.cwl ./tools/qiime-pick_closed_reference_otus.cwl
./tools/esl-sfetch-index.cwl ./tools/RevReadSort.cwl
./tools/esl-sfetch-manyseqs.cwl ./tools/rRNA_selection.cwl
./tools/esl-sfetch-oneseq.cwl ./tools/seqprep.cwl
./tools/extract_coord_lines.cwl ./tools/seqprep-merge.cwl
./tools/extract-coords-from-cmsearch.cwl ./tools/SSU-from-tablehits.cwl
./tools/extract_observations.cwl ./tools/summary.cwl
./tools/extract_sig_coords.cwl ./tools/trimmomatic.cwl
./tools/faselector.cwl ./tools/tRNA_selection.cwl
./tools/fasta_chunker.cwl ./tools/update_krona_chart_urls.cwl
./tools/fastq_to_fasta.cwl ./tools/write_ipr_summary.cwl
./tools/FragGeneScan1_20.cwl ./workflows/16S_taxonomic_analysis.cwl
./tools/go_summary.cwl ./workflows/cmsearch-multimodel.cwl
./tools/hmmsearch.cwl ./workflows/convert-to-v3-layout.cwl
./tools/infernal-cmscan.cwl ./workflows/emg-assembly.cwl
./tools/infernal-cmsearch.cwl ./workflows/emg-core-analysis-v4.cwl
./tools/InterProScan5.21-60.cwl ./workflows/emg-pipeline-v3.cwl
./tools/ipr_stats.cwl ./workflows/emg-pipeline-v3-paired.cwl
./tools/krona.cwl ./workflows/emg-pipeline-v4-assembly-metaSPAdes.cwl
./tools/krona_setup.cwl ./workflows/emg-pipeline-v4-paired.cwl
./tools/LSU-from-tablehits.cwl ./workflows/emg-pipeline-v4-single.cwl
./tools/map_fa_headers.cwl ./workflows/emg-qc-paired.cwl
./tools/mapseq2biom.cwl ./workflows/emg-qc-single.cwl
./tools/mapseq.cwl ./workflows/functional_analysis.cwl
./tools/mask_RNA.cwl ./workflows/orf_prediction.cwl
./tools/megahit.cwl ./workflows/rna-selector.cwl
./tools/metaspades.cwl ./workflows/trim_and_reformat_reads.cwl
The ‘recipes’ used to execute a computational task, e.g. the workflow specification or workflow template.
The execution record of the running a workflow instance, details of every executed process and comprehensive info on execution environment
Changes across the workflow specification, its tool definitions and the software and services it depends on.
{
"@context" : [ "https://w3id.org/bundle/context" ],
"id" : "/",
"manifest" : [ "manifest.json" ],
"createdOn" : "2017-08-24T10:57:46.325Z",
"createdBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
}, {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:porter@porter.st",
"name" : "Andrey Kartashov"
}, {
"uri" : "mailto:janko.simonovic@sbgenomics.com",
"name" : "Janko Simonovic"
} ],
"retrievedFrom" : "https://github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/",
"retrievedOn" : "2017-08-24T10:57:46.325Z",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"history" : [ "http:/git2prov.org/git2prov?giturl=https:/github.com/common-workflow-language/workflows.git&serialization=PROV-JSON" ],
"aggregates" : [ {
"uri" : "/workflow/tmp_2.fq",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:46.923Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/tmp_2.fq",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:61579f3e-63e6-49c2-b780-f67b2df461b7",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-demo.json",
"mediatype" : "application/json",
"createdOn" : "2017-08-24T10:57:47.216Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-demo.json",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:973caa0e-f3bd-45e8-8d29-70123bc8715a",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/models/illumina_v3.pcrfree.stuttermodel",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.239Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/models/illumina_v3.pcrfree.stuttermodel",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:62bbcbea-f34f-463f-990d-6148f8ed5e5c",
"folder" : "/workflow/models/"
}
}, {
"uri" : "/workflow/models/illumina_v3.pcrfree.stepmodel",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.266Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/models/illumina_v3.pcrfree.stepmodel",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:03439ae7-cd94-42a3-b5fe-40bfff6882d8",
"folder" : "/workflow/models/"
}
}, {
"uri" : "/workflow/samtools-sort.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.269Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:porter@porter.st",
"name" : "Andrey Kartashov"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/samtools-sort.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:2dc07859-efc2-4945-a95f-ba7815b68d07",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-workflow.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.42Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:58bc1895-3460-46d6-91d7-fa1718d09631",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-arvados-demo.json",
"mediatype" : "application/json",
"createdOn" : "2017-08-24T10:57:47.453Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-arvados-demo.json",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:30c683bc-69fb-4d93-8dad-65b663783af5",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/samtools-index.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.458Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:porter@porter.st",
"name" : "Andrey Kartashov"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/samtools-index.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:8235d3f8-6927-4f73-b160-8521838a1cbb",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/lobSTR-tool.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.476Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/lobSTR-tool.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:7fa6fbe4-1fc5-4cb5-9c1a-56b96c5f7aaf",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/allelotype.cwl",
"mediatype" : "text/x-yaml",
"createdOn" : "2017-08-24T10:57:47.537Z",
"authoredBy" : [ {
"uri" : "mailto:luka.stojanovic@sbgenomics.com",
"name" : "Luka Stojanovic"
}, {
"uri" : "mailto:janko.simonovic@sbgenomics.com",
"name" : "Janko Simonovic"
}, {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/allelotype.cwl",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"conformsTo" : "https://w3id.org/cwl/v1.0",
"bundledAs" : {
"uri" : "urn:uuid:3706bd2f-e53f-431d-b32a-deb661d9b292",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/README",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.555Z",
"authoredBy" : [ {
"uri" : "mailto:crusoe@ucdavis.edu",
"name" : "Michael R. Crusoe"
}, {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/README",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:ed54c4d6-c585-4dc9-b7bc-0cf299e20b91",
"folder" : "/workflow/"
}
}, {
"uri" : "/workflow/tmp_1.fq",
"mediatype" : "application/octet-stream",
"createdOn" : "2017-08-24T10:57:47.738Z",
"authoredBy" : [ {
"uri" : "mailto:peter.amstutz@curoverse.com",
"name" : "Peter Amstutz"
} ],
"retrievedFrom" : "https://raw.githubusercontent.com/common-workflow-language/workflows/lobstr-v1/workflows/lobSTR/tmp_1.fq",
"retrievedBy" : {
"uri" : "https://view.commonwl.org",
"name" : "Common Workflow Language Viewer"
},
"bundledAs" : {
"uri" : "urn:uuid:5d431f81-ad0b-4acf-903a-9d5aa03b04df",
"folder" : "/workflow/"
}
}, {
"uri" : "/visualisation.png",
"mediatype" : "image/png",
"createdOn" : "2017-08-24T10:57:47.801Z",
"retrievedFrom" : "https://view.commonwl.org/graph/png/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
"bundledAs" : {
"uri" : "urn:uuid:ff9ace37-e76c-49f8-8d36-60f11ff6d257",
"folder" : "/"
}
}, {
"uri" : "/visualisation.svg",
"mediatype" : "image/svg+xml",
"createdOn" : "2017-08-24T10:57:47.821Z",
"retrievedFrom" : "https://view.commonwl.org/graph/svg/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl",
"bundledAs" : {
"uri" : "urn:uuid:a6cfb437-8818-4ab2-9081-efc74c5109e8",
"folder" : "/"
}
} ],
"annotations" : [ {
"uri" : "urn:uuid:9f602fff-b280-41c5-9590-ab95a49c85ad",
"about" : "/",
"content" : "annotations/merged.cwl"
}, {
"uri" : "urn:uuid:0ce4b727-ff61-4534-9afb-e3d676d2782d",
"about" : "/",
"content" : "annotations/workflow.ttl"
} ]
}
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
label: "Hello World"
doc: "Outputs a message using echo"
inputs: []
outputs:
response:
outputSource: step0/response
type: File
steps:
step0:
run:
class: CommandLineTool
inputs:
message:
type: string
doc: "The message to print"
default: "Hello World"
inputBinding:
position: 1
baseCommand: echo
stdout: response.txt
outputs:
response:
type: stdout
in: []
out: [response]
https://w3id.org/cwl/view/{scheme}/{commit}/{path}{?part=fragment}
Any git permalinks are resolved using https://view.commonwl.org/git which - if it knows about that particular git commit - will content-negotiate to provide various representations.
Anyone can mint these permalinks for .cwl files for a given commit, in any public or private git repository, given no uncommitted files or git submodules.
Farah Zaib Khan et al,
CWLProv – Interoperable retrospective provenance capture and its challenges,
BOSC 2018
Provenance using PROV-Model
expanded with wfprov and wfdesc
Workflow specifications
./revsort-run-1/bagit.txt
./revsort-run-1/bag-info.txt
./revsort-run-1/manifest-sha1.txt
./revsort-run-1/tagmanifest-sha1.txt
./revsort-run-1/data
./revsort-run-1/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f
./revsort-run-1/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
./revsort-run-1/data/b9/b9214658cc453331b62c2282b772a5c063dbd284
./revsort-run-1/workflow
./revsort-run-1/workflow/packed.cwl
./revsort-run-1/workflow/primary-output.json
./revsort-run-1/workflow/primary-job.json
./revsort-run-1/snapshot
./revsort-run-1/snapshot/revtool.cwl
./revsort-run-1/snapshot/revsort.cwl
./revsort-run-1/snapshot/sorttool.cwl
./revsort-run-1/metadata/logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt
./revsort-run-1/metadata/manifest.jsonld
./revsort-run-1/metadata/provenance/primary.cwlprov.provn
./revsort-run-1/metadata/provenance/primary.cwlprov.json
./revsort-run-1/metadata/provenance/primary.cwlprov.ttl
./revsort-run-1/metadata/provenance/primary.cwlprov.jsonld
./revsort-run-1/metadata/provenance/primary.cwlprov.xml
./revsort-run-1/metadata/provenance/primary.cwlprov.nt
Completeness
Workflow values
Rerunnable
Reusable
Unstructured log
Relations and identifiers
Structured execution log
Profile
Level 0
Level 1
"Transport
level"
document
prefix wfprov <http://purl.org/wf4ever/wfprov#>
prefix prov <http://www.w3.org/ns/prov#>
prefix wfdesc <http://purl.org/wf4ever/wfdesc#>
prefix wf <https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/hello/hello.cwl#>
prefix input <app://579c1b74-b328-4da6-80a8-a2ffef2ac9b5/workflow/input.json#>
prefix run <urn:uuid:>
prefix engine <urn:uuid:>
prefix data <urn:hash:sha256:>
default <app://579c1b74-b328-4da6-80a8-a2ffef2ac9b5/>
// Level 1 provenance of workflow run
activity(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, , , [prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#main"])
wasStartedBy(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, -, -, 2017-10-27T14:24:00+01:00)
// The engine is the SoftwareAgent that is executing our Workflow plan
wasAssociatedWith(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main)
agent(engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, prov:type='prov:SoftwareAgent', prov:type='wfprov:WorkflowEngine', prov:label="cwltool v1.2.5")
// prov has no term to relate sub-plans - we'll use wfdesc:hasSubProcess
entity(wf:main,[prov:type='wfdesc:Workflow', prov:type='prov:Plan', wfdesc:hasSubProcess='wf:main/step1', wfdesc:hasSubProcess='wf:main/step2'])
alternateOf(wf:main, workflow/packed.cwl)
entity(wf:main/step1,[prov:type='wfdesc:Process', prov:type='prov:Plan'])
entity(wf:main/step2,[prov:type='wfdesc:Process', prov:type='prov:Plan'])
// First the workflow uses some data; here with a urn:sha:sha256 identifier
used(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T14:29:00+01:00, [prov:role='wf:main/input1']))
entity(data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, [prov:type='wfprov:Artifact'])
// which we have stored a copy of within the research object
specializationOf(data/58/5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03)
// Then there was another activity - wfprov:ProcessRun indicating a command line tool
activity(run:4305467e-6dfb-11e7-885d-0242ac110002, -, -, [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/step1"])
// started by the mother activity
wasStartedBy(run:4305467e-6dfb-11e7-885d-0242ac110002, -, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00)
// same engine using step1 as plan. In a distributed scenario there might be a different engine
wasAssociatedWith(run:4305467e-6dfb-11e7-885d-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main/step1)
// This activity also use the same data, but in a different role (e.g. input parameter)
used(run:4305467e-6dfb-11e7-885d-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T14:00:00+01:00, [prov:role='wf:main/step1/in1'])
// And we generate some new data
wasGeneratedBy(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, run:4305467e-6dfb-11e7-885d-0242ac110002, 2017-10-27T16:00:00+01:00, [prov:role='wf:main/step1/out1']))
entity(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, [prov:type='wfprov:Artifact'])
// again stored in the RO
specializationOf(data/00/00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c)
// step1 finished
wasEndedBy(run:4305467e-6dfb-11e7-885d-0242ac110002, -, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:30:00+01:00)
// the master workflow then "generate" that same value, but now at a different time and role (the resultA master workflow output)
wasGeneratedBy(data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/resultA'])
// next step activity
activity(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, - [prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/step2"])
wasStartedBy(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:00:00+01:00)
// associated with step2
wasAssociatedWith(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, engine:b2210211-8acb-4d58-bd28-2a36b18d3b4f, wf:main/step2)
// Uses two data artifacts; one which came from previous step, other as workflow input
used(run:4305467e-6dfb-11e7-885d-0242ac110002, data:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/step2/valueA'])
used(run:4305467e-6dfb-11e7-885d-0242ac110002, data:00688350913f2f292943a274b57019d58889eda272370af261c84e78e204743c, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/step2/valueB'])
// and generate two new data artifacts
wasGeneratedBy(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, run:c42dc36e-6dfd-11e7-bc24-0242ac110002, 2017-10-27T16:34:20+01:00, [prov:role='wf:main/step2/out1'])))
entity(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, [prov:type='wfprov:Artifact'])
specializationOf(data/95/2f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d)
wasGeneratedBy(data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, run:c42dc36e-6dfd-11e7-bc24-0242ac110002, 2017-10-27T16:34:20+01:00, [prov:role='wf:main/step2/out2'])))
entity(data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, [prov:type='wfprov:Artifact'])
specializationOf(data/3d/eb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0, data:3deb00bd0decd1f21d015a178c4f23a5eb537588c08eeee9d55059ec29637be0)
// step2 ends
wasEndedBy(run:c42dc36e-6dfd-11e7-bc24-0242ac110002, -, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:30:00+01:00)
// only step output out1 captured by mother workflow, sent to resultB workflow output
wasGeneratedBy(data:952f537d1f3116db56703787ace248fe00ae46fa77ea3803aa3d8dc01d221a9d, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T15:00:00+01:00, [prov:role='wf:main/resultB'])
// mother workflow ends
wasEndedBy(run:2e1287e0-6dfb-11e7-8acf-0242ac110002, -, -, run:2e1287e0-6dfb-11e7-8acf-0242ac110002, 2017-10-27T16:34:40+01:00)
endDocument
<prov:wasGeneratedBy>
<prov:entity prov:ref="ex:ent1"/>
<prov:activity prov:ref="ex:act1"/>
<prov:time>2017-10-26T21:32:52Z</prov:time>
<ex:port>p1</ex:port>
</prov:wasGeneratedBy>
wasGeneratedBy(ent1, act1,
2017-10-26T21:32:52Z, ex:port="p1")
:ent1
a prov:Entity;
prov:wasGeneratedBy :act1;
prov:generatedAtTime "2017-10-26T21:32:52Z"^^xsd:dateTime ;
ex:port "p1" .
"wasGeneratedBy": {
"ex:gen1": {
"prov:entity": "ent1",
"prov:activity": "act1",
"prov:time": "2017-10-26T21:32:52Z",
"ex:port": "p1"
},
},
{ "@context": { .. },
"@id": "ent1",
"@type": "prov:Entity",
"ex:port": "p1",
"prov:generatedAtTime": "2017-10-26T21:32:52Z",
"prov:wasGeneratedBy": {
"@id": "act1",
"@type": "prov:Activity"
}
}
PROV-N
PROV-XML
PROV-JSON
PROV-O Turtle
PROV-O JSON-LD
prefix id <urn:uuid:>
prefix provenance <arcp://uuid,73eab018-7b36-4f84-a845-aca8073bd46c/metadata/provenance/>
agent(id:a606d227-bf10-4479-8d11-823bb932bbac,
[prov:type='wfprov:WorkflowEngine', prov:type='prov:SoftwareAgent',
prov:label="cwltool 1.0.20180817162414"])
activity(id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.059920, -,
[prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#main"])
wasStartedBy(id:73eab018-7b36-4f84-a845-aca8073bd46c, -, id:a606d227-bf10-4479-8d11-823bb932bbac, 2018-08-21T15:20:35.060038)
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/compile"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.163189)
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn',
prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl'
])
{
"about": "urn:uuid:e79fc8dc-6e40-4236-b22c-41fee22947a9",
"content": [
"provenance/workflow_20compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn",
"provenance/workflow_20compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl",
],
"oa:motivatedBy": {
"@id": "http://www.w3.org/ns/prov#has_provenance"
}
}
metadata/provenance/primary.cwlprov.provn
metadata/manifest.json
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#main/compile"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:73eab018-7b36-4f84-a845-aca8073bd46c, 2018-08-21T15:20:35.163189)
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, -,
[prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn',
prov:has_provenance='provenance:workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.ttl'
])
prefix id <urn:uuid:>
agent(id:a606d227-bf10-4479-8d11-823bb932bbac,
[prov:type='wfprov:WorkflowEngine', prov:type='prov:SoftwareAgent',
prov:label="cwltool 1.0.20180817162414"])
activity(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.089187, -,
[prov:type='wfprov:WorkflowRun', prov:label="Run of workflow/packed.cwl#compile.cwl"])
wasStartedBy(id:e79fc8dc-6e40-4236-b22c-41fee22947a9, -, id:a606d227-bf10-4479-8d11-823bb932bbac, 2018-08-21T15:20:35.089303)
activity(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#compile.cwl/step1"])
wasStartedBy(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.163189)
activity(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, -,
[prov:type='wfprov:ProcessRun', prov:label="Run of workflow/packed.cwl#compile.cwl/step2"])
wasStartedBy(id:9b1a2b69-3403-4063-9ff0-e7ac2df32036, -, id:e79fc8dc-6e40-4236-b22c-41fee22947a9, 2018-08-21T15:20:35.163189)
metadata/provenance/
primary.cwlprov.provn
metadata/provenance/
workflow_compile.e79fc8dc-6e40-4236-b22c-41fee22947a9.cwlprov.provn
Level 1
Level 2
Output 1B file is also Input 2C and Input 3D downstream
Simple filenames -> duplications
./data/step1/outputB.txt
./data/step2/inputC.txt
./data/step3/inputD.txt
Content-adressable
SHA-256 hash of bytes as filename:
./data/51/51fb8af0c4ae0422fbe88340d91880ecb9d7537cf57339c1cf1256b7ca58f32d
RFC6920 URI as global identifier:
nih:sha-256;51fb8af0c4ae0422fbe88340d91880ecb9d7537cf57339c1cf1256b7ca58f32d
{
"@context": [
{
"@base": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/metadata/"
},
"https://w3id.org/bundle/context"
],
"id": "/",
"conformsTo": "https://w3id.org/cwl/prov/0.6.0",
"manifest": "manifest.json",
"createdOn": "2018-10-25T15:46:43.191346",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
},
"authoredBy": {
"orcid": "https://orcid.org/0000-0001-9842-9718",
"name": "Stian Soiland-Reyes"
},
"aggregates": [
{
"uri": "urn:hash::sha1:327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376",
"folder": "/data/32/",
"filename": "327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"
}
},
{
"uri": "urn:hash::sha1:97fe1b50b4582cebc7d853796ebd62e3e163aa3f",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f",
"folder": "/data/97/",
"filename": "97fe1b50b4582cebc7d853796ebd62e3e163aa3f"
}
},
{
"uri": "urn:hash::sha1:b9214658cc453331b62c2282b772a5c063dbd284",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/b9/b9214658cc453331b62c2282b772a5c063dbd284",
"folder": "/data/b9/",
"filename": "b9214658cc453331b62c2282b772a5c063dbd284"
}
},
{
"uri": "provenance/primary.cwlprov.xml",
"mediatype": "application/xml",
"conformsTo": [
"http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.191481",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../snapshot/revtool.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-06-05T15:19:48.781496"
},
{
"uri": "../snapshot/empty.ttl",
"mediatype": "text/turtle; charset=\"UTF-8\"",
"createdOn": "2018-04-04T13:29:55.717707"
},
{
"uri": "provenance/primary.cwlprov.json",
"mediatype": "application/json",
"conformsTo": [
"http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.191686",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../workflow/packed.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-10-25T15:46:43.191761",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "provenance/primary.cwlprov.provn",
"mediatype": "text/provenance-notation; charset=\"UTF-8\"",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-n-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.191825",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../snapshot/revsort.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-10-25T15:40:08.769943"
},
{
"uri": "../workflow/primary-output.json",
"mediatype": "application/json",
"createdOn": "2018-10-25T15:46:43.191944",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "provenance/primary.cwlprov.ttl",
"mediatype": "text/turtle; charset=\"UTF-8\"",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-o-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.192006",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "provenance/primary.cwlprov.jsonld",
"mediatype": "application/ld+json",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-o-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.192069",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../snapshot/sorttool.cwl",
"mediatype": "text/x+yaml; charset=\"UTF-8\"",
"conformsTo": "https://w3id.org/cwl/",
"createdOn": "2018-06-05T15:19:48.785496"
},
{
"uri": "provenance/primary.cwlprov.nt",
"mediatype": "application/n-triples",
"conformsTo": [
"http://www.w3.org/TR/2013/REC-prov-o-20130430/",
"https://w3id.org/cwl/prov/0.6.0"
],
"createdOn": "2018-10-25T15:46:43.192188",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt",
"mediatype": "text/plain; charset=\"UTF-8\"",
"createdOn": "2018-10-25T15:46:43.192249",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"uri": "../workflow/primary-job.json",
"mediatype": "application/json",
"createdOn": "2018-10-25T15:46:43.192312",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
}
},
{
"createdOn": "2018-10-25T15:46:35.303633",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
},
"uri": "urn:uuid:ed8d007b-a1f3-4bfe-b390-08df074d712d"
},
{
"createdOn": "2018-10-25T15:46:37.067848",
"createdBy": {
"uri": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"name": "cwltool 1.0.20181012180214"
},
"uri": "urn:uuid:4ab5a3fe-e481-4f7f-98c4-af8e5dfccb93"
}
],
"annotations": [
{
"uri": "urn:uuid:42a4ade6-245b-4746-acf9-e9910780a449",
"about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
"content": "/",
"oa:motivatedBy": {
"@id": "oa:describing"
}
},
{
"uri": "urn:uuid:9ed6545d-dfb2-4cab-b4af-102735a2661e",
"about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
"content": [
"provenance/primary.cwlprov.xml",
"provenance/primary.cwlprov.json",
"provenance/primary.cwlprov.provn",
"provenance/primary.cwlprov.ttl",
"provenance/primary.cwlprov.jsonld",
"provenance/primary.cwlprov.nt"
],
"oa:motivatedBy": {
"@id": "http://www.w3.org/ns/prov#has_provenance"
}
},
{
"uri": "urn:uuid:1c23181c-905c-49aa-a5e3-7194f9a43c29",
"about": "../workflow/packed.cwl",
"oa:motivatedBy": {
"@id": "oa:highlighting"
}
},
{
"uri": "urn:uuid:4f4132a7-c27d-47d5-a96f-3ad6ca741fe8",
"about": "urn:uuid:1f767ad4-ac52-4623-b5bc-dd9faf2b869f",
"content": [
"../workflow/packed.cwl",
"../workflow/primary-job.json"
],
"oa:motivatedBy": {
"@id": "oa:linking"
}
},
{
"uri": "urn:uuid:6cf23e43-cbd1-4b77-af65-d9a32dc913dc",
"about": "urn:uuid:ac9c1653-4291-47bc-86f8-6dedcff13519",
"content": [
"metadata/logs/engine.ac9c1653-4291-47bc-86f8-6dedcff13519.txt"
],
"oa:motivatedBy": {
"@id": "https://w3id.org/cwl/prov#log"
}
}
]
}
metadata/manifest.json
$ cwlprov --help
usage: cwlprov [-h] [--version] [--directory DIRECTORY] [--relative]
[--absolute] [--output OUTPUT] [--verbose] [--quiet] [--hints]
[--no-hints]
{validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
...
cwlprov explores Research Objects containing provenance of Common Workflow
Language executions. <https://w3id.org/cwl/prov/>
commands:
{validate,info,who,prov,inputs,outputs,run,runs,rerun,derived,runtimes}
validate Validate the CWLProv Research Object
info show research object Metadata
who show Who ran the workflow
prov export workflow execution Provenance in PROV format
inputs list workflow/step Input files/values
outputs list workflow/step Output files/values
run show workflow Execution log
runs List all workflow executions in RO
rerun Rerun a workflow or step
derived list what was Derived from a data item, based on
activity usage/generation
runtimes calculate average step execution Runtimes
stain@biggie:~/src/cwlprov-py/test/revsort-cwlprov-0.4.0$ cwlprov --verbose validate
INFO:cwlprov.tool:Detected /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/bagit.txt
INFO:cwlprov.tool:Opening BagIt /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/97/97fe1b50b4582cebc7d853796ebd62e3e163aa3f
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/data/b9/b9214658cc453331b62c2282b772a5c063dbd284
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/workflow/packed.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/workflow/primary-job.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.xml
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.provn
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.ttl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.nt
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/provenance/primary.cwlprov.jsonld
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/revsort.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/hello.txt
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/revtool.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/empty.ttl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/snapshot/sorttool.cwl
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/manifest.json
INFO:bagit:Verifying checksum for file /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/bag-info.txt
INFO:cwlprov.ro:Parsing RO manifest /home/stain/src/cwlprov-py/test/revsort-cwlprov-0.4.0/metadata/manifest.json
Valid CWLProv RO: .
Validating
(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run
2018-08-08 22:44:06.573330 Flow 39408a40-c1c8-4852-9747-87249425be1e [ Run of workflow/packed.cwl#main
2018-08-08 22:44:06.691722 Step 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 Run of workflow/packed.cwl#main/create-tar (0:00:00.010133)
2018-08-08 22:44:06.702976 Step 0cceeaf6-4109-4f08-940b-f06ac959944a * Run of workflow/packed.cwl#main/compile (unknown duration)
2018-08-08 22:44:12.680097 Flow 39408a40-c1c8-4852-9747-87249425be1e ] Run of workflow/packed.cwl#main (0:00:06.106767)
Legend:
[ Workflow start
* Nested provenance, use UUID to explore: cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
] Workflow end
(venv3) stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov run 0cceeaf6-4109-4f08-940b-f06ac959944a
2018-08-08 22:44:06.607210 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a [ Run of workflow/packed.cwl#main
2018-08-08 22:44:06.707070 Step 83752ab4-8227-4d4a-8baa-78376df34aed Run of workflow/packed.cwl#main/untar (0:00:00.008149)
2018-08-08 22:44:06.718554 Step f56d8478-a190-4251-84d9-7f69fe0f6f8b Run of workflow/packed.cwl#main/argument (0:00:00.532052)
2018-08-08 22:44:07.251588 Flow 0cceeaf6-4109-4f08-940b-f06ac959944a ] Run of workflow/packed.cwl#main (0:00:00.644378)
Legend:
[ Workflow start
] Workflow end
stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ cwlprov outputs 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 --format=files
Output tar:
data/c0/c0fd5812fe6d8d91fef7f4f1ba3a462500fce0c5
stain@biggie:~/src/cwlprov-py/test/nested-cwlprov-0.3.0$ tar tfv `cwlprov -q outputs 4f082fb6-3e4d-4a21-82e3-c685ce3deb58 --format=files`
-rw-r--r-- stain/stain 115 2018-08-08 23:44 Hello.java
Inspecting step runs
TODO: Best practice for publishing CWLProv ROs
>1 GB -> "Big Data"
stain@biggie:~/dropbox/work/cwlprov/newest-swift$ du -hs *
13G alignment_0.6.0_linux
2.9G rnaseqwf_0.5.0_mac
80M somaticwf_0.5.0_ma
<?xml version="1.0" encoding="UTF-8"?>
<ur:UsageRecord xmlns="http://schema.ogf.org/urf/2013/04/urf"
xmlns:ur="http://schema.ogf.org/urf/2013/04/urf" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schema.ogf.org/urf/2013/04/urf">
<ur:RecordIdentityBlock>
<ur:RecordId>urn:uuid:4350d583-61a5-45e8-a229-957aa81e8014</ur:RecordId>
<ur:CreateTime>2018-05-09T09:06:52Z</ur:CreateTime>
<ur:Site>EMBL-EBI</ur:Site>
<ur:Infrastructure>Embassy</ur:Infrastructure>
</ur:RecordIdentityBlock>
<ur:SubjectIdentityBlock>
<ur:LocalUserId>stain</ur:LocalUserId>
<ur:LocalGroupId>ELIXIRCWLImplStudy</ur:LocalGroupId>
<ur:GlobalUserId>https://orcid.org/0000-0001-9842-9718</ur:GlobalUserId>
</ur:SubjectIdentityBlock>
<ur:ComputeUsageBlock>
<ur:CpuDuration>PT3600S</ur:CpuDuration>
<ur:WallDuration>PT3600S</ur:WallDuration>
<ur:StartTime>2018-05-31T11:00:00</ur:StartTime>
<ur:EndTime>2018-05-31T12:00:00</ur:EndTime>
<ur:ExecutionHost>
<ur:Hostname>compute-0-1.example.com</ur:Hostname>
<ur:ProcessId>1042</ur:ProcessId>
<ur:Benchmark ur:type="si2k">3.14</ur:Benchmark>
</ur:ExecutionHost>
<ur:Processors>4</ur:Processors>
<ur:NodeCount>1</ur:NodeCount>
</ur:ComputeUsageBlock>
<ur:JobUsageBlock>
<ur:GlobalJobId>host.example.org/ab1234</ur:GlobalJobId>
<ur:LocalJobId>ab1234</ur:LocalJobId>
<ur:JobName>MetaGenomics1337</ur:JobName>
<ur:Queue ur:description="execution">"Bigmem"</ur:Queue>
<ur:TimeInstant ur:type="Ctime">2018-05-31T10:30:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Qtime">2018-05-31T10:31:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Etime">2018-05-31T10:59:42</ur:TimeInstant>
</ur:JobUsageBlock>
<ur:MemoryUsageBlock>
<ur:MemoryClass>"RAM"</ur:MemoryClass>
<ur:MemoryResourceCapacityUsed>14728</ur:MemoryResourceCapacityUsed>
<ur:MemoryResourceCapacityAllocated>56437</ur:MemoryResourceCapacityAllocated>
<ur:MemoryResourceCapacityRequested>42000</ur:MemoryResourceCapacityRequested>
<ur:StartTime>2018-05-31T11:00:00</ur:StartTime>
<ur:EndTime>2018-05-31T12:00:00</ur:EndTime>
</ur:MemoryUsageBlock>
<ur:StorageUsageBlock>
<ur:StorageShare>pool-003</ur:StorageShare>
<ur:StorageMedia>disk</ur:StorageMedia>
<ur:StorageClass>replicated</ur:StorageClass>
<ur:DirectoryPath>/projectA</ur:DirectoryPath>
<ur:FileCount>42</ur:FileCount>
<ur:StorageResourceCapacityUsed>14728</ur:StorageResourceCapacityUsed>
<ur:StorageLogicalCapacityUsed>13617</ur:StorageLogicalCapacityUsed>
<ur:StorageResourceCapacityAllocated>14624
</ur:StorageResourceCapacityAllocated>
<ur:StartTime>2018-05-07T09:31:40Z</ur:StartTime>
<ur:EndTime>2018-05-08T09:29:42Z</ur:EndTime>
<ur:Host>host.example.org</ur:Host>
</ur:StorageUsageBlock>
<ur:CloudUsageBlock>
<ur:LocalVirtualMachineId>ab1234</ur:LocalVirtualMachineId>
<ur:GlobalVirtualMachineId>
host.example.org/ab1234/2018-05-09T09:06:52Z
</ur:GlobalVirtualMachineId>
<ur:Status>started</ur:Status>
<ur:SuspendDuration>PT3600S</ur:SuspendDuration>
<ur:ImageId>UbuntuImage2013</ur:ImageId>
<ur:MachineName>cloud.example.org</ur:MachineName>
<ur:SubmitHost>
cloud-name=cloud.example.org,Mds-Vo-name=local,o=cloud
</ur:SubmitHost>
<ur:TimeInstant ur:type="Ctime">2018-05-31T10:30:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Qtime">2018-05-31T10:31:00</ur:TimeInstant>
<ur:TimeInstant ur:type="Etime">2018-05-31T10:59:42</ur:TimeInstant>
<ur:ServiceLevel>Premium</ur:ServiceLevel>
</ur:CloudUsageBlock>
<ur:NetworkUsageBlock>
<ur:NetworkClass ur:NetworkResourceBandwidth="100000000">"Ethernet"</ur:NetworkClass>
<ur:NetworkInboundUsed ur:SourceAddress=192.168.1.12>14728</ur:NetworkInboundUsed>
<ur:NetworkOutboundUsed ur:DestinationAddress=192.168.1.21>14728</ur:NetworkOutboundUsed>
</ur:NetworkUsageBlock>
</ur:UsageRecord>
Collect detailed usage statistics from compute environment
Pau Ruiz Safont
outputs:
sequence:
type: stdout
format: edam:format_1929 # FASTA
cwlprov:concept: ebi_metagenomics:assembly_statistics
cwlprov:relationships:
prov:wasDerivedFrom: [ inputs.second_input, outputs.second_output ]
{
"uri": "urn:hash::sha1:b9214658cc453331b62c2282b772a5c063dbd284",
"bundledAs": {
"uri": "arcp://uuid,1f767ad4-ac52-4623-b5bc-dd9faf2b869f/data/b9/b9214658cc453331b62c2282b772a5c063dbd284",
"folder": "/data/b9/",
"filename": "b9214658cc453331b62c2282b772a5c063dbd284"
},
"format": "http://edamontology.org/format_1929",
"classifying": "ebi_metagenomics:assembly_statistics",
"prov:wasDerivedFrom": [
"urn:hash::sha1:f572d396fae9206628714fb2ce00f72e94f2258f",
"urn:hash::sha1:2258ff572d396fae9206628714fb2ce00f72e94f",
]
},
postprocessing of Research Object
outputs:
sequence:
type: stdout
biocompute:error_domain: [
"frequency_cutoff > 0.05"
],
biocompute:input_domain: {
"genomic_reference": "WAP_RAT"
}
{ manifest }
{ prov }
{ cwl }
Adam Cowdy