Assays and Workflow Runs in IBISBA Hub

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement № 730965


IBISBA WP6/WP7 meeting, Manchester, UK

Assays in SEEK

Not (just) an experiment!


After-the-fact provenance

Assays are related







Data files

SOPs Protocols

Investigation Study Assay

(+ Data, Samples, ...)

Assays for different purposes

Assay variants


…a computational protocol?


Performs a computational analysis


Explains a method


Reusable and tweakable


Prospective provenance

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow
inputs:
  rulesfile: File
  sourcefile: File
  sinkfile: File
  reverse: boolean
  max-steps: int?

# Output names follow the outputSource references.
outputs:
  compounds:
    type: File
    outputSource: rp2paths/compounds
  reactions:
    type: File
    outputSource: rp2paths/reactions
  sinks:
    type: File
    outputSource: rp2paths/sinks

# Step names follow the rp2/... and rp2paths/... source references.
steps:
  rp2:
    run: ../tools/RetroPath2/RetroPath2.cwl
    in:
      input.rulesfile: rulesfile
      input.sourcefile: sourcefile
      input.sinkfile: sinkfile
      input.max-steps: max-steps
    out: [solutionfile]

  rp2paths:
    run: ../tools/rp2paths/rp2paths.cwl
    in:
      infile: rp2/solutionfile
      reverse: reverse
    out: [compounds, reactions, sinks]
    - upstream:
      installTo: ../tools/RetroPath2
    - upstream:
      installTo: ../tools/rp2paths
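
The data links in the workflow above (for example rp2/solutionfile) are enough to work out the order in which the steps must run. A minimal sketch in Python, assuming a hand-written dictionary that mirrors the two steps rather than a real CWL parser; the names STEPS, upstream_steps and execution_order are illustrative only:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hand-written mirror of the two steps; not produced by a CWL parser.
# Each step maps its input names to a source: either a workflow input
# ("rulesfile") or another step's output ("rp2/solutionfile").
STEPS = {
    "rp2": {
        "input.rulesfile": "rulesfile",
        "input.sourcefile": "sourcefile",
        "input.sinkfile": "sinkfile",
        "input.max-steps": "max-steps",
    },
    "rp2paths": {
        "infile": "rp2/solutionfile",
        "reverse": "reverse",
    },
}


def upstream_steps(sources):
    """Return the names of steps referenced as 'step/output' sources."""
    return {src.split("/", 1)[0] for src in sources if "/" in src}


def execution_order(steps):
    """Order steps so that each one runs after the steps it reads from."""
    graph = {name: upstream_steps(ins.values()) for name, ins in steps.items()}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STEPS))  # ['rp2', 'rp2paths']
```

Because the order is derived from the declared links alone, the workflow description states what will happen before anything is executed.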

Workflow Node

…a (potential) step of a workflow


Executes some tool (may itself be a workflow)


Explains computational method: "How"


Informs workflow authors - parameters, configurations, versions






Workflow runs

…a kind of Assay?


What data sources? inputs

How did it run? workflow

Where did it run? machine(s)

Which results were created? outputs


Why did it run? study/investigation

What does it mean? evaluation


Retrospective provenance
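
The six questions above can be read as the fields of a record kept for each finished run. A minimal sketch, assuming a hypothetical WorkflowRunRecord class (this is not a SEEK or IBISBA Hub data model) and made-up file, host and study names:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkflowRunRecord:
    """Hypothetical record of one finished workflow run."""
    inputs: List[str]                 # What data sources?
    workflow: str                     # How did it run?
    machines: List[str]               # Where did it run?
    outputs: List[str]                # Which results were created?
    study: str                        # Why did it run?
    evaluation: Optional[str] = None  # What does it mean?

    def unanswered(self) -> List[str]:
        """List the fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# All values below are placeholders for illustration.
run = WorkflowRunRecord(
    inputs=["rules.csv", "source.csv", "sink.csv"],
    workflow="workflow.cwl",
    machines=["example-host"],
    outputs=["compounds.csv", "reactions.csv", "sinks.csv"],
    study="example-study",
)
print(run.unanswered())  # ['evaluation']
```

In this sketch the engine can fill in most fields automatically, while the evaluation stays empty until a person interprets the results.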

Selecting data inputs

Initiating a workflow run


What data sources? inputs

How to analyse? workflow

Where to run it? machine(s)

Which results to expect? outputs


Why run it? study/investigation


Did it work? evaluation


Prospective & concurrent provenance!
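
One way to picture this is a record that exists before the run starts and is updated while it executes. A minimal sketch, assuming a hypothetical PlannedRun class; the rule used for "Did it work?" (all expected outputs were produced) is an assumption made for illustration, not a definition from the Hub:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PLANNED = "planned"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class PlannedRun:
    """Hypothetical run record created before execution starts."""
    inputs: List[str]            # What data sources?
    workflow: str                # How to analyse?
    machine: str                 # Where to run it?
    expected_outputs: List[str]  # Which results to expect?
    study: str                   # Why run it?
    status: Status = Status.PLANNED
    produced: List[str] = field(default_factory=list)

    def start(self) -> None:
        self.status = Status.RUNNING

    def record_output(self, name: str) -> None:
        """Concurrent provenance: note each result as it appears."""
        self.produced.append(name)

    def finish(self) -> None:
        """Assumed check: the run worked if no expected output is missing."""
        missing = set(self.expected_outputs) - set(self.produced)
        self.status = Status.FAILED if missing else Status.SUCCEEDED


# Placeholder values for illustration.
plan = PlannedRun(
    inputs=["source.csv"],
    workflow="workflow.cwl",
    machine="example-host",
    expected_outputs=["compounds.csv", "reactions.csv"],
    study="example-study",
)
plan.start()
plan.record_output("compounds.csv")
plan.finish()
print(plan.status)  # Status.FAILED, because reactions.csv was never recorded
```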

App Settings

Bowtie2 Indexer (#Bowtie2_Indexer)
  Difference-cover period: 1024
  Disable default parameters: False
  Disable diff-cover sample: False
  Discard bitpacked files: False
  Ftab lookup table size: 10
  Large index: False
  Only bitpacked files: False
  Packed representation: False
  Rows to mark: 5
  Seed: -
  Suffixes: -
  Suffixes as fraction: 4

FastQC (#FastQC)
  Casava: No value
  Format: No value
  Kmers: 7
  Nano: No value
  Nogroup: No value

BamTools Index (#BamTools_Index)
  BTI format: No value

Picard CollectAlignmentSummaryMetrics (#Picard_CollectAlignmentSummaryMetrics)
  Assume sorted: true
  Compression level: 5
  Is bisulfite sequenced: false
  Max insert size: 100000
  Max records in RAM: 500000
  Memory per job: 2048
  Metric accumulation level: ALL_READS
  Quiet: false
  Stop after: 0
  Validation stringency: SILENT
  Verbosity: INFO

TopHat2 (#TopHat2)
  Allowed mismatch number: 0
  Ambiguous character penalty: 1
  Bowtie -n: False
  Bowtie2 preset: Fast
  Coefficient B: 1.25
  Coefficient B: 0.15
  Coefficient B: -0.6
  Constant A: 1
  Constant A: 0
  Constant A: -0.6
  Coverage search: False
  Disable BAM sorting: False
  Disable discordant alignments: False
  Disable mixed alignments: False
  Disallow gaps: 4
  Function type: Square-root
  Function type: Linear
  Function type: Linear
  Fusion anchor length: 20
  Fusion ignore chromosomes: No value
  Fusion minimum distance: 10000000
  Fusion multipairs: 2
  Fusion multireads: 2
  Fusion read mismatches: 2
  Fusion search: False
  Keep FASTA order: False
  Library type: fr-unstranded
  Mate inner distance: 50
  Mate standard deviation: 20
  Max number of re-seed: 2
  Maximum coverage intron: 20000
  Maximum deletion length: 3
  Maximum insertion length: 3
  Maximum intron length: 500000
  Maximum multihits: 20
  Maximum segment intron: 500000
  Microexon search: False
  Minimum anchor length: 8
  Minimum coverage intron: 50
  Minimum intron length: 50
  Minimum segment intron: 50
  Mismatch penalty: 6,2
  No novel indels: False
  No novel juncs: False
  Prefilter multihits: False
  Read edit distance: 2
  Read gap length: 2
  Read gap penalties: 5,3
  Read mismatches: 2
  Read realign edit distance: "Read edit distance" + 1
  Reference gap penalties: 5,3
  Report secondary alignments: False
  Seed extension attempts: 15
  Seed substring length: 20
  Segment length: 25
  Segment mismatches: 2
  Splice mismatches: 0
  Transcriptome max hits: 60
  Transcriptome only: False

Danger of Parameterisis

Workflow progress

Are all workflow runs valuable assays?


(Broad Institute)

CWL implementations



Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.

PROV Model Primer

W3C Working Group Note 30 April 2013


Khan et al.,
Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv
Submitted to GigaScience

$ cwlprov --help
usage: cwlprov [-h] [--version] [--directory DIRECTORY] [--relative]
            [--absolute] [--output OUTPUT] [--verbose] [--quiet] [--hints]

cwlprov explores Research Objects containing provenance of Common Workflow
Language executions.

    validate            Validate the CWLProv Research Object
    info                show research object Metadata
    who                 show Who ran the workflow
    prov                export workflow execution Provenance in PROV format
    inputs              list workflow/step Input files/values
    outputs             list workflow/step Output files/values
    run                 show workflow Execution log
    runs                List all workflow executions in RO
    rerun               Rerun a workflow or step
    derived             list what was Derived from a data item, based on
                        activity usage/generation
    runtimes            calculate average step execution Runtimes
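
The help text describes the derived command as working from activity usage and generation. The idea can be sketched in plain Python: an item is derived from another if some activity used the latter and generated the former. This is not cwlprov's implementation; the USED and GENERATED tables are hypothetical, and following the chain transitively is an assumption of this sketch:

```python
from collections import deque

# Hypothetical provenance records: which activity used / generated which item.
USED = {
    "rp2": ["rules", "source", "sink"],
    "rp2paths": ["solutionfile"],
}
GENERATED = {
    "rp2": ["solutionfile"],
    "rp2paths": ["compounds", "reactions", "sinks"],
}


def derived_from(item, used, generated):
    """Return every item reachable from `item` via usage, then generation."""
    found = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for activity, consumed in used.items():
            if current not in consumed:
                continue
            for product in generated.get(activity, []):
                if product not in found:
                    found.add(product)
                    queue.append(product)
    return found


print(sorted(derived_from("rules", USED, GENERATED)))
# ['compounds', 'reactions', 'sinks', 'solutionfile']
```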

EOSC-Life roadmap

2019-04-04 Assays and Workflow Runs in IBISBA Hub

By Stian Soiland-Reyes
