Research Objects, Galaxy and Python

Stian Soiland-Reyes, Norman Morrison, Carole Goble

eScience lab, University of Manchester

@soilandreyes

http://orcid.org/0000-0001-9842-9718
http://slides.com/soilandreyes/
 

 

2016-11-10

What is in a Research Object?

A Research Object bundles and relates digital resources of a scientific experiment or investigation:

 

Data used and results produced in an experimental study

Methods employed to produce and analyse that data

Provenance and settings for the experiments

People involved in the investigation

Annotations about these resources, to improve understanding and interpretation

id:        doi:10.15490/seek.1.investigation.56
createdOn: 2015-07-10T16:46:00Z
createdBy: http://orcid.org/0000-0001-9842-9718

aggregates:
 - id:         /sequence/specimen5.bam
   conformsTo: http://gemrb.org/iesdp/file_formats/ie_formats/bam_v1.htm

 - id:         http://example.com/blog/about-specimen5
   authoredBy: http://orcid.org/0000-0001-7066-3350

 - id:         http://www.myexperiment.org/workflows/3355
   history:    provenance/workflow-evolution.ttl

annotations:
 - about:       /sequence/specimen5.bam
   content:     annotations/specimen5-properties.jsonld
   createdBy:   http://orcid.org/0000-0001-7066-3350

 - about:       /sequence/specimen5.bam
   content:     http://example.com/blog/about-specimen5
   motivatedBy: oa:questioning
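
The manifest above can be read as a plain nested data structure. The following is a minimal sketch, using only the Python standard library, of how such a manifest might be held in memory and queried. The key names are copied from the listing above; the helper functions `annotations_about` and `unannotated` are hypothetical names introduced here for illustration and are not part of any Research Object library.

```python
import json

# Manifest mirroring the listing above. Key names follow the slide, which
# may differ from the exact JSON-LD keys of a real Research Object manifest.
manifest = {
    "id": "doi:10.15490/seek.1.investigation.56",
    "createdOn": "2015-07-10T16:46:00Z",
    "createdBy": "http://orcid.org/0000-0001-9842-9718",
    "aggregates": [
        {"id": "/sequence/specimen5.bam",
         "conformsTo": "http://gemrb.org/iesdp/file_formats/ie_formats/bam_v1.htm"},
        {"id": "http://example.com/blog/about-specimen5",
         "authoredBy": "http://orcid.org/0000-0001-7066-3350"},
        {"id": "http://www.myexperiment.org/workflows/3355",
         "history": "provenance/workflow-evolution.ttl"},
    ],
    "annotations": [
        {"about": "/sequence/specimen5.bam",
         "content": "annotations/specimen5-properties.jsonld",
         "createdBy": "http://orcid.org/0000-0001-7066-3350"},
        {"about": "/sequence/specimen5.bam",
         "content": "http://example.com/blog/about-specimen5",
         "motivatedBy": "oa:questioning"},
    ],
}


def annotations_about(manifest, resource_id):
    """Return the content of every annotation whose target is resource_id."""
    return [a["content"] for a in manifest.get("annotations", [])
            if a["about"] == resource_id]


def unannotated(manifest):
    """Return ids of aggregated resources that no annotation is about."""
    targets = {a["about"] for a in manifest.get("annotations", [])}
    return [r["id"] for r in manifest.get("aggregates", [])
            if r["id"] not in targets]


print(json.dumps(annotations_about(manifest, "/sequence/specimen5.bam"), indent=2))
print(unannotated(manifest))
```

Note that the blog post appears twice: once as an aggregated resource in its own right, and once as the content of an annotation about the BAM file.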

Apache Taverna: Data Bundle

Data Bundle API

SCUFL2 Workflow Bundle

[Diagram of bundle contents. Labels: Manifest, Annotation, Workflow (main workflow), Profile (main profile), Tool Config, Provenance, Docker image, CWL tool desc, Example, Run, Reference, Data]

application/vnd.taverna.scufl2.workflow-bundle

#!/usr/bin/env python3
from rolib.bundle import *
from rolib.manifest import *
import datetime

with Bundle("test.bundle.zip", mode="w") as ro:
    # Use an explicit UTC timestamp so the value is unambiguous
    ro.manifest.createdOn = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # Stian created the research object (this collection)
    stian = Agent(name="Stian Soiland-Reyes", orcid="http://orcid.org/0000-0001-9842-9718")
    ro.manifest.createdBy = stian

    ro.writestr("hello.txt", "To be, or not to be, that is the question")
    hello = ro.manifest.get_aggregate("hello.txt")

    # Stian created the hello.txt resource
    hello.createdBy = stian
    hello.createdOn = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # ...but someone else authored its content:
    shakespeare = Agent(name="William Shakespeare", uri="http://dbpedia.org/page/William_Shakespeare")
    hello.authoredBy = shakespeare
    hello.authoredOn = datetime.datetime(1604, 1, 1).isoformat()

    # Aggregate an external resource, also different author
    quote = ro.manifest.add_aggregate("http://www.folgerdigitaltexts.org/?chapter=5&play=Ham&loc=line-3.1.64")
    quote.authoredBy = shakespeare

    # The digital representation was made by Folger Shakespeare Library
    folger = Agent(name="Folger Shakespeare Library", uri="http://www.folgerdigitaltexts.org/?chapter=0&?target=credit")
    quote.createdBy = folger

    # This Wikipedia page (which we didn't need to aggregate) is loosely about this quote
    ro.manifest.add_annotation(about=quote.uri, content="https://en.wikipedia.org/wiki/To_be,_or_not_to_be")
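
The example above relies on the rolib API. To illustrate what ends up on disk, here is a minimal sketch that writes and reads a bundle-like ZIP using only the Python standard library. It assumes the container layout as I understand it from the RO Bundle specification: a `mimetype` entry stored first and uncompressed, and a JSON manifest at `.ro/manifest.json`. Both the layout and the media type string should be checked against the specification. The manifest keys are simplified, and `write_bundle` and `read_manifest` are hypothetical helpers, not rolib functions.

```python
import io
import json
import zipfile

# Assumed RO Bundle media type; verify against the specification.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"  # assumed manifest location


def write_bundle(target, manifest, files):
    """Write files and a JSON manifest into a ZIP container.

    target   -- a path or a writable binary file object
    manifest -- a JSON-serialisable dict
    files    -- a dict mapping archive paths to str or bytes content
    """
    with zipfile.ZipFile(target, mode="w") as zf:
        # The mimetype entry goes first and is stored without compression
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, content in files.items():
            zf.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)


def read_manifest(source):
    """Load the JSON manifest back out of a bundle."""
    with zipfile.ZipFile(source) as zf:
        return json.loads(zf.read(MANIFEST_PATH))


# Write to an in-memory buffer so the sketch leaves no files behind
buffer = io.BytesIO()
write_bundle(
    buffer,
    {"createdBy": {"name": "Stian Soiland-Reyes",
                   "orcid": "http://orcid.org/0000-0001-9842-9718"},
     "aggregates": [{"id": "/hello.txt"}]},
    {"hello.txt": "To be, or not to be, that is the question"},
)
print(read_manifest(buffer)["aggregates"])
```

Because the result is an ordinary ZIP file, any ZIP tool can open it, which is relevant when weighing packaging choices.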

http://www.commonwl.org/

cwlVersion: v1.0
class: Workflow
inputs:
  inp: File
  ex: string

outputs:
  classout:
    type: File
    outputSource: compile/classfile

steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: inp
      extractfile: ex
    out: [example_out]

  compile:
    run: arguments.cwl
    in:
      src: untar/example_out
    out: [classfile]
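
In the workflow above, the `compile` step takes its `src` from `untar/example_out`, so `untar` has to run first. The following sketch shows one way such an ordering could be derived from the step inputs. It is not how any particular CWL engine works: the workflow is hand-copied into a Python dict rather than parsed from YAML, only the shorthand `step/output` source form shown above is handled, and `step_dependencies` is a hypothetical helper. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# The workflow above, hand-copied into a dict (no YAML parser needed)
workflow = {
    "inputs": {"inp": "File", "ex": "string"},
    "steps": {
        "untar": {
            "run": "tar-param.cwl",
            "in": {"tarfile": "inp", "extractfile": "ex"},
            "out": ["example_out"],
        },
        "compile": {
            "run": "arguments.cwl",
            "in": {"src": "untar/example_out"},
            "out": ["classfile"],
        },
    },
}


def step_dependencies(workflow):
    """Map each step name to the set of steps whose outputs it consumes."""
    deps = {}
    for name, step in workflow["steps"].items():
        upstream = set()
        for source in step["in"].values():
            # "untar/example_out" refers to a step output;
            # a bare name such as "inp" refers to a workflow input
            if "/" in source:
                upstream.add(source.split("/", 1)[0])
        deps[name] = upstream
    return deps


order = list(TopologicalSorter(step_dependencies(workflow)).static_order())
print(order)
```

The same dependency map is also the skeleton of a provenance trace: each edge says which step's output was used by which other step.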

CWL provenance/RO?

class: WorkflowRun                          # type
id: "#run1"
process: wf.yaml                            # describedByProcess
enactedBy: http://example.org/engine
inputs:                                     # reverse of usedInput
  - {port: wf.yaml#i1, value: 1}            # port = describedByParameter
outputs:                                    # reverse of wasOutputFrom
  - {port: wf.yaml#o1, value: 2}
runs:                                       # reverse of wasPartOfWorkflowRun
  - class: ProcessRun
    id: "#run1.s1"
    process: wf.yaml#s1                     # Step, not tool.
    enactedBy: http://example.org/engine
    inputs:
      - {port: wf.yaml#s1.i1, value: 1}
    outputs:
      - {port: wf.yaml#s1.o1, value: 2}
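
The nested run description above can be flattened into one record per run, which may be easier to export or index. This is a minimal sketch under the assumption that the structure is exactly as proposed on this slide; the dict is hand-copied from the listing, and `flatten_runs` is a hypothetical helper rather than part of any CWL tooling.

```python
# The proposed provenance structure above, hand-copied into a dict
run = {
    "class": "WorkflowRun",
    "id": "#run1",
    "process": "wf.yaml",
    "enactedBy": "http://example.org/engine",
    "inputs": [{"port": "wf.yaml#i1", "value": 1}],
    "outputs": [{"port": "wf.yaml#o1", "value": 2}],
    "runs": [
        {
            "class": "ProcessRun",
            "id": "#run1.s1",
            "process": "wf.yaml#s1",
            "enactedBy": "http://example.org/engine",
            "inputs": [{"port": "wf.yaml#s1.i1", "value": 1}],
            "outputs": [{"port": "wf.yaml#s1.o1", "value": 2}],
        }
    ],
}


def flatten_runs(run, parent=None):
    """Yield one flat record per run, depth first, parents before children."""
    yield {
        "id": run["id"],
        "class": run["class"],
        "process": run["process"],
        "parent": parent,  # id of the enclosing workflow run, if any
        "inputs": {b["port"]: b["value"] for b in run.get("inputs", [])},
        "outputs": {b["port"]: b["value"] for b in run.get("outputs", [])},
    }
    for child in run.get("runs", []):
        yield from flatten_runs(child, parent=run["id"])


records = list(flatten_runs(run))
for record in records:
    print(record["id"], record["class"], record["process"], record["parent"])
```

The `parent` field records the relationship that the nested `runs` list expresses in the other direction.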

Proposed Agenda

  • Brainstorming: What MUST/SHOULD/MAY be included? -> desired Research Object Profile
  • What is easy to export, what is harder?
  • What can be in a proprietary format, and what should be formalized?
  • Link to Common Workflow Language?
  • Gap analysis of existing APIs (Java, Python and command line) for creating Research Objects
    • Platform choices -- Python? JSON + Command Line?
    • Packaging choices: BagIt or RO Bundle?
  • Where does the RO go? How is it to be consumed?