Open PHACTS

BioExcel use case #6

Leveraging integrated pharmacological datasets for cross-domain queries

Stian Soiland-Reyes, University of Manchester

http://orcid.org/0000-0001-9842-9718

@soilandreyes

BioExcel WP2 meeting, Schiphol, 2016-02-11

This work has been done as part of the BioExcel CoE (www.bioexcel.eu), a project funded by the EC H2020 program, EINFRA-5-2015 contract number 675728

Bringing together pharmacological data resources in an integrated, interoperable infrastructure

Data sources integrated and linked together so that you can easily see the relationships between compounds, targets, pathways, diseases and tissues.

ChEBI, ChEMBL, ChemSpider, ConceptWiki, DisGeNET, DrugBank, FAERS, Gene Ontology, neXtProt, UniProt, WikiPathways

Web explorer.openphacts.org

REST API dev.openphacts.org

Workflows myexperiment.org (KNIME, Pipeline Pilot, Taverna)

Third-party applications

 

Docker github.com/openphacts/ops-docker

Download data.openphacts.org

Using Open PHACTS

Example workflow

For a given compound, give me the interaction profile with targets
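A minimal sketch of how such a query could be phrased against the REST API. The base URL, the `/compound/pharmacology/pages` path, the parameter names and the example compound URI are assumptions made for illustration; the API documentation at dev.openphacts.org is the authority on the actual calls.

```python
from urllib.parse import urlencode

# Assumption: illustrative base URL and version, not taken from the slides.
API_BASE = "https://beta.openphacts.org/2.1"


def pharmacology_url(compound_uri, app_id, app_key, base=API_BASE):
    """Build a request URL asking for a compound's activity records against targets."""
    query = urlencode({
        "uri": compound_uri,   # the compound is identified by a URI
        "app_id": app_id,      # assumed credential parameters
        "app_key": app_key,
        "_format": "json",
    })
    return f"{base}/compound/pharmacology/pages?{query}"


if __name__ == "__main__":
    # Hypothetical compound URI and credentials, for illustration only.
    print(pharmacology_url("http://example.org/compound/123", "my-id", "my-key"))
```

The function only builds the URL, so it can be tried without network access or credentials.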

Workflow gap analysis

Combining private and public data ➟ Tutorials

Local install of Open PHACTS ➟ Docker

Testing and comparing different APIs

Integrate third-party tools ➟ Identifier mappings

API changes ➟ Semantic versioning
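As a rough illustration of the semantic versioning idea (MAJOR.MINOR.PATCH, where only a MAJOR bump may break existing clients), a workflow could check the API version before running. The version strings below are hypothetical, and how a workflow discovers the server version is left out.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(client_built_for, server_version):
    """True if the server has the same MAJOR version and is not older than the client expects."""
    wanted = parse_version(client_built_for)
    offered = parse_version(server_version)
    return offered[0] == wanted[0] and offered >= wanted


if __name__ == "__main__":
    print(is_compatible("1.5.0", "1.6.2"))  # newer minor release, same major
    print(is_compatible("1.5.0", "2.0.0"))  # major bump, may break the workflow
```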

 

Lack of flexibility

Workflow language, API URLs, Data sources

Common Workflow Language (CWL)

Common format for bioinformatics tool execution
Community-based standards effort, not specific software
Implement CWL in your own workflow engine
Defined with a schema, specification & test suite
Designed for shared-nothing cluster & cloud environments
Designed for containers (e.g. Docker)

Main focus: command line tools

Adapted from Peter Amstutz' Broad Institute CWL meetup 2015-11-13

# wc.cwl.yaml: count the lines of the input files
class: CommandLineTool
inputs:
  - id: "#infile"
    type: {type: array, items: File}
    inputBinding: {position: 1}
outputs:
  - id: "#outfile"
    type: File
    outputBinding: {glob: "out.txt"}
baseCommand: ["wc", "-l"]
stdout: out.txt

#!/usr/bin/env cwl-runner
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
  - class: DockerRequirement
    dockerPull: "debian:8"

inputs:
  - id: "#pattern"
    type: string
  - id: "#infile"
    type: {type: array, items: File}

steps:
  - id: "#grep"
    run: {import: grep.cwl.yaml}
    scatter: "#grep.infile"
    inputs:
      - id: "#grep.infile"
        source: "#infile"
      - id: "#grep.pattern"
        source: "#pattern"
    outputs:
      - id: "#grep.outfile"


  - id: "#wc"
    run: {import: wc.cwl.yaml}
    inputs:
      - id: "#wc.infile"
        source: "#grep.outfile"
    outputs:
      - id: "#wc.outfile"

outputs:
  - id: "#outfile"
    type: File
    source: "#wc.outfile"

# grep.cwl.yaml: print the lines of one file that match the pattern
class: CommandLineTool
inputs:
  - id: "#pattern"
    type: string
    inputBinding: {position: 0}
  - id: "#infile"
    type: File
    inputBinding: {position: 1}
outputs:
  - id: "#outfile"
    type: File
    outputBinding: {glob: "out.txt"}
baseCommand: "grep"
stdout: out.txt
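To make the data flow of the workflow above easier to follow, here is a plain Python sketch of what it computes: the grep step is scattered over the input files (one invocation per file), and the wc step then receives all grep outputs together. This is only an in-memory analogy, not how a CWL runner executes the tools; Python's `re` syntax also differs from grep's default regular expressions.

```python
import re


def grep(pattern, lines):
    """Keep the lines that match the pattern, like one grep invocation."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def run_workflow(pattern, infiles):
    """infiles maps a file name to its list of lines."""
    # Scatter: one grep invocation per input file.
    grep_outputs = [grep(pattern, lines) for lines in infiles.values()]
    # Gather: a single wc -l sees every grep output and counts the lines of each.
    counts = [len(output) for output in grep_outputs]
    return counts, sum(counts)


if __name__ == "__main__":
    files = {"a.txt": ["foo", "bar", "foobar"], "b.txt": ["baz"]}
    print(run_workflow("foo", files))
```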

Rich: Linked Data allows for open-ended metadata annotations and reasoning

Runnable: not just an abstract sketch; runs in containers, clouds, & (HPC) clusters

Portable: both reference & vendor implementations available

Community Driven: started at the Bioinformatics Open Source Conference; lazy consensus & do-ocracy approach

Adapted from Michael R Crusoe's #CommonWL talk at @scilifelab https://goo.gl/fR2shQ 

Implementers

cwltool (reference implementation)
Rabix
Arvados
Galaxy
Parallel Recipes
Toil
CancerCollaboratory
Airflow (SciDAP)

cwl2script

Apache Taverna (planned)

How can BioExcel help Open PHACTS?

Making Open PHACTS workflow components

swagger.json → CWL tools

"Call API" command line

Docker/CWL support in KNIME/Taverna/Galaxy
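One way the "swagger.json → CWL tools" idea could look, as a hedged sketch: walk the GET operations of a Swagger 2.0 document and emit one CommandLineTool-style description per operation, in the same shape as the grep and wc tools shown earlier. The `call-api` command, its `--name=value` flag style and the `response.json` output name are hypothetical, and the generated descriptions have not been validated against the CWL specification.

```python
def swagger_to_cwl_tools(swagger):
    """Return {operation name: CWL-style tool description} for each GET operation."""
    tools = {}
    for path, operations in swagger.get("paths", {}).items():
        get_op = operations.get("get")
        if get_op is None:
            continue
        inputs = []
        for param in get_op.get("parameters", []):
            if param.get("in") != "query":
                continue  # only query parameters are mapped in this sketch
            inputs.append({
                "id": "#" + param["name"],
                # Optional parameters become a union with "null".
                "type": "string" if param.get("required") else ["null", "string"],
                # Hypothetical flag style of the wrapper: --name=value
                "inputBinding": {"prefix": "--" + param["name"] + "=", "separate": False},
            })
        tools[get_op.get("operationId", path)] = {
            "class": "CommandLineTool",
            "baseCommand": ["call-api", path],  # "call-api" is a hypothetical wrapper
            "inputs": inputs,
            "outputs": [{
                "id": "#response",
                "type": "File",
                "outputBinding": {"glob": "response.json"},
            }],
            "stdout": "response.json",
        }
    return tools


if __name__ == "__main__":
    # Tiny hand-written Swagger fragment, for illustration only.
    spec = {"paths": {"/compound": {"get": {
        "operationId": "compoundInfo",
        "parameters": [{"name": "uri", "in": "query", "required": True}],
    }}}}
    print(swagger_to_cwl_tools(spec))
```

Generating the tool descriptions from the API description keeps the workflow components in step with the API, instead of maintaining each wrapper by hand.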

Improve local deployment of Open PHACTS

Virtualization and data management

Automate cloud installation

 

Tutorials and training

Community building