Open PHACTS

BioExcel use case #6

Leveraging integrated pharmacological datasets for cross-domain queries

Stian Soiland-Reyes, University of Manchester

http://orcid.org/0000-0001-9842-9718

@soilandreyes

This work is licensed under a Creative Commons Attribution 4.0 International License.

BioExcel WP2 meeting, Schiphol, 2016-02-11

This work has been done as part of the BioExcel CoE (www.bioexcel.eu), a project funded by the EC H2020 program, EINFRA-5-2015 contract number 675728.

http://www.openphacts.org/

Bringing together pharmacological data resources in an integrated, interoperable infrastructure

Data sources integrated and linked together so that you can easily see the relationships between compounds, targets, pathways, diseases and tissues.

ChEBI, ChEMBL, ChemSpider, ConceptWiki, DisGeNET, DrugBank, FAERS, Gene Ontology, neXtProt, UniProt, WikiPathways

Using Open PHACTS

Web: explorer.openphacts.org

REST API: dev.openphacts.org

Workflows: myexperiment.org (KNIME, Pipeline Pilot, Taverna)

Third-party applications

Docker: github.com/openphacts/ops-docker

Download: data.openphacts.org
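
As a rough illustration of the REST API route, the sketch below builds a request URL in Python. The base URL is a placeholder, and the `/compound` path, the `uri`, `app_id`, `app_key` and `_format` parameter names, and the example compound URI are all assumptions for illustration; the documentation at dev.openphacts.org is the authority on the real endpoints.

```python
from urllib.parse import urlencode

# Placeholder base URL; substitute the one documented at dev.openphacts.org.
API_BASE = "https://api.example.org/2.1"


def build_request_url(path, uri, app_id, app_key, fmt="json"):
    """Return a GET URL for an (assumed) Open PHACTS style API call."""
    query = urlencode({
        "uri": uri,          # identifier of the compound, target, ...
        "app_id": app_id,    # assumed names for the API credentials
        "app_key": app_key,
        "_format": fmt,      # assumed name for the response format switch
    })
    return f"{API_BASE}/{path.lstrip('/')}?{query}"


if __name__ == "__main__":
    # The compound URI below is a placeholder, not a real identifier.
    print(build_request_url("/compound", "http://example.org/compound/1",
                            app_id="MY_ID", app_key="MY_KEY"))
```

The resulting URL could be fetched with any HTTP client, or wrapped as a step in a KNIME, Pipeline Pilot or Taverna workflow.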

Example workflow

For a given compound, give me the interaction profile with targets

http://www.myexperiment.org/workflows/4509
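
The question above amounts to grouping a compound's activity records by target. A minimal Python sketch, assuming the records have already been retrieved and flattened into dictionaries; the field names (`target`, `activity_type`, `value`) and the sample records are hypothetical placeholders, not the actual Open PHACTS response format or real measurements.

```python
from collections import defaultdict


def interaction_profile(activities):
    """Group activity records by target.

    Returns {target: [(activity_type, value), ...]} with targets in
    first-seen order.
    """
    profile = defaultdict(list)
    for record in activities:
        profile[record["target"]].append(
            (record["activity_type"], record["value"]))
    return dict(profile)


if __name__ == "__main__":
    # Placeholder records for illustration only (not real data).
    sample = [
        {"target": "target-A", "activity_type": "IC50", "value": 12.0},
        {"target": "target-B", "activity_type": "Ki", "value": 3.5},
        {"target": "target-A", "activity_type": "Ki", "value": 7.1},
    ]
    for target, measurements in interaction_profile(sample).items():
        print(target, measurements)
```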

doi:10.1371/journal.pone.0115460

Workflow gap analysis

Combining private and public data ➟ Tutorials

Local install of Open PHACTS ➟ Docker

Testing and comparing different APIs

Integrate third-party tools ➟ Identifier mappings

API changes ➟ Semantic versioning

Lack of flexibility

Workflow language, API URLs, Data sources
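
One way a workflow could guard against API changes under semantic versioning is a compatibility check on MAJOR.MINOR.PATCH numbers. The sketch below assumes plain three-part version strings (no pre-release or build tags) and that a workflow records the API version it was built against; the function names are illustrative.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(required, available):
    """True if `available` can serve a client built against `required`.

    Under semantic versioning, a major bump signals breaking changes, while
    minor and patch releases stay backwards compatible.
    """
    req, avail = parse_version(required), parse_version(available)
    return avail[0] == req[0] and avail[1:] >= req[1:]


if __name__ == "__main__":
    print(is_compatible("1.4.0", "1.5.2"))  # newer minor release
    print(is_compatible("1.4.0", "2.0.0"))  # major bump
```

A workflow could run such a check before calling the API and fail early with a clear message instead of breaking halfway through.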

http://commonwl.org/

Common format for bioinformatics tool execution
Community-based standards effort, not specific software
Implement CWL in your own workflow engine
Defined with a schema, specification & test suite
Designed for shared-nothing cluster & cloud environments
Designed for containers (e.g. Docker)

Main focus: command line tools

Adapted from Peter Amstutz' Broad Institute CWL meetup 2015-11-13

https://github.com/common-workflow-language/workflows/tree/master/workflows/FestivalDemo

# wc.cwl.yaml: count the lines of all input files (run by the workflow's "#wc" step)
class: CommandLineTool
inputs:
  - id: "#infile"
    type: {type: array, items: File}
    inputBinding: {position: 1}
outputs:
  - id: "#outfile"
    type: File
    outputBinding: {glob: "out.txt"}
baseCommand: ["wc", "-l"]
stdout: out.txt

#!/usr/bin/env cwl-runner
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
  - class: DockerRequirement
    dockerPull: "debian:8"

inputs:
  - id: "#pattern"
    type: string
  - id: "#infile"
    type: {type: array, items: File}

steps:
  - id: "#grep"
    run: {import: grep.cwl.yaml}
    scatter: "#grep.infile"
    inputs:
      - id: "#grep.infile"
        source: "#infile"
      - id: "#grep.pattern"
        source: "#pattern"
    outputs:
      - id: "#grep.outfile"


  - id: "#wc"
    run: {import: wc.cwl.yaml}
    inputs:
      - id: "#wc.infile"
        source: "#grep.outfile"
    outputs:
      - id: "#wc.outfile"

outputs:
  - id: "#outfile"
    type: File
    source: "#wc.outfile"

# grep.cwl.yaml: search one input file for a pattern (run by the workflow's "#grep" step)
class: CommandLineTool
inputs:
  - id: "#pattern"
    type: string
    inputBinding: {position: 0}
  - id: "#infile"
    type: File
    inputBinding: {position: 1}
outputs:
  - id: "#outfile"
    type: File
    outputBinding: {glob: "out.txt"}
baseCommand: "grep"
stdout: out.txt
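
To make the `inputBinding` positions and the `scatter` step concrete, the sketch below imitates in Python how a runner might assemble the command lines for the two tools above. It is a simplification for illustration, not how cwltool or any other implementation is written; the function names and output file names are hypothetical, and nothing is executed.

```python
def build_command(base_command, bound_inputs):
    """Assemble argv from a base command and (position, value) pairs.

    Values are ordered by their inputBinding position; a list value
    (an array of files) is expanded in place.
    """
    argv = list(base_command)
    for _, value in sorted(bound_inputs, key=lambda pair: pair[0]):
        argv.extend(value if isinstance(value, list) else [value])
    return argv


def grep_then_wc(pattern, infiles):
    """Mimic the workflow: scatter grep over infiles, then one wc -l."""
    grep_jobs = []
    grep_outputs = []
    for index, infile in enumerate(infiles):
        # One grep invocation per input file (scatter on "#grep.infile").
        grep_jobs.append(build_command(["grep"], [(0, pattern), (1, infile)]))
        # Each job writes its own out.txt; name them apart for illustration.
        grep_outputs.append(f"grep_{index}/out.txt")
    # wc receives the array of grep outputs at position 1.
    wc_job = build_command(["wc", "-l"], [(1, grep_outputs)])
    return grep_jobs, wc_job


if __name__ == "__main__":
    jobs, wc_job = grep_then_wc("needle", ["a.txt", "b.txt"])
    for job in jobs:
        print(" ".join(job), "> out.txt")
    print(" ".join(wc_job), "> out.txt")
```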

Rich: Linked Data allows for infinite metadata annotations and reasoning

Runnable: not just an abstract sketch; runs in containers, clouds, & (HPC) clusters

Portable: both reference & vendor implementations available

Community Driven: started at the Bioinformatics Open Source Conference; lazy consensus & do-ocracy approach

Adapted from Michael R Crusoe's #CommonWL talk at @scilifelab https://goo.gl/fR2shQ

Implementers

cwltool (reference implementation)
Rabix
Arvados
Galaxy
Parallel Recipes
Toil
CancerCollaboratory
Airflow (SciDAP)

cwl2script

Apache Taverna (planned)

How can BioExcel help Open PHACTS?

Making Open PHACTS workflow components

swagger.json → CWL tools

"Call API" command line

Docker/CWL support in KNIME/Taverna/Galaxy
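
A rough Python sketch of the `swagger.json → CWL tools` idea: walk the operations of a Swagger 2.0 document and emit one CommandLineTool description per GET operation, in the same style as the tool descriptions shown earlier. The `call-api` command, the `--name` flag convention, the `response.json` output name and the type mapping are hypothetical; no such generator is described in the slides.

```python
import json

# Assumed mapping from Swagger 2.0 parameter types to CWL types.
CWL_TYPES = {"string": "string", "integer": "int",
             "number": "float", "boolean": "boolean"}


def swagger_to_cwl_tools(swagger):
    """Return {path: tool description} for each GET operation."""
    tools = {}
    for path, operations in swagger.get("paths", {}).items():
        operation = operations.get("get")
        if operation is None:
            continue
        inputs = []
        for param in operation.get("parameters", []):
            if param.get("in") != "query":
                continue  # this sketch only handles query parameters
            cwl_type = CWL_TYPES.get(param.get("type"), "string")
            if not param.get("required", False):
                cwl_type = ["null", cwl_type]  # optional input
            inputs.append({
                "id": "#" + param["name"],
                "type": cwl_type,
                "inputBinding": {"prefix": "--" + param["name"]},
            })
        tools[path] = {
            "class": "CommandLineTool",
            # "call-api" is a hypothetical wrapper that performs the request.
            "baseCommand": ["call-api", path],
            "inputs": inputs,
            "outputs": [{"id": "#response", "type": "File",
                         "outputBinding": {"glob": "response.json"}}],
            "stdout": "response.json",
        }
    return tools


if __name__ == "__main__":
    # Hand-written stand-in for a real swagger.json.
    example = {"paths": {"/compound": {"get": {"parameters": [
        {"name": "uri", "in": "query", "type": "string", "required": True},
        {"name": "_format", "in": "query", "type": "string"},
    ]}}}}
    print(json.dumps(swagger_to_cwl_tools(example), indent=2))
```

Generating the descriptions rather than writing them by hand would keep the workflow components in step with the API as it changes.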

Improve local deployment of Open PHACTS

Virtualization and data management

Automate cloud installation

Tutorials and training

Community building