Leveraging integrated pharmacological
datasets for cross-domain queries
This work is licensed under a
Creative Commons Attribution 4.0 International License.
BioExcel WP2 meeting, Schiphol, 2016-02-11
This work has been done as part of the BioExcel CoE (www.bioexcel.eu),
a project funded by the EC H2020 program, EINFRA-5-2015 contract number 675728
Data sources integrated and linked together so that you can easily see the relationships between compounds, targets, pathways, diseases and tissues.
ChEBI, ChEMBL, ChemSpider, ConceptWiki, DisGeNET, DrugBank, FAERS, Gene Ontology, neXtProt, UniProt, WikiPathways
REST API dev.openphacts.org
Workflows myexperiment.org
(KNIME, Pipeline Pilot, Taverna)
Third-party applications
Docker github.com/openphacts/ops-docker
Download data.openphacts.org
For a given compound, give me the interaction profile with targets
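A query like the one above can be sketched as a single REST request. This is a minimal illustration, not the documented API: the base URL, the `/compound/pharmacology/pages` path, the parameter names, the compound URI and the credentials are all assumptions or placeholders, so check dev.openphacts.org for the actual calls. The code only builds the URL and performs no network access.

```python
from urllib.parse import urlencode

# Assumption: base URL and API version are illustrative only.
API_BASE = "https://beta.openphacts.org/2.1"


def pharmacology_url(compound_uri, app_id, app_key, page=1, page_size=50):
    """Build a request URL asking for the target interactions of one compound."""
    params = {
        "uri": compound_uri,      # identifier of the compound (hypothetical below)
        "app_id": app_id,         # placeholder credentials
        "app_key": app_key,
        "_page": page,            # assumed paging parameters
        "_pageSize": page_size,
        "_format": "json",
    }
    # Assumption: endpoint path for the pharmacology (interaction) profile.
    return f"{API_BASE}/compound/pharmacology/pages?{urlencode(params)}"


url = pharmacology_url(
    "http://example.org/compound/123",  # hypothetical compound URI
    app_id="YOUR_APP_ID",
    app_key="YOUR_APP_KEY",
)
print(url)
```

Keeping URL construction in one function means a change of API version or host touches a single constant.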
Combining private and public data ➟ Tutorials
Local install of Open PHACTS ➟ Docker
Testing and comparing different APIs
Integrate third-party tools ➟ Identifier mappings
API changes ➟ Semantic versioning
Lack of flexibility
Workflow language, API URLs, Data sources
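The semantic versioning point above can be illustrated with a small compatibility check. This is a sketch under simplifying assumptions: version strings are plain `MAJOR.MINOR.PATCH`, pre-release tags and the special rules for major version 0 are ignored, and the function names and version numbers are hypothetical.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(client_built_for, server_version):
    """Return True if a client written against one API version can use another.

    Under semantic versioning only a MAJOR bump may break clients, and the
    server must be at least as new as the version the client was built for.
    """
    client = parse_version(client_built_for)
    server = parse_version(server_version)
    return server[0] == client[0] and server >= client


print(is_compatible("1.4.0", "1.5.2"))  # True: newer minor release, same major
print(is_compatible("1.4.0", "2.0.0"))  # False: major bump may break the client
print(is_compatible("1.5.0", "1.4.9"))  # False: server lacks features the client expects
```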
Common format for bioinformatics tool execution
Community based standards effort, not specific software
Implement CWL in your own workflow engine
Defined with a schema, specification & test suite
Designed for shared-nothing cluster & cloud environments
Designed for containers (e.g. Docker)
Main focus: command line tools
Adapted from Peter Amstutz' Broad Institute CWL meetup 2015-11-13
class: CommandLineTool
inputs:
  - id: "#infile"
    type: {type: array, items: File}
    inputBinding: {position: 1}
outputs:
  - id: "#outfile"
    type: File
    outputBinding: {glob: "out.txt"}
baseCommand: ["wc", "-l"]
stdout: out.txt
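How a tool description like this turns into a command line can be sketched in a few lines of Python. This is a deliberate simplification and not the CWL specification's full binding algorithm (prefixes, separators and other options are left out); `build_command` and the file names are hypothetical.

```python
def build_command(base_command, bound_inputs):
    """Order bound input values by `position` and append them to the base command.

    bound_inputs is a list of (position, value) pairs; a list value stands
    for an array input and is expanded into several arguments.
    """
    argv = list(base_command)
    for _, value in sorted(bound_inputs, key=lambda pair: pair[0]):
        if isinstance(value, list):
            argv.extend(value)
        else:
            argv.append(value)
    return argv


# An array of files bound at position 1, as in the wc description.
print(build_command(["wc", "-l"], [(1, ["a.txt", "b.txt"])]))
# ['wc', '-l', 'a.txt', 'b.txt']

# A grep-style tool: pattern at position 0, file at position 1.
print(build_command(["grep"], [(1, "a.txt"), (0, "aspirin")]))
# ['grep', 'aspirin', 'a.txt']
```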
#!/usr/bin/env cwl-runner
class: Workflow
requirements:
  - class: ScatterFeatureRequirement
  - class: DockerRequirement
    dockerPull: "debian:8"
inputs:
  - id: "#pattern"
    type: string
  - id: "#infile"
    type: {type: array, items: File}
steps:
  - id: "#grep"
    run: {import: grep.cwl.yaml}
    scatter: "#grep.infile"
    inputs:
      - id: "#grep.infile"
        source: "#infile"
      - id: "#grep.pattern"
        source: "#pattern"
    outputs:
      - id: "#grep.outfile"
  - id: "#wc"
    run: {import: wc.cwl.yaml}
    inputs:
      - id: "#wc.infile"
        source: "#grep.outfile"
    outputs:
      - id: "#wc.outfile"
outputs:
  - id: "#outfile"
    type: File
    source: "#wc.outfile"
class: CommandLineTool
inputs:
  - id: "#pattern"
    type: string
    inputBinding: {position: 0}
  - id: "#infile"
    type: File
    inputBinding: {position: 1}
outputs:
  - id: "#outfile"
    type: File
    outputBinding: {glob: "out.txt"}
baseCommand: "grep"
stdout: out.txt
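The data flow of the workflow (scatter grep over each input file, then pass all results to a single wc step) can be mimicked in plain Python. This is only an illustration of the scatter semantics, with no CWL runner involved: in-memory lists of lines stand in for files, substring matching stands in for grep's regular expressions, and all function names and sample data are hypothetical.

```python
def grep(pattern, lines):
    """Stand-in for the grep tool: keep the lines containing the pattern."""
    return [line for line in lines if pattern in line]


def wc_l(files):
    """Stand-in for `wc -l`: count the lines of each file it is given."""
    return [len(lines) for lines in files]


def workflow(pattern, infiles):
    # Scatter: the grep step runs once per element of `infile`,
    # so it yields an array of outputs rather than a single file.
    grep_outfiles = [grep(pattern, lines) for lines in infiles]
    # The wc step is not scattered: it receives the whole array at once.
    return wc_l(grep_outfiles)


infiles = [
    ["aspirin binds COX-1", "water"],
    ["aspirin", "ibuprofen", "aspirin again"],
]
print(workflow("aspirin", infiles))  # [1, 2]
```

Because each scattered grep call depends only on its own input file, a workflow engine is free to run those calls in parallel.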
Rich: Linked Data allows for infinite metadata annotations and reasoning
Runnable: not just an abstract sketch; runs in containers, clouds, & (HPC) clusters
Portable: both reference & vendor implementations available
Community Driven: started at the Bioinformatics Open Source Conference; lazy consensus & do-ocracy approach
Adapted from Michael R Crusoe's #CommonWL talk at @scilifelab https://goo.gl/fR2shQ
cwltool (reference implementation)
Rabix
Arvados
Galaxy
Parallel Recipes
Toil
CancerCollaboratory
Airflow (SciDAP)
cwl2script
Apache Taverna planning
swagger.json → CWL tools
"Call API" command line
Docker/CWL support in KNIME/Taverna/Galaxy
Virtualization and data management
Automate cloud installation