Exposing ToxCast as a friendly OpenTox API
Talk outline
- What problems does it solve?
- Demo
- Tools used & lessons learned
- Next steps
How we organize data in general
- Encyclopedias - great for many things, but
- slow to look up just one thing
- updating means republishing everything
- every language is a new set of books
Downsides of data as zip files
- Even simple code needs non-trivial parsing (ToxCast: 20 CSV files). This slows down development, e.g. of machine learning models.
- Parsing code must be re-implemented in every language
- Data acquisition often manual (not automatically reproducible)
- Annoying to find overlapping compounds tested in several databases (implement N parsers, harmonize, ...)
- Updating means republishing everything
ToxCast as an OpenTox API
- JSON over HTTP (REST)
- Can be accessed with a browser over the internet
- Or consumed from workflow tools, machine learning software, ...
- Has rich filtering - only query the data you need, get it instantaneously
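A filtered query from code could look roughly like the sketch below. The base URL, the `compounds` resource name and the `casn` / `limit` filter parameters are assumptions made for illustration; they are not taken from the actual API definition.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; replace with the address of a real deployment.
BASE_URL = "https://example.org/toxcast/v1"


def build_query_url(resource, **filters):
    """Build the URL for a resource, adding only the filters that are set."""
    params = {k: v for k, v in sorted(filters.items()) if v is not None}
    query = urlencode(params)
    return f"{BASE_URL}/{resource}" + (f"?{query}" if query else "")


def fetch_json(url):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url) as response:
        return json.load(response)


# Only the URL is built here; call fetch_json(url) against a real deployment.
url = build_query_url("compounds", casn="80-05-7", limit=10)
print(url)
```

Because the whole query lives in the URL, the same string can be pasted into a browser or handed to a workflow tool.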
Let's take a look
What about a nice data browser?
(work in progress)
Use Cases
- Get data into KNIME directly from the API (DEMO!)
- Query compounds in ToxCast from code (DEMO!)
- Find compounds common in two data sources (DEMO!)
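The last use case reduces to a set intersection once both sources return JSON records with a shared identifier. A minimal sketch, assuming both APIs expose the CAS number in a `casn` field; the records below are made-up stand-ins for real API responses.

```python
def common_compounds(source_a, source_b, key="casn"):
    """Return the sorted identifiers that occur in both lists of records."""
    # Records with an empty or missing identifier are skipped.
    ids_a = {rec[key] for rec in source_a if rec.get(key)}
    ids_b = {rec[key] for rec in source_b if rec.get(key)}
    return sorted(ids_a & ids_b)


# Illustrative records, not real query results.
toxcast = [
    {"casn": "80-05-7", "chnm": "Bisphenol A"},
    {"casn": "50-00-0", "chnm": "Formaldehyde"},
    {"casn": "", "chnm": "Compound without CAS number"},
]
toxrefdb = [{"casn": "50-00-0"}, {"casn": "71-43-2"}]

print(common_compounds(toxcast, toxrefdb))
```

With zip files the same task would first require one parser per source plus a harmonization step.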
Let's recap
Advantages of data APIs
- Instant access to data / metadata (No unzipping, parsing...)
- Works with any programming language / workflow tool
- Works over standard internet protocols (passes firewalls)
- Same code can be used to expose data publicly or within an institution
Behind the scenes: Zip file ⇨ API
- Write OpenAPI/Swagger definition
- Write small importer to download official zip, store it in datastore (currently Elasticsearch for ToxCast and ToxRefDB)
- Generate scaffold for API with Swagger tools (in our case Python Flask)
- Implement API (ToxCast & ToxRef: about 150 LOC of Python)
- Write Dockerfiles & Kubernetes descriptions for easy deployment and sharing
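The importer step can be sketched with the Python standard library alone. This is not the actual importer: the file names and columns are invented, the archive is built in memory instead of being downloaded, and writing the rows to the datastore is left out.

```python
import csv
import io
import zipfile


def read_csv_records(zip_bytes):
    """Yield (file name, row dict) for every CSV file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore READMEs and other non-CSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield name, row


def make_example_zip():
    """Build a tiny in-memory archive standing in for the official download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("compounds.csv", "chid,chnm,casn\n1,Bisphenol A,80-05-7\n")
        archive.writestr("README.txt", "not a csv")
    return buffer.getvalue()


records = list(read_csv_records(make_example_zip()))
print(records)
```

Each yielded row is a plain dict, so it can be indexed into a document store more or less as-is.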
Ontologies
- Use the x- extension syntax in the Swagger definition to annotate JSON result fields with ontology terms. This will help with searching for and matching compatible data / modelling services
definitions:
  Compound:
    type: object
    properties:
      chid:
        type: integer
        description: "Internal identifier of compounds within Toxcast"
      chnm:
        type: string
        description: "Chemical name (as stored in ToxCast)"
      casn:
        type: string
        description: "CAS Number as stored in Toxcast. Can be empty string if no valid CAS number is stored in ToxCast."
        x-ontology: http://edamontology.org/data_3102
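A discovery or matching service could read these annotations back out of the definition. A minimal sketch, assuming the definition has already been parsed into a Python dict and that the `x-ontology` key is attached to the `casn` property; the helper name `ontology_annotations` is hypothetical.

```python
def ontology_annotations(definitions):
    """Map 'Definition.field' to its x-ontology term, where one is given."""
    found = {}
    for def_name, definition in definitions.items():
        for field, schema in definition.get("properties", {}).items():
            term = schema.get("x-ontology")
            if term:
                found[f"{def_name}.{field}"] = term
    return found


# The definition from the slide, written as a dict (descriptions omitted).
definitions = {
    "Compound": {
        "type": "object",
        "properties": {
            "chid": {"type": "integer"},
            "chnm": {"type": "string"},
            "casn": {
                "type": "string",
                "x-ontology": "http://edamontology.org/data_3102",
            },
        },
    }
}

print(ontology_annotations(definitions))
```

Two services whose fields carry the same ontology term can then be matched without relying on field names.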
We also realized
- Planning for interactive discovery is crucial
- Precise configuration via URLs (data filtering, ...) is great to pass through to other services
What is next for us
- Finish ToxRefDB, ToxCast, Open TG-GATEs APIs
- Collect feedback, iterate and improve
- Build on and extend integration with CPSign modelling service as a case study
- Build a web interface that allows non-programmers to build APIs from CSV files
- Build a central discovery service so that compatible data sources, modelling services and utilities can be found and matched automatically across the internet
We could use your help :-)
- Play with it!
- Let us know what else you need
- If you have the know-how, implement another data source as a compatible API (we will gladly help!)
- Tell others about it - standards work best when many people know about them
Let's make data access easy!
Thank you!
Exposing ToxCast as a friendly OpenTox API
By Daniel Bachler
Presentation at the OpenTox Euro 2016