Exposing ToxCast as a friendly OpenTox API

by Daniel Bachler (Douglas Connect)

daniel@douglasconnect.com


Talk outline

What problems does it solve?
Demo
Tools used & lessons learned
Next steps

How we organize data in general

Encyclopedias - great for many things, but
slow to look up just one thing
updating means republishing everything
every language is a new set of books

Downsides of data as zip files

Even simple code needs non-trivial parsing (ToxCast: 20 CSV files), which slows down development, e.g. of machine learning models
Parsing code must be re-implemented in every language
Data acquisition often manual (not automatically reproducible)
Annoying to find overlapping compounds tested in several databases (implement N parsers, harmonize, ...)
Updating means republishing everything

ToxCast as an OpenTox API

JSON over HTTP (REST)
Can be accessed with a browser over the internet
Or consumed from workflow tools, machine learning software, ...
Rich filtering: query only the data you need and get it instantly
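To show what "JSON over HTTP" means from code, here is a minimal Python sketch using only the standard library. The base URL comes from the demo links, but the resource path `compounds` and the filter names `casn` and `limit` are hypothetical placeholders. The real endpoints and filters are listed in each API's Swagger UI.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL from the demo links; the resource paths below it are assumptions.
BASE_URL = "http://toxcast-api.cloud.douglasconnect.com/beta"


def build_query_url(base_url, resource, **filters):
    """Build a GET URL such as <base>/<resource>?key=value&...

    `resource` and the filter names are hypothetical; check the API's
    Swagger definition for the real ones.
    """
    url = f"{base_url.rstrip('/')}/{resource.lstrip('/')}"
    # Drop unset filters so only the requested constraints are sent.
    params = {k: v for k, v in filters.items() if v is not None}
    return f"{url}?{urlencode(params)}" if params else url


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Example filter values only (80-05-7 is the CAS number of bisphenol A).
    url = build_query_url(BASE_URL, "compounds", casn="80-05-7", limit=10)
    print(url)
    # print(fetch_json(url))  # needs network access and a real endpoint
```

Because the filters live in the URL, the server does the selection and the client never has to download and unpack the full dataset.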

Let's take a look

http://toxcast-api.cloud.douglasconnect.com/beta/ui

http://toxREFDB-api.cloud.douglasconnect.com/beta/ui

What about a nice data browser?

http://opentox-data-explorer.cloud.douglasconnect.com

(work in progress)

Use Cases

Get data into KNIME directly from the API (DEMO!)
Query compounds in ToxCast from code (DEMO!)
Find compounds common to two data sources (DEMO!)
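Once both sources return JSON, finding shared compounds becomes a set intersection instead of two parsers plus harmonization. The Python sketch below matches records by CAS number. It assumes both responses were already decoded into lists of dicts. `casn` is the ToxCast field name from the Swagger definition. The field name for the second source (`cas` in the usage example) is an assumption, so pass in the real one.

```python
def normalize_casrn(value):
    """Normalize a CAS number string for comparison (hypothetical helper)."""
    return (value or "").strip()


def common_compounds(records_a, records_b, key_a="casn", key_b="casn"):
    """Return the sorted CAS numbers present in both record lists.

    Empty CAS numbers are ignored, since ToxCast may store an empty string
    when no valid CAS number is known.
    """
    ids_a = {normalize_casrn(r.get(key_a)) for r in records_a}
    ids_b = {normalize_casrn(r.get(key_b)) for r in records_b}
    return sorted((ids_a & ids_b) - {""})


if __name__ == "__main__":
    toxcast = [{"casn": "80-05-7"}, {"casn": ""}, {"casn": "50-00-0 "}]
    other = [{"cas": "50-00-0"}, {"cas": "80-05-7"}, {"cas": "64-17-5"}]
    print(common_compounds(toxcast, other, key_b="cas"))
```

Real harmonization may need more than whitespace stripping (for example, validating CAS check digits), but the shape of the code stays the same.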

Let's recap

Advantages of data APIs

Instant access to data and metadata (no unzipping, parsing, ...)
Works with any programming language or workflow tool
Works over standard internet protocols (passes firewalls)
Same code can be used to expose data publicly or within an institution

Behind the scenes: zip file ⇨ API

Write an OpenAPI/Swagger definition
Write a small importer that downloads the official zip file and stores its contents in a datastore (currently Elasticsearch for ToxCast and ToxRefDB)
Generate a scaffold for the API with the Swagger tools (in our case Python Flask)
Implement the API (ToxCast & ToxRefDB: about 150 lines of Python)
Write Dockerfiles & Kubernetes manifests for easy deployment and sharing
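The importer step might look roughly like the Python sketch below. It reads every CSV file in a downloaded zip archive and turns the rows into the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint accepts. Two things here are assumptions for illustration and not necessarily how the ToxCast importer is organized: each CSV file gets its own index, and the index is named after the file. The download and the HTTP POST to Elasticsearch are left out.

```python
import csv
import io
import json
import os
import zipfile


def iter_csv_rows(zip_bytes):
    """Yield (member_name, row_dict) for every CSV file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip READMEs and other non-CSV members
            with archive.open(name) as raw:
                # Zip members are binary streams; wrap them to get text for csv.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield name, row


def to_bulk_ndjson(rows):
    """Build an Elasticsearch _bulk request body from (name, row) pairs.

    Assumption: one index per CSV file, named after the file (lowercased,
    because Elasticsearch index names must be lowercase).
    """
    lines = []
    for name, row in rows:
        index = os.path.splitext(os.path.basename(name))[0].lower()
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(row))  # document line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Keeping the importer this small is what makes re-running it against a new official release cheap, so updates no longer mean republishing everything by hand.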

Ontologies

Use the x- extension syntax in the Swagger definition to annotate JSON result fields with ontology terms. This helps with search and with matching compatible data and modelling services.

definitions:
  Compound:
    type: object
    properties:
      chid:
        type: integer
        description: "Internal identifier of the compound within ToxCast"
      chnm:
        type: string
        description: "Chemical name (as stored in ToxCast)"
      casn:
        type: string
        description: "CAS number as stored in ToxCast. Can be an empty string if no valid CAS number is stored in ToxCast."
        x-ontology: http://edamontology.org/data_3102
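Once fields carry such annotations, a client or discovery service could read them programmatically. The Python sketch below walks the `definitions` section of a Swagger definition and lists the fields annotated with `x-ontology`. It then pairs up fields from two APIs that share a term. It assumes the YAML has already been parsed into a dict, for example with a YAML library. `TOXCAST_SPEC` mirrors the fragment above, while `OTHER_SPEC` is made up for illustration.

```python
# Mirrors the Compound definition above, as it looks after YAML parsing.
TOXCAST_SPEC = {
    "definitions": {
        "Compound": {
            "type": "object",
            "properties": {
                "chid": {"type": "integer"},
                "chnm": {"type": "string"},
                "casn": {"type": "string",
                         "x-ontology": "http://edamontology.org/data_3102"},
            },
        }
    }
}

# A made-up second API that annotates its CAS field with the same term.
OTHER_SPEC = {
    "definitions": {
        "Chemical": {
            "properties": {
                "cas_number": {"type": "string",
                               "x-ontology": "http://edamontology.org/data_3102"},
            }
        }
    }
}


def ontology_fields(spec):
    """Map each ontology term to the "Definition.property" names using it."""
    terms = {}
    for def_name, definition in spec.get("definitions", {}).items():
        for prop_name, prop in definition.get("properties", {}).items():
            term = prop.get("x-ontology")
            if term:
                terms.setdefault(term, []).append(f"{def_name}.{prop_name}")
    return terms


def shared_terms(spec_a, spec_b):
    """Return (term, fields_a, fields_b) for terms annotated in both specs."""
    a, b = ontology_fields(spec_a), ontology_fields(spec_b)
    return [(t, a[t], b[t]) for t in sorted(a.keys() & b.keys())]


if __name__ == "__main__":
    print(shared_terms(TOXCAST_SPEC, OTHER_SPEC))
```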

We also realized

Planning for interactive discovery is crucial
Precise configuration via URLs (data filtering, ...) makes it easy to pass a query on to other services
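To illustrate handing a precise query to another service, the Python sketch below embeds a complete, already filtered data URL as a single query parameter of a downstream service URL and then recovers it on the receiving side. The modelling-service URL and the `dataset` parameter name are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def forward_dataset_url(service_url, dataset_url):
    """Attach a complete (already filtered) data URL as one parameter.

    urlencode percent-escapes the nested '?', '&' and '=' characters so the
    data URL's own filters don't leak into the outer query string.
    """
    return f"{service_url}?{urlencode({'dataset': dataset_url})}"


def received_dataset_url(request_url):
    """On the receiving side, recover the original data URL unchanged."""
    query = parse_qs(urlsplit(request_url).query)
    return query["dataset"][0]


if __name__ == "__main__":
    data = ("http://toxcast-api.cloud.douglasconnect.com/beta/"
            "compounds?casn=80-05-7&limit=10")  # hypothetical filtered query
    forwarded = forward_dataset_url("http://modelling.example.org/predict", data)
    print(forwarded)
    print(received_dataset_url(forwarded) == data)
```

The receiving service gets the exact data selection the user made, without the two sides having to agree on a separate configuration format.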

What is next for us

Finish the ToxRefDB, ToxCast and Open TG-GATEs APIs
Collect feedback, iterate and improve
Build on and extend the integration with the CPSign modelling service as a case study
Build a web interface that allows non-programmers to build APIs from CSV files
Build a central discovery service so that compatible data sources, modelling services and utilities can be found and matched automatically across the internet
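One building block for generating APIs from CSV files could be inferring a schema from the file itself. The Python sketch below guesses a Swagger property type (`integer`, `number` or `string`) for each CSV column and treats empty cells as missing values. This is a simplified heuristic for illustration only and not a description of the planned tool.

```python
import csv
import io


def infer_type(values):
    """Guess a Swagger type for a column from its non-empty values."""
    present = [v for v in values if v.strip()]
    if not present:
        return "string"  # nothing to go on; fall back to the safest type
    for swagger_type, parse in (("integer", int), ("number", float)):
        try:
            for v in present:
                parse(v)
            return swagger_type  # every value parsed as this type
        except ValueError:
            continue  # try the next, more permissive type
    return "string"


def csv_to_definition(csv_text):
    """Build a Swagger-style object definition from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    properties = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells; treat them as empty.
        column_type = infer_type([row[column] or "" for row in rows])
        properties[column] = {"type": column_type}
    return {"type": "object", "properties": properties}


if __name__ == "__main__":
    sample = "chid,chnm,casn,score\n1,Bisphenol A,80-05-7,0.5\n2,,,\n"
    print(csv_to_definition(sample))
```

A user-facing tool would still need to let people correct the guesses (and add descriptions and ontology terms), but an inferred starting point removes most of the typing.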

We could use your help :-)

Play with it!
Let us know what else you need
If you have the know-how, implement another data source as a compatible API (we will gladly help!)
Tell others about it: standards work best when many people know about them

Let's make data access easy!

Thank you!

http://opentox-data-explorer.cloud.douglasconnect.com/

http://toxcast-api.cloud.douglasconnect.com/beta/ui

http://toxrefdb-api.cloud.douglasconnect.com/beta/ui