Hetionet Awakens: Integrating all of Biology into a Public Neo4j Database

GraphConnect 2016

San Francisco

October 13, 2016

Market Foyer

2:35 pm – 2:50 pm

By Daniel Himmelstein

@dhimmel

    Hetionet is a hetnet — a network with multiple node and relationship types. Version 1.0 contains 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. Data was integrated from 29 public resources to connect compounds, diseases, genes, anatomies, pathways, biological processes, molecular functions, cellular components, perturbations, pharmacologic classes, drug side effects, and disease symptoms.
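To make the idea of a hetnet concrete, here is a minimal Python sketch of a network whose nodes each carry a type (label) and whose relationships each carry a type. It also derives the metagraph, the schema of type-to-type connections. The class name `Hetnet`, its methods, and the placeholder nodes (`compound_1`, `gene_1`, ...) are hypothetical illustrations. They are not the hetio package's API or real Hetionet data.

```python
from collections import Counter


class Hetnet:
    """Hypothetical minimal hetnet: typed nodes joined by typed relationships."""

    def __init__(self):
        self.node_type = {}      # node -> node type (label)
        self.relationships = []  # (source, relationship type, target)

    def add_node(self, node, node_type):
        self.node_type[node] = node_type

    def add_relationship(self, source, rel_type, target):
        # Both endpoints must exist, so every relationship connects typed nodes
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("add both nodes before connecting them")
        self.relationships.append((source, rel_type, target))

    def node_type_counts(self):
        return Counter(self.node_type.values())

    def metagraph(self):
        """Schema: the distinct (source type, relationship type, target type) triples."""
        return {
            (self.node_type[s], r, self.node_type[t])
            for s, r, t in self.relationships
        }

    def degree(self, node, rel_type):
        """Number of relationships of one type touching a node, ignoring direction."""
        return sum(1 for s, r, t in self.relationships if r == rel_type and node in (s, t))


net = Hetnet()
for node, node_type in [("compound_1", "Compound"), ("gene_1", "Gene"),
                        ("gene_2", "Gene"), ("disease_1", "Disease")]:
    net.add_node(node, node_type)
net.add_relationship("compound_1", "binds", "gene_1")
net.add_relationship("compound_1", "binds", "gene_2")
net.add_relationship("disease_1", "associates", "gene_2")

print(net.node_type_counts())
print(sorted(net.metagraph()))
```

A metagraph with 11 node types and 24 relationship types is what "11 types (labels)" and "24 types" describe for Hetionet v1.0.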

    Hetionet was created as part of Project Rephetio, an open science project to systematically identify why drugs work and predict new uses for existing drugs. Using advanced Cypher queries, we quantified the network connectivity between drug–disease pairs along 1,206 types of paths (metapaths). We then used machine learning to predict the probability of treatment for 209,168 compound–disease pairs.
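The abstract only says "machine learning". As an assumption, this sketch uses logistic regression fit by plain batch gradient descent to turn per-pair path features into a probability of treatment. The feature values and labels are invented toy numbers, not Rephetio data, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of treatment for one compound-disease feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, lr=1.0, epochs=2000):
    """Batch gradient descent on the mean log loss (no regularization)."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y  # derivative of log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical toy data: two path-based features per compound-disease pair,
# label 1 for a known treatment and 0 otherwise
features = [[0.9, 0.8], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
weights, bias = train_logistic(features, labels)
for x in features:
    print(x, round(predict(weights, bias, x), 3))
```

Once trained, the same `predict` call can score any pair, which is how a probability can be assigned to every one of the 209,168 pairs rather than only to the labeled ones.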

    Hetionet is available online as a public Neo4j database instance. Its Neo4j Browser includes an introductory guide as well as guides showing the most supportive paths for each of the 209,168 predictions. The public instance runs Neo4j in Docker. Join us at GraphConnect to learn how Neo4j can be a powerful technology for human disease research.

How do you teach a computer biology?

networks with multiple node or relationship types

A 2012 study identified 26 different names for this type of network:

multilayer network, multiplex network, multivariate network, multinetwork, multirelational network, multirelational data, multilayered network, multidimensional network, multislice network, multiplex of interdependent networks, hypernetwork, overlay network, composite network, multilevel network, multiweighted graph, heterogeneous network, multitype network, interconnected networks, interdependent networks, partially interdependent networks, network of networks, coupled networks, interconnecting networks, interacting networks, heterogeneous information network

hetnet

What's the best software for storing and querying hetnets?

dhimmel/hetio     86       5       2
neo4j/neo4j   42,498   3,071   1,007

GitHub stats from 2016-10-09

  • Hetnet of biology designed for drug repurposing
     
  • ~50 thousand nodes
    11 types (labels)
     
  • ~2.25 million relationships
    24 types
     
  • integrates 29 public resources
    knowledge from millions of studies
     
  • the hardest part:
    licensing of publicly available data

Hetionet v1.0

MetaGraph / Data Model / Schema

Visualizing Hetionet v1.0

  • Customized Docker image
  • DigitalOcean droplet
  • SSL from Let's Encrypt
  • read-only mode with a query execution timeout
  • Custom GRASS style
  • Custom guides

Public Hetionet Neo4j Instance

Details at doi.org/brsc

Project Rephetio: drug repurposing predictions

  • Hetionet v1.0 contains:
    • 1,538 connected compounds
    • 136 connected diseases
    • 209,168 compound–disease pairs
    • 755 treatments
       
  • 1,206 compound–disease metapaths with length ≤ 4
     
  • machine learning classifier
     
  • predict the probability of treatment for all 209,168 compound–disease pairs (het.io/repurpose)
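Two numbers on this slide can be checked or illustrated in a few lines of Python. The 209,168 pairs are every combination of the 1,538 connected compounds and 136 connected diseases (1,538 × 136 = 209,168). Metapaths can be enumerated by walking the metagraph. The sketch below does so on a hypothetical four-metaedge subset of the schema (binds, associates, participates, treats). Its count is therefore not comparable to the 1,206 metapaths of the full Hetionet metagraph, and it applies no filtering beyond the length limit (it even counts the direct CtD edge).

```python
# Count of compound-disease pairs: all combinations of connected compounds and diseases
n_pairs = 1_538 * 136
print(n_pairs)  # 209168

# Hypothetical subset of the metagraph: (source type, relationship abbreviation, target type)
METAEDGES = [
    ("Compound", "b", "Gene"),     # binds
    ("Disease", "a", "Gene"),      # associates
    ("Gene", "p", "Pathway"),      # participates
    ("Compound", "t", "Disease"),  # treats
]
ABBREVIATIONS = {"Compound": "C", "Disease": "D", "Gene": "G", "Pathway": "PW"}


def metaneighbors(node_type):
    """(relationship, node type) steps available from node_type, in either direction."""
    for source, rel, target in METAEDGES:
        if source == node_type:
            yield rel, target
        elif target == node_type:  # elif: a self-loop metaedge is yielded only once
            yield rel, source


def enumerate_metapaths(source_type, target_type, max_length):
    """Abbreviated metapaths from source_type to target_type with 1..max_length edges."""
    found = []

    def extend(node_type, abbreviation, length):
        if length > 0 and node_type == target_type:
            found.append(abbreviation)
        if length == max_length:
            return
        # Keep walking after reaching target_type: metapaths may pass through it
        for rel, next_type in metaneighbors(node_type):
            extend(next_type, abbreviation + rel + ABBREVIATIONS[next_type], length + 1)

    extend(source_type, ABBREVIATIONS[source_type], 0)
    return found


paths = enumerate_metapaths("Compound", "Disease", 4)
print(len(paths))  # 11
print("CbGpPWpGaD" in paths)  # True
```

Even this four-metaedge toy yields 11 metapaths of length ≤ 4, which suggests why the full 24-metaedge schema produces over a thousand.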

Project online at thinklab.com/p/rephetio

Predictions succeed at prioritizing known treatments

Does bupropion treat nicotine dependence?

  • Bupropion was first approved for depression in 1985
     
  • In 1997, bupropion was approved for smoking cessation
     
  • Can we predict this repurposing from Hetionet? The prediction was:

Compound–causes–SideEffect–causes–Compound–treats–Disease

Compound–binds–Gene–binds–Compound–treats–Disease

Compound–binds–Gene–associates–Disease

Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease

See in the Neo4j Browser

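// Paths of the CbGpPWpGaD metapath: Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease
MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
// Planner hint: match the two halves of the pattern separately and join them at the pathway
USING JOIN ON n2
// Fix both endpoints and require the two genes to be distinct
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  AND n1 <> n3
// Degree of every node along the path, for each relationship type the path uses there
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
// Path-degree product: the exponent -0.4 down-weights paths through high-degree nodes
RETURN
  path,
  reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4) AS path_weight
ORDER BY path_weight DESC
LIMIT 10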

Cypher query to find the top CbGpPWpGaD paths
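The query's path weight can be reproduced outside Neo4j. This Python sketch enumerates node-distinct paths along the CbGpPWpGaD metapath in a hypothetical toy graph. It multiplies each degree raised to the power -0.4, exactly as the `degrees` list and `reduce` do, and ranks paths like `ORDER BY path_weight DESC`. Summing the weights over all paths gives a degree-weighted path count. That aggregation is an assumption here, offered as one way to score a whole compound–disease pair along a metapath. Node names are placeholders. Degrees count distinct neighbors, which equals the Cypher relationship count only when no pair of nodes has duplicate relationships of one type.

```python
from collections import defaultdict

# Hypothetical toy graph; relationships are (source, type, target) and treated as undirected
RELATIONSHIPS = [
    ("compound_1", "BINDS_CbG", "gene_1"),
    ("compound_1", "BINDS_CbG", "gene_2"),
    ("compound_2", "BINDS_CbG", "gene_1"),
    ("gene_1", "PARTICIPATES_GpPW", "pathway_1"),
    ("gene_2", "PARTICIPATES_GpPW", "pathway_1"),
    ("gene_3", "PARTICIPATES_GpPW", "pathway_1"),
    ("disease_1", "ASSOCIATES_DaG", "gene_2"),
    ("disease_1", "ASSOCIATES_DaG", "gene_3"),
]
# Relationship types of CbGpPWpGaD, in path order
METAPATH = ["BINDS_CbG", "PARTICIPATES_GpPW", "PARTICIPATES_GpPW", "ASSOCIATES_DaG"]

neighbors = defaultdict(set)  # (node, relationship type) -> adjacent nodes
for source, rel_type, target in RELATIONSHIPS:
    neighbors[(source, rel_type)].add(target)
    neighbors[(target, rel_type)].add(source)


def degree(node, rel_type):
    """Like size((node)-[:REL]-()) when there are no duplicate relationships."""
    return len(neighbors[(node, rel_type)])


def find_paths(source, rel_types, target):
    """Node-distinct paths from source to target following rel_types in order."""
    results = []

    def extend(path):
        if len(path) == len(rel_types) + 1:
            if path[-1] == target:
                results.append(path)
            return
        rel_type = rel_types[len(path) - 1]
        for nxt in sorted(neighbors[(path[-1], rel_type)]):
            if nxt not in path:  # plays the role of n1 <> n3 in the query
                extend(path + [nxt])

    extend([source])
    return results


def path_weight(path, rel_types, damping=0.4):
    """Path-degree product: both endpoint degrees of every relationship, each ^ -damping."""
    weight = 1.0
    for i, rel_type in enumerate(rel_types):
        for node in (path[i], path[i + 1]):
            weight *= degree(node, rel_type) ** -damping
    return weight


found = find_paths("compound_1", METAPATH, "disease_1")
ranked = sorted(found, key=lambda p: path_weight(p, METAPATH), reverse=True)
dwpc = sum(path_weight(p, METAPATH) for p in found)
for p in ranked:
    print(" -> ".join(p), round(path_weight(p, METAPATH), 4))
print("sum of path weights:", round(dwpc, 4))
```

The top-ranked path runs through `gene_2`, which binds only one compound. Paths through `gene_1` share its binding degree with `compound_2`, so they are damped more. That is the intent of the -0.4 exponent.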

Content                               Type               URL
Hetionet Neo4j Browser                Neo4j Instance     neo4j.het.io
Cypher Tutorial for Project Rephetio  GraphGist          goo.gl/nO7wbU
Graphistania Podcast                  Interview          goo.gl/yqVhZz
Thinklab Project                      Lab Notebook       thinklab.com/p/rephetio
Hetionet                              GitHub Repository  git.io/vPa98
More Cypher queries on Hetionet       Discussion         doi.org/brsd
My PhD Thesis Seminar on Hetnets      Video              youtu.be/H8DfXop8K7g

A special thanks to:

  • project team members: Antoine Lizee, Pouya Khankhanian, Leo Brueggeman, Sabrina Chen, Dexter Hadley, Chrissy Hessler, Ari Green, Sergio Baranzini
  • 35 community contributors on Thinklab
  • Baranzini Lab (for PhD) & Greene Lab (current)
  • Neo4j for their amazing support on GitHub, StackOverflow, Slack, & Meetups. Particularly Nicole White, Michael Hunger, Ryan Boyd, Rik Van Bruggen, Oskar Hane, Christophe Willemsen, William Lyon, stdob, and more.

Project Links & Acknowledgements
