Hetnets + Biomedicine + Neo4j = Hetionet

Datathon Workshop

DataPhilly Meetup

July 5, 2016

Daniel Himmelstein (@dhimmel)

Greene Lab at Penn (@GreeneLab)

Can we make a machine learn all biomedical knowledge?

Networks encode knowledge…

Hetnets encode diverse knowledge

Figure labels: Pharmacologic Classes, Side Effects, Molecular Functions, Biological Processes, Cellular Components
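
A hetnet is a network in which every node and every edge carries a type. The sketch below is one minimal way to picture that in Python. It borrows the four relationship types named in the exercises later in this deck, but every node name is a made-up placeholder, not a real Hetionet record, and the tuple layout is an assumption chosen for illustration rather than how Hetionet or Neo4j stores data.

```python
from collections import Counter

# Toy hetnet. Node names are hypothetical placeholders, not Hetionet data.
# Each node maps to its type (its "label" in Neo4j terms).
nodes = {
    "disease_m": "Disease",
    "anatomy_t": "Anatomy",
    "gene_a": "Gene",
    "gene_b": "Gene",
    "process_p": "BiologicalProcess",
}

# Each edge is (source, edge_type, target).
edges = [
    ("disease_m", "LOCALIZES_DlA", "anatomy_t"),
    ("disease_m", "ASSOCIATES_DaG", "gene_a"),
    ("gene_a", "PARTICIPATES_GpBP", "process_p"),
    ("gene_b", "PARTICIPATES_GpBP", "process_p"),
    ("anatomy_t", "EXPRESSES_AeG", "gene_a"),
]

# Count how many nodes and edges exist per type.
node_type_counts = Counter(nodes.values())
edge_type_counts = Counter(edge_type for _, edge_type, _ in edges)

print(f"{len(nodes)} nodes of {len(node_type_counts)} types")
print(f"{len(edges)} edges of {len(edge_type_counts)} types")
```

The two summary lines have the same shape as the node and edge totals reported for Hetionet v1.0, only on a five-node toy.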

Hetionet v1.0

  • 1,552 small molecule compounds
  • 137 complex diseases
  • 755 treatments
  • 47,031 nodes of 11 types
  • 2,250,197 edges of 24 types
  • 29 resources
  • millions of studies from last half century

Hetionet integrates data from 29 resources

  • 12 had an open license
  • 9 had no license
  • Incompatibilities - Share Alike vs Non-Commercial
  • Requested permission for 11 resources
  • Median time to first response was 16 days
  • 2 affirmative responses
  • Removed MSigDB
  • "LICENSEE agrees not to put … the DATABASE on a … server … that may be accessed by any individual other than the LICENSEE."
  • "LICENSEE agrees to provide … a written evaluation of the PROGRAM and the DATABASE, including a description of its functionality or problems and areas for further improvement"

Legal barriers to data reuse


release data under an open license

Slide courtesy of Nicole White (@_nicolemargaret)


Cypher Query Language

So what does Hetionet know…

Can we predict new uses for existing drugs?

209,168 Predictions (het.io/repurpose)

Hetionet browser at neo4j.het.io

MATCH path = (n0:BiologicalProcess)
WHERE n0.name = 'myelination'
RETURN path

Can answer versatile questions such as:

Which compounds target genes involved in myelination?
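
To make the shape of that question concrete, here is a small Python sketch of the two-hop traversal it implies: start at a biological process, step to the genes that participate in it, then step to the compounds that target those genes. All node names are hypothetical placeholders. The edge type `BINDS_CbG` is an assumption for the compound-to-gene relationship; it does not appear in these slides. The helper names `sources` and `compounds_targeting` are invented for this example.

```python
# Toy edges as (source, edge_type, target). Names are placeholders, not Hetionet data.
# BINDS_CbG is an assumed edge type for "compound targets gene".
edges = [
    ("compound_x", "BINDS_CbG", "gene_a"),
    ("compound_y", "BINDS_CbG", "gene_b"),
    ("compound_z", "BINDS_CbG", "gene_c"),
    ("gene_a", "PARTICIPATES_GpBP", "process_p"),
    ("gene_b", "PARTICIPATES_GpBP", "process_p"),
    ("gene_c", "PARTICIPATES_GpBP", "process_q"),
]


def sources(edge_type, target):
    """Return every source node linked to `target` by an edge of `edge_type`."""
    return {src for src, etype, dst in edges if etype == edge_type and dst == target}


def compounds_targeting(process):
    """Return (compound, gene) pairs where the gene participates in `process`."""
    genes = sources("PARTICIPATES_GpBP", process)
    return {(compound, gene) for gene in genes for compound in sources("BINDS_CbG", gene)}


paths = compounds_targeting("process_p")
print(sorted(paths))
```

In Cypher the same idea is written as a single path pattern, and the database does the hop-by-hop matching.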

Go to neo4j.het.io & try to:

  1. Retrieve the Disease node named "lung cancer"
  2. Find the anatomies (tissue types) where lung cancer localizes (LOCALIZES_DlA)
  3. Find all genes associated with "spinal cancer" (ASSOCIATES_DaG)
  4. Find all genes associated with both "liver cancer" and "kidney cancer"
  5. Find all genes that participate in the "mitotic spindle checkpoint" BiologicalProcess (PARTICIPATES_GpBP)
  6. Find all genes that participate in the "mitotic spindle checkpoint" and are expressed in the lung (EXPRESSES_AeG)

Solution queries

  1. MATCH (node:Disease {name: "lung cancer"}) RETURN node
  2. MATCH (:Disease {name: 'lung cancer'})-[rel:LOCALIZES_DlA]->() RETURN rel
  3. MATCH (:Disease {name: 'spinal cancer'})-[r:ASSOCIATES_DaG]->() RETURN r
  4. MATCH path=(:Disease {name: 'liver cancer'})-[:ASSOCIATES_DaG]-(:Gene)-[:ASSOCIATES_DaG]-(:Disease {name: 'kidney cancer'}) RETURN path
  5. MATCH (:BiologicalProcess {name: 'mitotic spindle checkpoint'})-[rel:PARTICIPATES_GpBP]-() RETURN rel
  6. MATCH path=(:BiologicalProcess {name: 'mitotic spindle checkpoint'})-[:PARTICIPATES_GpBP]-(gene:Gene)-[:EXPRESSES_AeG]-(:Anatomy {name: 'lung'}) RETURN path
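
Solutions 4 and 6 both look for genes that satisfy two conditions at once, which amounts to a set intersection. The Python sketch below shows that logic for the two-disease case on a toy edge list. The disease and gene names are hypothetical placeholders, and `associated_genes` is a helper invented for this example.

```python
# Toy ASSOCIATES_DaG edges as (disease, edge_type, gene). Placeholder names only.
edges = [
    ("disease_m", "ASSOCIATES_DaG", "gene_a"),
    ("disease_m", "ASSOCIATES_DaG", "gene_b"),
    ("disease_n", "ASSOCIATES_DaG", "gene_b"),
    ("disease_n", "ASSOCIATES_DaG", "gene_c"),
]


def associated_genes(disease):
    """Return the set of genes linked to `disease` by an ASSOCIATES_DaG edge."""
    return {
        gene
        for src, etype, gene in edges
        if src == disease and etype == "ASSOCIATES_DaG"
    }


# Genes associated with both diseases: the intersection of the two gene sets.
shared = associated_genes("disease_m") & associated_genes("disease_n")
print(sorted(shared))
```

The Cypher pattern in solution 4 expresses the same constraint by requiring one Gene node to sit between the two Disease nodes.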

Project Cognoma

Hetionet will be one of several components in Project Cognoma

Putting machine learning in the hands of cancer biologists

Thanks Neo4j for the pizza

DataPhilly Datathon Workshop · Hetnets + Biomedicine + Neo4j = Hetionet

By Daniel Himmelstein

This presentation introduces Hetionet and Neo4j for the datathon workshop organized by the DataPhilly and Code for Philly meetups on July 5, 2016. At the end, we introduce Project Cognoma, which will be the focus of future hack nights.
