Hetnets + Biomedicine + Neo4j = Hetionet

Datathon Workshop

DataPhilly Meetup

July 5, 2016

Daniel Himmelstein (@dhimmel)

Greene Lab at Penn (@GreeneLab)

Can we make a machine learn all biomedical knowledge?

Networks encode knowledge…

Hetnets encode diverse knowledge
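
As a rough illustration of what "hetnet" means here, the sketch below stores a toy network in which every node and every edge carries a type. The type labels (Disease, Gene, Anatomy, ASSOCIATES_DaG, LOCALIZES_DlA, EXPRESSES_AeG) are the ones used in the queries later in this deck. The node names, the `Node` and `Edge` classes, and the edges themselves are placeholders made up for this example, not Hetionet data or Hetionet code.

```python
from collections import Counter
from typing import NamedTuple


class Node(NamedTuple):
    kind: str  # node type, e.g. "Disease" or "Gene"
    name: str


class Edge(NamedTuple):
    source: Node
    kind: str  # edge type, e.g. "ASSOCIATES_DaG"
    target: Node


# Hypothetical toy data: these names are placeholders, not Hetionet content.
disease = Node("Disease", "example disease")
gene = Node("Gene", "EXAMPLE_GENE")
anatomy = Node("Anatomy", "example tissue")

edges = [
    Edge(disease, "ASSOCIATES_DaG", gene),
    Edge(disease, "LOCALIZES_DlA", anatomy),
    Edge(anatomy, "EXPRESSES_AeG", gene),
]

# A hetnet has more than one node type and more than one edge type.
node_types = Counter(n.kind for e in edges for n in (e.source, e.target))
edge_types = Counter(e.kind for e in edges)
print(sorted(node_types))
print(sorted(edge_types))
```

An ordinary (homogeneous) network would collapse both counters to a single key; the extra type information is what lets a query distinguish "disease associates with gene" from "anatomy expresses gene".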

Pharmacologic Classes

Side Effects

Molecular Functions

Biological Processes

Cellular Components

Hetionet v1.0

  • 1,552 small molecule compounds
  • 137 complex diseases
  • 755 treatments
  • 47,031 nodes of 11 types
  • 2,250,197 edges of 24 types
  • 29 resources
  • millions of studies from the last half century
  • Hetionet integrates data from 29 resources
  • 12 had an open license
  • 9 had no license
  • Incompatibilities - Share Alike vs Non-Commercial
  • Requested permission for 11 resources
  • Median time to first response was 16 days
  • 2 affirmative responses
  • Removed MSigDB
  • "LICENSEE agrees not to put … the DATABASE on a … server … that may be accessed by any individual other than the LICENSEE."
  • "LICENSEE agrees to provide … a written evaluation of the PROGRAM and the DATABASE, including a description of its functionality or problems and areas for further improvement."

Legal barriers to data reuse


Release data under an open license

Slide courtesy of Nicole White (@_nicolemargaret)

Cypher Query Language

So what does Hetionet know…

Can we predict new uses for existing drugs?

209,168 Predictions (het.io/repurpose)

Hetionet browser at neo4j.het.io

MATCH path = (n0:BiologicalProcess)
WHERE n0.name = 'myelination'
RETURN path

Can answer versatile questions such as:

Which compounds target genes involved in myelination?

Go to neo4j.het.io & try to:

  1. Retrieve the Disease node named "lung cancer"
  2. Find the anatomies (tissue types) where lung cancer localizes (LOCALIZES_DlA)
  3. Find all genes associated with "spinal cancer" (ASSOCIATES_DaG)
  4. Find all genes associated with both "liver cancer" and "kidney cancer"
  5. Find all genes that participate in the "mitotic spindle checkpoint" BiologicalProcess (PARTICIPATES_GpBP)
  6. Find all genes that participate in the "mitotic spindle checkpoint" and are expressed in the lung (EXPRESSES_AeG)

Solution queries

  1. MATCH (node:Disease {name: "lung cancer"}) RETURN node
  2. MATCH (:Disease {name: 'lung cancer'})-[rel:LOCALIZES_DlA]->() RETURN rel
  3. MATCH (:Disease {name: 'spinal cancer'})-[r:ASSOCIATES_DaG]->() RETURN r
  4. MATCH path=(:Disease {name: 'liver cancer'})-[:ASSOCIATES_DaG]-(:Gene)-[:ASSOCIATES_DaG]-(:Disease {name: 'kidney cancer'}) RETURN path
  5. MATCH (:BiologicalProcess {name: 'mitotic spindle checkpoint'})-[rel:PARTICIPATES_GpBP]-() RETURN rel
  6. MATCH path=(:BiologicalProcess {name: 'mitotic spindle checkpoint'})-[:PARTICIPATES_GpBP]-(gene:Gene)-[:EXPRESSES_AeG]-(:Anatomy {name: 'lung'}) RETURN path
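
Solution 4 works because both ASSOCIATES_DaG relationships have to meet at the same Gene node, which amounts to intersecting the two diseases' gene sets. A minimal Python sketch of that logic follows. The gene names are placeholders, the associations are invented for illustration, and `shared_genes` is a hypothetical helper, not part of Hetionet or Neo4j.

```python
# Hypothetical disease -> associated-gene mapping.
# Gene names are placeholders; these are NOT real Hetionet associations.
associations = {
    "liver cancer": {"GENE_A", "GENE_B", "GENE_C"},
    "kidney cancer": {"GENE_B", "GENE_C", "GENE_D"},
}


def shared_genes(assoc, disease_1, disease_2):
    """Return the genes associated with both diseases (set intersection)."""
    return assoc.get(disease_1, set()) & assoc.get(disease_2, set())


print(sorted(shared_genes(associations, "liver cancer", "kidney cancer")))
# prints ['GENE_B', 'GENE_C']
```

The Cypher pattern expresses the same constraint declaratively: the single (:Gene) node in the middle of the path plays the role of the intersection.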

Project Cognoma

Hetionet will be one of several components in Project Cognoma

Putting machine learning in the hands of cancer biologists

Thanks Neo4j for the pizza