The hetnet awakens: understanding disease through data integration & open science

2:00–3:00 PM, April 26, 2016

Genentech Hall Auditorium

University of California, San Francisco



Created on using deep neural networks

Sandler Neurosciences Center


What can I tell about my Facebook Friends, just knowing their mutual friendships?

Disease network from GWAS

Khankhanian et al. (2016) Int J Epidemiol. DOI: bfmj

Hetio Disease Genes

Himmelstein & Baranzini (2015) PLOS Comp Bio. DOI: 98q

network of pathogenesis:


  • integrates diverse data
  • 40,343 nodes of 18 types
  • 1,608,168 edges of 19 types

type is essential when operating on hetnets

metapath-based approach

feature extraction: the DWPC
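
A minimal sketch of the degree-weighted path count (DWPC), for readers who want to see the feature computed. It assumes a toy hetnet with made-up node names (C1, G1, D1, ...), made-up edge types ("binds", "associates"), and a damping exponent of 0.4 chosen for illustration; the helper names build_adjacency and dwpc are hypothetical. Each path that follows the metapath contributes the product of its edge-type-specific node degrees raised to the negative damping exponent, and the DWPC is the sum over those paths.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index undirected typed edges as adjacency[(node, edge_type)] -> set of neighbors."""
    adjacency = defaultdict(set)
    for source, edge_type, target in edges:
        adjacency[(source, edge_type)].add(target)
        adjacency[(target, edge_type)].add(source)
    return adjacency


def dwpc(adjacency, source, target, metapath, damping=0.4):
    """Sum degree-damped weights of all paths from source to target along metapath."""
    total = 0.0

    def walk(node, step, visited, weight):
        nonlocal total
        if step == len(metapath):
            if node == target:
                total += weight
            return
        edge_type = metapath[step]
        neighbors = adjacency.get((node, edge_type), ())
        for neighbor in neighbors:
            if neighbor in visited:
                continue  # paths may not revisit a node
            # Degrees are counted per edge type, at both ends of the edge.
            degree_product = len(neighbors) * len(adjacency[(neighbor, edge_type)])
            walk(neighbor, step + 1, visited | {neighbor},
                 weight * degree_product ** -damping)

    walk(source, 0, {source}, 1.0)
    return total


# Toy hetnet (illustrative only): one compound, two genes, two diseases.
toy_edges = [
    ("C1", "binds", "G1"),
    ("C1", "binds", "G2"),
    ("G1", "associates", "D1"),
    ("G2", "associates", "D1"),
    ("G2", "associates", "D2"),
]
toy_adjacency = build_adjacency(toy_edges)
score = dwpc(toy_adjacency, "C1", "D1", ["binds", "associates"])
```

Setting damping=0 reduces the score to a plain path count, which is a quick way to check the traversal on a small network.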

mechanisms of pathogenesis:

comparing gene set collections

Project Rephetio

Himmelstein, Lizee, Khankhanian, Brueggeman, Chen, Hessler, Green Hadley & Baranzini (2015–201?) Thinklab. DOI: 993

Metagraph in 2013

Metagraph in 2016



Pharmacologic Classes

Side Effects





Molecular Functions

Biological Processes

Cellular Components

Hetionet v1.0

  • 1,552 small molecule compounds
  • 137 complex diseases
  • 755 treatments
  • 47,031 nodes of 11 types
  • 2,250,197 edges of 24 types
  • 29 resources
  • millions of studies from last half century
  • 1,206 types of paths
  • 209,168 potential treatments
    (1,538 compounds × 136 complex diseases)
  • 6 hetnets (5 permuted)
  • ~60 million database queries
  • predictive model of efficacy
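
The "5 permuted" hetnets above serve as null networks. The slide does not say how they were built, so the following is only a sketch of one common approach, assumed here for illustration: degree-preserving edge swaps applied to the edges of a single type. The function name permute_edges and its parameters are hypothetical.

```python
import random


def permute_edges(edges, n_swaps, seed=0):
    """Shuffle (source, target) edges of one type while preserving every node's degree.

    Assumes sources and targets are different node types (for example
    compound -> gene), so swapping targets never creates a self-loop.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Skip swaps that would duplicate an existing edge.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges
```

Because each swap only exchanges the targets of two edges, every node keeps the number of edges it started with, while the specific pairings are randomized.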

Machine Learning Paradigm

Greene & Himmelstein (2016) Circ Cardiovasc Genet. DOI: bffr

SNPlentiful Effect

Lung Cancer Rates, 2007–2011

Simeonov & Himmelstein (2015) PeerJ. DOI: 98p

Trouble at the Town Hall

1908 in Silverton, CO

San Juan Historical Society

San Juan County, Colorado

3,473 m

−35% O₂



Time from submission to acceptance for 3,330,333 articles since 1965

Visualization by Antoine Lizee

I am a lawyer, but not your lawyer (or UC’s lawyer), and this isn’t legal advice.

― Katie Fortney

Others, like myself, try to remember to rate everything that I've read.

Lars Jensen

It took me a while to figure out.

Antoine Lizee

I would like to enter into the discussion the cases where there was a tough decision to be made

Pouya Khankhanian

117 Ratings

1504 Word Post

2776 Word Post




Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. … 

I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.

One network to rule them all

We have completed an initial version of our network. …

Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.
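
A minimal sketch of how such a checksum could be computed, assuming a local copy of the file; the function name sha256_of_file and the chunk size are illustrative choices, not taken from the original post.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file("graph.json.gz") on a downloaded copy would give a digest to compare against the published one; any change to the file changes the digest.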

Discussion DOIs: bfmk, bfmm, bfmn, bfmp

  • Hetionet integrates data from 29 resources
  • 12 had an open license
  • 9 had no license
  • Incompatibilities - Share Alike vs Non-Commercial
  • Requested permission for 11 resources
  • Median time to first response was 16 days
  • 2 affirmative responses
  • Removed MSigDB
  • "LICENSEE agrees not to put … the DATABASE on a … server … that may be accessed by any individual other than the LICENSEE."
  • "LICENSEE agrees to provide … a written evaluation of the PROGRAM and the DATABASE, including a description of its functionality or problems and areas for further improvement"

Legal barriers to data reuse


release data under an open license


This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1144247. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

Jesse Spaulding

Creator of Thinklab

Kristin Sainani


George Johnson

Kendall Powell

Simon Oxenham

DNA from my maternal grandmother & susceptibility genes for educational attainment

Nina Gonzaludo

Cyrus Maher

Mason Louie

María Chavez

Leo Brueggeman

Sabrina Chen

Presentation online at:


Reception & Exhibit to Follow:

  • Sandler Neurosciences Center, 2nd Floor


Thesis Seminar · Daniel Himmelstein · Biological & Medical Informatics · UCSF

By Daniel Himmelstein


Daniel Himmelstein's Thesis Seminar covering his PhD studies. See the accompanying recording on YouTube. This presentation is released under a CC BY 4.0 license.
