Open sourceror. Digital craftsman of the biodata revolution.
Approaches in network medicine have traditionally focused on generating insights from graphs with a single type of node and relationship. However, biology's complexity demands a richer network structure capable of integrating diverse, multi-scale information. Towards this end, we develop hetnets — networks with multiple types of nodes and relationships.
Specifically, we created Hetionet: a network of biology, disease, and pharmacology. This resource encodes knowledge from millions of biomedical studies over the last half century. Version 1.0 contains 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. We host a public Neo4j database instance at https://neo4j.het.io, allowing users to interact with Hetionet.
In Project Rephetio, we applied Hetionet to predict new uses for existing drugs. Our approach learned the network patterns of connectivity that differentiate treatments from non-treatments, enabling us to predict the probability of treatment for 209,168 compound–disease pairs. These predictions prioritize treatments under investigation by clinical trial.
Going forward, we're investigating more efficient algorithms for feature extraction on hetnets. In addition, we're looking to automate hetnet construction by text mining the literature. The success of hetnets will depend on the availability of openly licensed inputs. As such, I'll briefly discuss data analyses I've performed in hopes of making science more open.
How I became interested in graphs
My Facebook friendship network in 2014
Graphs are composed of:
Nodes / relationships have type:
- node types
(person, course, university)
- relationship types
networks with multiple node or relationship types
multilayer network, multiplex network, multivariate network, multinetwork, multirelational network, multirelational data, multilayered network, multidimensional network, multislice network, multiplex of interdependent networks, hypernetwork, overlay network, composite network, multilevel network, multiweighted graph, heterogeneous network, multitype network, interconnected networks, interdependent networks, partially interdependent networks, network of networks, coupled networks, interconnecting networks, interacting networks, heterogenous information network
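To make the idea of typed nodes and typed relationships concrete, here is a minimal Python sketch of a hetnet as a data structure. It reuses the node types from the slide (person, course, university); the class names, identifiers, and relationship type names (`friends_with`, `enrolled_in`, `offered_by`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str        # node type, e.g. "person"
    identifier: str  # unique within its node type


@dataclass
class Hetnet:
    nodes: set = field(default_factory=set)
    # Each relationship is a (source, relationship type, target) triple.
    relationships: list = field(default_factory=list)

    def add_relationship(self, source, kind, target):
        self.nodes.update([source, target])
        self.relationships.append((source, kind, target))

    def node_type_counts(self):
        return Counter(node.kind for node in self.nodes)

    def relationship_type_counts(self):
        return Counter(kind for _, kind, _ in self.relationships)


# Hypothetical toy data, not taken from any real network.
alice = Node("person", "alice")
bob = Node("person", "bob")
course = Node("course", "network-medicine")
university = Node("university", "example-university")

net = Hetnet()
net.add_relationship(alice, "friends_with", bob)
net.add_relationship(alice, "enrolled_in", course)
net.add_relationship(bob, "enrolled_in", course)
net.add_relationship(course, "offered_by", university)

print(net.node_type_counts())
print(net.relationship_type_counts())
```

A graph with a single node type and a single relationship type is the special case where both counters have exactly one key.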
How do you teach a computer biology?
Visualizing Hetionet v1.0
- Hetnet of biology for drug repurposing
- ~50 thousand nodes
11 types (labels)
- ~2.25 million relationships
- integrates 29 public resources
knowledge from millions of studies
- standardized vocabularies
- stable, unambiguous identifiers
- Omics scale required
- Literature mining
- High throughput experimental technologies
- Avoid manual mapping
- Versioned data dependencies
Constructing Hetionet v1.0
What's the best software for storing and querying hetnets?
GitHub stats from 2018-02-21
- Customized Docker image
- Digital Ocean droplet
- SSL from Let's Encrypt
- readonly mode with a query execution timeout
- Custom GRASS style
- Custom guides
Public Hetionet Neo4j Instance
How could multiple sclerosis affect retina layer formation?
More queries at thinklab.com/d/220
Project Rephetio: drug repurposing predictions
Hetionet v1.0 contains:
- 1,538 connected compounds
- 136 connected diseases
- 209,168 compound–disease pairs
- 755 treatments
- Systematic drug repurposing:
- Compare the therapeutic utility of data types
- Identify the mechanisms of drug efficacy
- Predict the probability of treatment for all 209,168 compound–disease pairs (het.io/repurpose)
Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
eLife (2017) https://doi.org/10.7554/eLife.26726
features = metapaths
positives = treatments
Machine learning methodology
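The "features = metapaths, positives = treatments" setup can be sketched as a small table-building step: one feature per metapath for every compound–disease pair, plus a binary label. The sketch below is an illustration under stated assumptions, not the project's pipeline. The toy graph, node names, and metapath names are hypothetical, and raw path counts stand in for whatever path-based feature is actually used.

```python
from collections import defaultdict

# Hypothetical toy relationships: (source, relationship type, target).
edges = [
    ("compound_A", "binds", "gene_1"),
    ("compound_A", "binds", "gene_2"),
    ("compound_B", "binds", "gene_2"),
    ("disease_X", "associates", "gene_1"),
    ("disease_X", "associates", "gene_2"),
    ("disease_Y", "associates", "gene_3"),
    ("compound_A", "resembles", "compound_B"),
]

# Undirected adjacency keyed by (node, relationship type).
adjacency = defaultdict(set)
for source, kind, target in edges:
    adjacency[(source, kind)].add(target)
    adjacency[(target, kind)].add(source)


def count_paths(source, target, metapath):
    """Count paths from source to target whose relationship types follow
    `metapath` in order, never visiting the same node twice."""
    def walk(node, step, visited):
        if step == len(metapath):
            return int(node == target)
        return sum(
            walk(neighbor, step + 1, visited | {neighbor})
            for neighbor in adjacency[(node, metapath[step])]
            if neighbor not in visited
        )
    return walk(source, 0, {source})


# Each metapath is one feature (names are illustrative).
metapaths = {
    "binds-associates": ["binds", "associates"],
    "resembles-binds-associates": ["resembles", "binds", "associates"],
}
treatments = {("compound_A", "disease_X")}  # toy positives

rows = []
for compound in ["compound_A", "compound_B"]:
    for disease in ["disease_X", "disease_Y"]:
        features = {
            name: count_paths(compound, disease, metapath)
            for name, metapath in metapaths.items()
        }
        label = int((compound, disease) in treatments)
        rows.append((compound, disease, features, label))

for row in rows:
    print(row)
```

Each row is then one training example for a classifier that learns which metapaths separate treatments from non-treatments.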
Predictions succeed at prioritizing known treatments
- disease modifying treatments
AUROC = 97.4%
Predictions succeed at prioritizing experimental treatments
- disease modifying treatments
AUROC = 97.4%
- treatments in clinical trials
AUROC = 70.0%
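For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive (here, a treatment) receives a higher predicted score than a randomly chosen negative. The following is a minimal pure-Python sketch of that definition; the scores and labels are made-up toy values and have nothing to do with the reported figures.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive has
    the higher score; tied pairs count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predictions: two positives and three negatives.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
print(auroc(scores, labels))  # 5 of 6 pairs are ordered correctly
```

An AUROC of 0.5 corresponds to random ordering and 1.0 to every positive outranking every negative. The pairwise loop is quadratic, so a rank-based formulation would be preferable for hundreds of thousands of pairs.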
1,206 compound–disease metapaths (length ≤ 4)
traditionally biomedicine, but newer in drug efficacy
genome-wide / high-throughput data sources
Browse at het.io/repurpose/metapaths.html
Project Rephetio: Does bupropion treat nicotine dependence?
- Bupropion was first approved for depression in 1985
- In 1997, bupropion was approved for smoking cessation
- Can we predict this repurposing from Hetionet? The prediction was:
- 99.5th percentile for nicotine dependence
- probability 2.50-fold greater than null
Cypher query to find the top CbGpPWpGaD paths
Try at https://neo4j.het.io
Top 100 epilepsy predictions & their chemical structure
Discuss at thinklab.com/d/224#5
Top 100 epilepsy predictions & their drug targets
Discuss at thinklab.com/d/230#14
Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. …
I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.
One network to rule them all
We have completed an initial version of our network. …
Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.
Hetionet (≤ v1.0) integrated data from 31 resources:
- 5 United States Government works
- 12 openly licensed
- 4 non-commercial use only
- 9 were all rights reserved
- 1 explicitly & contractually forbade reuse
Requested permission for 11 resources:
- median time to first response was 16 days
- 2 affirmative responses
- who owns data
- incompatibilities: share alike vs non-commercial
- copyright status of data & fair use
- Solution: license attribute per node/relationship
Legal barriers to data reuse
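The "license attribute per node/relationship" idea can be sketched as follows: if every relationship records the license of its source, a user can extract the subset compatible with their intended use. This is a minimal illustration; the field names, the license strings, and the toy records are assumptions and do not reflect how Hetionet stores this attribute.

```python
# Hypothetical relationship records, each carrying a license attribute.
relationships = [
    {"source": "compound_A", "kind": "binds", "target": "gene_1",
     "license": "CC0 1.0"},
    {"source": "disease_X", "kind": "associates", "target": "gene_1",
     "license": "CC BY-NC 4.0"},
    {"source": "compound_A", "kind": "treats", "target": "disease_X",
     "license": None},  # license unknown, so treated as not reusable
]


def reusable_subset(relationships, allowed_licenses):
    """Keep only relationships whose license is explicitly allowed."""
    return [r for r in relationships if r["license"] in allowed_licenses]


# Illustrative allow-list for a use that must avoid non-commercial terms.
allowed = {"CC0 1.0", "CC BY 4.0"}
subset = reusable_subset(relationships, allowed)
print(len(subset), "of", len(relationships), "relationships retained")
```

Using an explicit allow-list means relationships with a missing or unrecognized license are excluded by default, which is the conservative choice when the copyright status is unclear.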
the hetnet search engine
Future: all biomedical knowledge in a single network
- Teach computers how to read the literature and extract knowledge.
- Continuously and automatically refine and grow the hetnet.
- Free from any legal restrictions on reuse.
The hetnet awakens in Dedham for Artificial Intelligence: Transforming Pharma Congress
By Daniel Himmelstein