Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
Using a graph database to integrate biomedical knowledge and predict drug efficacy
Daniel Himmelstein (@dhimmel)
GDG Cloud DevFest Philly
Indy Hall · 399 Market St #360
September 28, 2019 1:00 PM
slides released under CC BY 4.0
How can we encode all biomedical knowledge into a single resource optimized for machine learning? We explore using hetnets (networks with multiple node and relationship types) and graph databases to integrate diverse information. By combining data from 29 public resources, we created Hetionet, a network with 11 node and 24 relationship types (available at https://neo4j.het.io). Next, we learned which types of paths occur more frequently when a drug treats a disease, allowing us to make over 200,000 predictions of treatment efficacy. Now we are creating a search engine at https://search.het.io/ to allow any researcher to quickly find how any two nodes in the hetnet are meaningfully connected. These studies were made possible by adopting a set of radically open practices, where all research was shared and discussed publicly from its inception. This includes our new Manubot software for open scholarly writing on GitHub.
Daniel Himmelstein is a postdoctoral fellow in the Greene Lab at the University of Pennsylvania. Previously, he received his PhD from the University of California San Francisco. His research focuses on integrating biomedical knowledge using networks. Daniel is also a frequent contributor to open source/data ecosystems, and explores how computational research can become more open and reproducible.
How I became interested in graphs
My Facebook friendship network in 2014
networks with multiple node or relationship types
multilayer network, multiplex network, multivariate network, multinetwork, multirelational network, multirelational data, multilayered network, multidimensional network, multislice network, multiplex of interdependent networks, hypernetwork, overlay network, composite network, multilevel network, multiweighted graph, heterogeneous network, multitype network, interconnected networks, interdependent networks, partially interdependent networks, network of networks, coupled networks, interconnecting networks, interacting networks, heterogenous information network
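The definition above (a network with multiple node or relationship types) can be sketched as a small data structure. This is a minimal illustration, not the Hetionet implementation: the `Node` and `Hetnet` classes, the identifiers, and the relationship names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str        # node type, e.g. "Compound" or "Disease"
    identifier: str  # placeholder identifier, not a real database ID


@dataclass
class Hetnet:
    nodes: set = field(default_factory=set)
    # each edge is a (source node, relationship type, target node) triple
    edges: list = field(default_factory=list)

    def add_edge(self, source, rel_type, target):
        self.nodes.update((source, target))
        self.edges.append((source, rel_type, target))

    def node_type_counts(self):
        return Counter(node.kind for node in self.nodes)

    def edge_type_counts(self):
        return Counter(rel_type for _, rel_type, _ in self.edges)


# Hypothetical toy data for illustration only.
compound = Node("Compound", "compound-1")
gene = Node("Gene", "gene-1")
disease = Node("Disease", "disease-1")

hetnet = Hetnet()
hetnet.add_edge(compound, "binds", gene)
hetnet.add_edge(disease, "associates", gene)
hetnet.add_edge(compound, "treats", disease)

print(hetnet.node_type_counts())
print(hetnet.edge_type_counts())
```

What makes this a hetnet rather than a plain graph is that every node and every edge carries a type, so the same pair of nodes can be linked by different kinds of relationships.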
How do you teach a computer biology?
Visualizing Hetionet v1.0
- Hetnet of biology for drug repurposing
- ~50 thousand nodes
11 types (labels)
- ~2.25 million relationships
- integrates 29 public resources
knowledge from millions of studies
How could multiple sclerosis affect retina layer formation?
More queries at thinklab.com/d/220
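A question like the one above amounts to asking for paths that connect a disease node to a biological process node through an intermediate node. The sketch below runs that idea on a tiny in-memory edge list rather than on the real graph database. The gene names are placeholders and the edges are invented for illustration, so the output says nothing about actual biology.

```python
from collections import defaultdict

# Hypothetical toy edges: (source, relationship type, target).
# GENE_A, GENE_B and GENE_C are placeholders, not real findings.
edges = [
    ("multiple sclerosis", "associates", "GENE_A"),
    ("multiple sclerosis", "associates", "GENE_B"),
    ("GENE_A", "participates", "retina layer formation"),
    ("GENE_C", "participates", "retina layer formation"),
]

# Build an undirected adjacency map so paths can be walked in either direction.
neighbors = defaultdict(set)
for source, rel_type, target in edges:
    neighbors[source].add((rel_type, target))
    neighbors[target].add((rel_type, source))


def two_step_paths(start, end):
    """Return sorted (start, intermediate, end) paths made of exactly two edges."""
    paths = []
    for _, middle in neighbors[start]:
        if any(node == end for _, node in neighbors[middle]):
            paths.append((start, middle, end))
    return sorted(paths)


paths = two_step_paths("multiple sclerosis", "retina layer formation")
print(paths)
```

Only the placeholder gene that touches both endpoints is returned. A graph database expresses the same pattern declaratively and scales it to millions of relationships.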
- standardized vocabularies
- stable, unambiguous identifiers
- Omics scale required
- Literature mining
- High throughput experimental technologies
- Avoid manual mapping
- Versioned data dependencies
Constructing Hetionet v1.0
Project Rephetio: drug repurposing predictions
Hetionet v1.0 contains:
1,538 connected compounds
136 connected diseases
209,168 compound–disease pairs
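The pair count follows directly from the two node counts, since every connected compound is paired with every connected disease. A quick check of the arithmetic:

```python
n_compounds = 1_538  # connected compounds in Hetionet v1.0
n_diseases = 136     # connected diseases in Hetionet v1.0

n_pairs = n_compounds * n_diseases
print(f"{n_pairs:,}")  # prints 209,168
```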
Systematic drug repurposing:
Compare the therapeutic utility of data types
Identify the mechanisms of drug efficacy
Predict the probability of treatment for all 209,168 compound–disease pairs (het.io/repurpose)
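The core idea, learning which types of paths occur more often when a drug treats a disease, starts from counting paths between a compound and a disease, grouped by path type. The sketch below shows only that counting step on a hypothetical toy hetnet. Using raw counts is a simplifying assumption, and the node names, relationship names, and `path_type_counts` helper are invented for illustration. The features and model used in the published study are not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy hetnet: node name -> node type.
node_type = {
    "compound-1": "Compound",
    "compound-2": "Compound",
    "gene-1": "Gene",
    "gene-2": "Gene",
    "disease-1": "Disease",
}
edges = [
    ("compound-1", "binds", "gene-1"),
    ("compound-1", "binds", "gene-2"),
    ("disease-1", "associates", "gene-1"),
    ("disease-1", "associates", "gene-2"),
    ("compound-1", "resembles", "compound-2"),
    ("compound-2", "treats", "disease-1"),
]

# Undirected adjacency list: node -> [(relationship type, neighbor), ...]
neighbors = defaultdict(list)
for source, rel_type, target in edges:
    neighbors[source].append((rel_type, target))
    neighbors[target].append((rel_type, source))


def path_type_counts(source, target, max_length=2):
    """Count simple paths of up to max_length edges, grouped by path type."""
    counts = Counter()

    def walk(node, visited, signature):
        if node == target:
            counts[tuple(signature)] += 1
            return
        if len(visited) > max_length:  # path already has max_length edges
            return
        for rel_type, nxt in neighbors[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, signature + [rel_type, node_type[nxt]])

    walk(source, {source}, [node_type[source]])
    return counts


counts = path_type_counts("compound-1", "disease-1")
for path_type, n in sorted(counts.items()):
    print(n, " - ".join(path_type))
```

Each distinct path type becomes one feature per compound–disease pair. A classifier trained on known treatments can then weight the path types that distinguish treatments from non-treatments.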
Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
eLife (2017) https://doi.org/cdfk
Project Rephetio: Does bupropion treat nicotine dependence?
Top 100 epilepsy predictions & their chemical structure
Discuss at thinklab.com/d/224#5
Top 100 epilepsy predictions & their drug targets
Discuss at thinklab.com/d/230#14
how are two nodes connected?
findings → mechanisms
we report that in human cancer cells, metformin inhibits mitochondrial complex I (NADH dehydrogenase) activity and cellular respiration.
— Metformin inhibits mitochondrial complex I of cancer cells to reduce tumorigenesis
Wheaton et al (2014) eLife https://doi.org/gfpb2x
Metformin is the most widely used antidiabetic drug in the world, and there is increasing evidence of a potential efficacy of this agent as an anticancer drug. First, epidemiological studies show a decrease in cancer incidence in metformin-treated patients.
— Metformin in Cancer Therapy: A New Perspective for an Old Antidiabetic Drug?
Sahra et al (2010) Mol Cancer Ther https://doi.org/bgr5vv
By De Jongens van de Tekeningen
Licensed under CC BY 3.0
Modified to invert colors
The Deep Review
- review article on deep learning in precision medicine
- 27 authors from 20 different institutions
- readers appreciate the breadth of perspectives
most viewed bioRxiv preprint of 2017
citation by persistent identifier
This is a sentence with 5 citations [ @doi:10.1038/nbt.3780; @pmid:29424689; @pmcid:PMC5938574; @arxiv:1407.3561; @url:https://greenelab.github.io/meta-review/ ].
Reproducibility of computational workflows is automated using continuous analysis
Brett K Beaulieu-Jones, Casey S Greene
Nature Biotechnology (2017-03-13) https://doi.org/f9ttx6
DOI: 10.1038/nbt.3780 · PMID: 28288103 · PMCID: PMC6103790
Sci-Hub provides access to nearly all scholarly literature.
Daniel S Himmelstein, Ariel Rodriguez Romero, Jacob G Levernier, Thomas Anthony Munro, Stephen Reid McLaughlin, Bastian Greshake Tzovaras, Casey S Greene
eLife (2018-03-01) https://www.ncbi.nlm.nih.gov/pubmed/29424689
DOI: 10.7554/elife.32822 · PMID: 29424689 · PMCID: PMC5832410
Opportunities and obstacles for deep learning in biology and medicine
Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, … Casey S. Greene
Journal of the Royal Society Interface (2018-04) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5938574/
DOI: 10.1098/rsif.2017.0387 · PMID: 29618526 · PMCID: PMC5938574
IPFS - Content Addressed, Versioned, P2P File System
arXiv (2014-07-14) https://arxiv.org/abs/1407.3561v1
Open collaborative writing with Manubot
Daniel S. Himmelstein, David R. Slochower, Venkat S. Malladi, Casey S. Greene, Anthony Gitter
This is a sentence with 5 citations [1,2,3,4,5].
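The transformation from the source sentence to the rendered one can be sketched as follows. This is not Manubot's implementation, only a minimal illustration of the idea: the hypothetical `number_citations` helper finds bracketed `@`-prefixed citation keys and replaces each with a number assigned in order of first appearance. Resolving identifiers to bibliographic metadata is left out.

```python
import re


def number_citations(text):
    """Replace bracketed @-citation keys with numbers in order of first use."""
    numbering = {}

    def replace_group(match):
        keys = [key.strip().lstrip("@") for key in match.group(1).split(";")]
        numbers = []
        for key in keys:
            if key not in numbering:
                numbering[key] = len(numbering) + 1
            numbers.append(str(numbering[key]))
        return "[" + ",".join(numbers) + "]"

    # Match a bracketed group whose first non-space character is "@".
    rendered = re.sub(r"\[\s*(@[^\]]+)\]", replace_group, text)
    return rendered, numbering


source_sentence = (
    "This is a sentence with 5 citations [ @doi:10.1038/nbt.3780; "
    "@pmid:29424689; @pmcid:PMC5938574; @arxiv:1407.3561; "
    "@url:https://greenelab.github.io/meta-review/ ]."
)
rendered, numbering = number_citations(source_sentence)
print(rendered)  # prints: This is a sentence with 5 citations [1,2,3,4,5].
```

Because keys are persistent identifiers, citing the same work twice reuses its number, and the `numbering` mapping lists the identifiers to look up when building the reference list.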
Grant G-2018-11163 to DSH
convert rms-fsf-slide-propreitary.png -channel RGB -negate -transparent black rms-fsf-slide-propreitary-negated.png
FreeSoftware TEDx slides. (2014) Reused under CC BY 3.0
the software controls the science
FreeSoftware TEDx slides. (2014) Reused under CC BY 3.0
convert rms-fsf-slide.png -channel RGB -negate -transparent black rms-fsf-slide-negated.png
open source software:
the scientist controls the software
by default, scientific outputs are subject to copyright
sometimes universities place additional legal barriers to reuse