Hetnet connectivity search provides rapid insights into how two biomedical entities are related

Daniel Himmelstein (@dhimmel)

Rocky Mountain Bioinformatics Conference

Viceroy Snowmass, Colorado

December 6, 2019 at 5:30 PM

slides.com/dhimmel/rocky2019

slides released under CC BY 4.0

Greene Lab

http://www.greenelab.com/

Study Contributors:

  • Michael Zietz
  • Vince Rubinetti
  • Benjamin Heil
  • Kyle Kloster
  • Michael Nagle
  • Blair Sullivan
  • Casey Greene

Abstract:

Hetnets, short for “heterogeneous networks”, contain multiple node and relationship types and offer a way to encode biomedical knowledge. For example, Hetionet connects 11 types of nodes — including genes, diseases, drugs, pathways, and anatomical structures — with over 2 million edges of 24 types. Previously, we trained a classifier to repurpose drugs using features extracted from Hetionet. The model identified types of paths between a drug and disease that occurred more frequently between known treatments.

For many applications, however, a training set of known relationships does not exist, yet researchers would still like to know how two nodes are meaningfully connected. For example, users may want to know not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. Therefore, we developed hetnet connectivity search to propose the most important paths between any two nodes.

The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We implemented the method on Hetionet and provide an online interface at https://het.io/search. Several optimizations were required to precompute significant instances of node connectivity at scale. We provide an open source implementation of these methods in our new Python package named hetmatpy.

To validate the method, we show that it identifies much of the same evidence for specific instances of drug repurposing as the previous supervised approach, but without requiring a training set.

OP28 Details

Authors:

  • Daniel Himmelstein, University of Pennsylvania
  • Michael Zietz, Columbia University
  • Vincent Rubinetti, University of Pennsylvania
  • Benjamin Heil, University of Pennsylvania
  • Kyle Kloster, North Carolina State University
  • Michael Nagle, Pfizer
  • Blair Sullivan, University of Utah
  • Casey Greene, University of Pennsylvania

how can we teach biomedicine to a machine?

The hetnet awakens: understanding complex diseases through data integration and open science

  • Hetnet of biology for drug repurposing
  • ~50 thousand nodes of 11 types (labels)
  • ~2.25 million relationships of 24 types
  • integrates 29 public resources, with knowledge from millions of studies

Hetionet v1.0

Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
eLife (2017) https://doi.org/cdfk

Hetionet metagraph (schema)

https://neo4j.het.io

connectivity search

how are two nodes connected?

sans supervision

// Cypher graph query language 
MATCH path = (source:Disease)-[*..3]-(target:Pathway)
WHERE
  source.name = "Alzheimer's disease" AND
  target.name = "Circadian rythm related genes"
RETURN path
LIMIT 100

execute me at neo4j.het.io
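
The same query can also be sent from a script. The following is a minimal sketch, assuming the third-party `neo4j` Python driver is installed and that the public instance accepts unauthenticated Bolt connections at `bolt://neo4j.het.io` (both are assumptions, not stated on the slide). The helper names `build_path_query` and `run_path_query` are hypothetical.

```python
def build_path_query(max_length=3):
    """Build the Cypher query from the slide, with names passed as parameters.

    Cypher does not allow a parameter for the variable-length bound,
    so max_length is validated and interpolated into the query text.
    """
    if not isinstance(max_length, int) or max_length < 1:
        raise ValueError("max_length must be a positive integer")
    return (
        f"MATCH path = (source:Disease)-[*..{max_length}]-(target:Pathway)\n"
        "WHERE source.name = $source AND target.name = $target\n"
        "RETURN path\n"
        "LIMIT 100"
    )


def run_path_query(source, target, max_length=3, uri="bolt://neo4j.het.io"):
    """Run the query and return the matched paths (requires network access).

    The URI and the absence of credentials are assumptions about the
    public Hetionet instance.
    """
    from neo4j import GraphDatabase  # third-party: pip install neo4j

    driver = GraphDatabase.driver(uri)
    try:
        with driver.session() as session:
            result = session.run(
                build_path_query(max_length), source=source, target=target
            )
            return [record["path"] for record in result]
    finally:
        driver.close()


if __name__ == "__main__":
    print(build_path_query(3))
```

Passing the node names as query parameters avoids quoting problems with names such as "Alzheimer's disease".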

a new type of search engine

https://het.io/search/

DWPC — Measures the extent of connectivity between the source and target node for the given metapath. Like the path count, but with less weight given to paths along high-degree nodes.

degree-weighted path count
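
To make the definition concrete, here is a minimal sketch of a degree-weighted path count on a toy hetnet. Each path contributes the product of the metaedge-specific degrees along it, raised to a negative damping exponent, and the DWPC is the sum over paths. The function name `dwpc`, the toy edges, and the damping value of 0.5 are illustrative assumptions. This brute-force enumeration is not how hetmatpy computes DWPCs at scale.

```python
from collections import defaultdict


def dwpc(edges, metapath, source, target, damping=0.5):
    """Degree-weighted path count between source and target.

    edges: dict mapping a metaedge name to a list of (from_node, to_node) pairs
    metapath: list of metaedge names, traversed left to right
    damping: exponent that downweights paths through high-degree nodes;
             damping=0 gives the plain path count
    """
    out_neighbors = {}
    in_degree = {}
    for metaedge, pairs in edges.items():
        neighbors = defaultdict(list)
        degrees = defaultdict(int)
        for u, v in pairs:
            neighbors[u].append(v)
            degrees[v] += 1
        out_neighbors[metaedge] = neighbors
        in_degree[metaedge] = degrees

    def walk(node, step, visited, degree_product):
        if step == len(metapath):
            return degree_product ** -damping if node == target else 0.0
        metaedge = metapath[step]
        neighbors = out_neighbors[metaedge].get(node, [])
        total = 0.0
        for nxt in neighbors:
            if nxt in visited:  # paths may not revisit a node
                continue
            # Each edge contributes two degrees: the out-degree of its
            # start node and the in-degree of its end node for this metaedge.
            product = degree_product * len(neighbors) * in_degree[metaedge][nxt]
            total += walk(nxt, step + 1, visited | {nxt}, product)
        return total

    return walk(source, 0, {source}, 1)


# Toy Compound-binds-Gene-associates-Disease hetnet (made-up nodes)
toy_edges = {
    "CbG": [("c1", "g1"), ("c1", "g2"), ("c2", "g2")],
    "GaD": [("g1", "d1"), ("g2", "d1"), ("g2", "d2")],
}
print(dwpc(toy_edges, ["CbG", "GaD"], "c1", "d1"))  # 0.75
```

Two paths connect c1 to d1. The path through the low-degree gene g1 contributes 0.5, while the path through the higher-degree gene g2 contributes only 0.25.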

a null distribution computed from 200 permuted hetnets
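
A permuted hetnet keeps every node's degree but shuffles which nodes are connected, so that connectivity expected from degree alone can be estimated. The sketch below shows one common way to do this for a single edge type: repeated degree-preserving edge swaps. The function names, the number of swaps, and the toy edges are assumptions for illustration, and the sketch assumes edges that join two different node types (for example Compound–binds–Gene).

```python
import random
from collections import Counter


def permute_edges(edges, n_swaps, seed=0):
    """Shuffle (source, target) edges of one type while preserving degrees."""
    rng = random.Random(seed)
    edges = list(edges)
    if len(edges) < 2:
        return edges
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Reject swaps that would create an edge that already exists
        # (this also rejects swaps where a == c or b == d).
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def degrees(edges):
    """Return (source degree counts, target degree counts)."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)


original = [("c1", "g1"), ("c1", "g2"), ("c2", "g2"), ("c3", "g3")]
permuted = permute_edges(original, n_swaps=100, seed=1)
print(degrees(permuted) == degrees(original))
```

Repeating this with different seeds would give a collection of permuted networks, from which null DWPCs can be computed.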

the hurdle

the gamma
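
The two pieces can be combined into a p-value. The hurdle is the probability that a null DWPC is nonzero, and the gamma describes the nonzero null DWPCs. The sketch below assumes the gamma parameters are estimated by the method of moments and that a DWPC of zero receives a p-value of 1. The function names are hypothetical, and the hand-written series for the gamma distribution is only suitable for small examples (a library such as SciPy would be preferable in practice).

```python
import math


def gamma_sf(x, shape, rate):
    """P(X > x) for a gamma distribution, via a series for the lower
    regularized incomplete gamma function. Intended for moderate rate * x."""
    if x <= 0:
        return 1.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (shape + n)
        total += term
        n += 1
    cdf = total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))
    return max(0.0, 1.0 - cdf)


def gamma_hurdle_p_value(observed_dwpc, null_dwpcs):
    """P(null DWPC >= observed) under a gamma-hurdle model."""
    if observed_dwpc <= 0:
        return 1.0
    nonzero = [x for x in null_dwpcs if x > 0]
    if not nonzero:
        return 0.0
    frac_nonzero = len(nonzero) / len(null_dwpcs)  # the hurdle
    mean = sum(nonzero) / len(nonzero)
    var = sum((x - mean) ** 2 for x in nonzero) / len(nonzero)
    if var == 0:
        # Degenerate null: all nonzero values are identical
        return frac_nonzero if observed_dwpc <= mean else 0.0
    shape = mean ** 2 / var  # method-of-moments estimates (an assumption)
    rate = mean / var
    return frac_nonzero * gamma_sf(observed_dwpc, shape, rate)


# Made-up null DWPCs: half are zero, the rest have mean 2 and variance 1
null = [0.0, 0.0, 1.0, 3.0]
print(gamma_hurdle_p_value(2.0, null))
```

Because the p-value is the hurdle probability times the gamma tail, it can never exceed the fraction of nonzero null DWPCs.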

https://het.io/software/

Thanks!

@dhimmel

ORCID: 0000-0002-3012-7446

Slides
https://slides.com/dhimmel/rocky2019

Extra Slides

https://het.io/search/?source=17054&target=6602

findings → mechanisms

we report that in human cancer cells, metformin inhibits mitochondrial complex I (NADH dehydrogenase) activity and cellular respiration.

— Metformin inhibits mitochondrial complex I of cancer cells to reduce tumorigenesis
Wheaton et al (2014) eLife https://doi.org/gfpb2x

Metformin is the most widely used antidiabetic drug in the world, and there is increasing evidence of a potential efficacy of this agent as an anticancer drug. First, epidemiological studies show a decrease in cancer incidence in metformin-treated patients.

— Metformin in Cancer Therapy: A New Perspective for an Old Antidiabetic Drug?

Sahra et al (2010) Mol Cancer Ther https://doi.org/bgr5vv

gamma-hurdle null distribution for DWPCs

connectivity search extras

How might multiple sclerosis affect retina layer formation?

MATCH path =
  // Specify the type of path to match
  (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1:Gene)-[:INTERACTS_GiG]-
  (n2:Gene)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE
  // Specify the source and target nodes
  n0.name = 'multiple sclerosis' AND
  n3.name = 'retina layer formation'
  // Require GWAS support for the
  // Disease-associates-Gene relationship
  AND 'GWAS Catalog' in e1.sources
  // Require the interacting gene to be
  // upregulated in a relevant tissue
  AND exists(
    (n0)-[:LOCALIZES_DlA]-(:Anatomy)-[:UPREGULATES_AuG]-(n2))
RETURN path

execute me at neo4j.het.io

MATCH path = (source:Disease)-[*..2]-(target:BiologicalProcess)
WHERE
  source.name = 'multiple sclerosis' AND
  target.name = 'retina layer formation'
RETURN path

execute me at neo4j.het.io

MATCH path = (source:Disease)-[*..3]-(target:BiologicalProcess)
WHERE
  source.name = 'multiple sclerosis' AND
  target.name = 'retina layer formation'
RETURN path

execute me at neo4j.het.io

Query profile

  • 106 seconds
  • 135 million database hits

Rephetio

observations = compound–disease pairs

features = types of paths

treatments

Project Rephetio

predicted probability of treatment for 209,168 compound–disease pairs
https://het.io/repurpose/

1,538 connected compounds

136 connected diseases

disease modifying treatments
+755, −208,413
AUROC = 97.4%

treatments with clinical trials
+5,594, −202,186
AUROC = 70.0%

Project Rephetio: drug repurposing predictions

  • Hetionet v1.0 contains:

    • 1,538 connected compounds

    • 136 connected diseases

    • 209,168 compound–disease pairs

    • 755 treatments

  • Systematic drug repurposing:

    • Compare the therapeutic utility of data types

    • Identify the mechanisms of drug efficacy

    • Predict the probability of treatment for all 209,168 compound–disease pairs (het.io/repurpose)

online discussion contributions
(see thinklab.com/p/rephetio/leaderboard)

Project Rephetio: Does bupropion treat nicotine dependence?

  • Bupropion was first approved for depression in 1985
  • In 1997, bupropion was approved for smoking cessation
  • Can we predict this repurposing from Hetionet? The prediction was:

Compound–causes–Side Effect–causes–Compound–treats–Disease

Compound–binds–Gene–associates–Disease

Compound–binds–Gene–participates–Pathway–participates–Disease

Rocky 2019: Hetnet connectivity search provides rapid insights into how two biomedical entities are related

By Daniel Himmelstein

Presentation by Daniel Himmelstein at Rocky Bioinformatics Conference on 2019-11-13. This presentation is released under a CC BY 4.0 License.