Heterogeneous networks for integrating biomedical knowledge and predicting new uses for existing drugs

Oxford University

February 27, 2019

Slides at slides.com/dhimmel/oxford

The hetnet awakens: understanding complex diseases through data integration and open science

Greene Lab

I'm a data scientist

http://www.greenelab.com/

How I became interested in graphs

http://blog.dhimmel.com/friendship-network/

My Facebook friendship network in 2014

too simple

single node type

single relationship type

networks with multiple node or relationship types

A 2012 study identified 26 different names for this type of network:

multilayer network, multiplex network, multivariate network, multinetwork, multirelational network, multirelational data, multilayered network, multidimensional network, multislice network, multiplex of interdependent networks, hypernetwork, overlay network, composite network, multilevel network, multiweighted graph, heterogeneous network, multitype network, interconnected networks, interdependent networks, partially interdependent networks, network of networks, coupled networks, interconnecting networks, interacting networks, heterogeneous information network

hetnet

How do you teach a computer biology?

Visualizing Hetionet v1.0

  • Hetnet of biology for drug repurposing
     
  • ~50 thousand nodes
    11 types (labels)
     
  • ~2.25 million relationships
    24 types
     
  • integrates 29 public resources
    knowledge from millions of studies
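A hetnet differs from a simple graph in that every node and every relationship carries a type. The sketch below is a minimal illustration of that idea, not Hetionet's actual storage format: the `Node` and `Edge` classes and the gene identifiers `gene-A` and `gene-B` are hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str        # node type (label), e.g. "Disease"
    identifier: str  # identifier within that type (placeholder values here)


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    kind: str        # relationship type, e.g. "associates"


# Toy hetnet with three node types and three relationship types.
ms = Node("Disease", "multiple sclerosis")
gene_a = Node("Gene", "gene-A")  # hypothetical gene
gene_b = Node("Gene", "gene-B")  # hypothetical gene
process = Node("BiologicalProcess", "retina layer formation")

edges = [
    Edge(ms, gene_a, "associates"),
    Edge(gene_a, gene_b, "interacts"),
    Edge(gene_b, process, "participates"),
]

# Summarize the network by type, as in the "11 types" / "24 types" counts.
nodes = {node for edge in edges for node in (edge.source, edge.target)}
node_type_counts = Counter(node.kind for node in nodes)
edge_type_counts = Counter(edge.kind for edge in edges)

print(node_type_counts)
print(edge_type_counts)
```

Keeping the type on each node and edge is what makes typed path queries possible.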

Hetionet v1.0

  • Nodes
    • standardized vocabularies
    • stable, unambiguous identifiers
       
  • Relationships:
    • Omics scale required
    • Literature mining
    • High throughput experimental technologies
    • Avoid manual mapping
       
  • Versioned data dependencies

Constructing Hetionet v1.0

What's the best software for storing and querying hetnets?

dhimmel/hetio    136       18       6
neo4j/neo4j      53,793    4,727    1,283

GitHub stats from 2018-02-21

  • Customized Docker image
  • Digital Ocean droplet
  • SSL from Let's Encrypt
  • read-only mode with a query execution timeout
  • Custom GRASS style
  • Custom guides

Public Hetionet Neo4j Instance

Details at doi.org/brsc 

MATCH path =
  // Specify the type of path to match
  (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1:Gene)-[:INTERACTS_GiG]-
  (n2:Gene)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE
  // Specify the source and target nodes
  n0.name = 'multiple sclerosis' AND
  n3.name = 'retina layer formation'
  // Require GWAS support for the
  // Disease-associates-Gene relationship
  AND 'GWAS Catalog' in e1.sources
  // Require the interacting gene to be
  // upregulated in a relevant tissue
  AND exists(
    (n0)-[:LOCALIZES_DlA]-(:Anatomy)-[:UPREGULATES_AuG]-(n2))
RETURN path

How could multiple sclerosis affect retina layer formation?

More queries at thinklab.com/d/220

Project Rephetio contributions on Thinklab

(see thinklab.com/p/rephetio/leaderboard)

Project Rephetio: drug repurposing predictions

  • Hetionet v1.0 contains:
    • 1,538 connected compounds
    • 136 connected diseases
    • 209,168 compound–disease pairs
    • 755 treatments
  • Systematic drug repurposing:
    • Compare the therapeutic utility of data types
    • Identify the mechanisms of drug efficacy
    • Predict the probability of treatment for all 209,168 compound–disease pairs (het.io/repurpose)

Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
eLife (2017) https://doi.org/10.7554/eLife.26726

features = metapaths

observations = compound–disease pairs

positives = treatments

negatives = non-treatments
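The setup above can be sketched as a labeling step: every compound–disease pair is an observation, and it is a positive only if it is a known treatment. The compound and disease names below are placeholders, and the metapath feature columns are omitted.

```python
from itertools import product

# Placeholder identifiers, not real Hetionet nodes.
compounds = ["compound-1", "compound-2", "compound-3"]
diseases = ["disease-1", "disease-2"]
treatments = {("compound-1", "disease-2"), ("compound-3", "disease-1")}

# One observation per compound–disease pair; label 1 = treatment, 0 = non-treatment.
observations = [
    {"compound": c, "disease": d, "label": int((c, d) in treatments)}
    for c, d in product(compounds, diseases)
]

n_positive = sum(obs["label"] for obs in observations)
n_negative = len(observations) - n_positive
print(len(observations), n_positive, n_negative)

# The same arithmetic at Hetionet v1.0 scale, using the counts from the slides.
n_pairs = 1_538 * 136
print(n_pairs, n_pairs - 755)
```

At full scale this gives 209,168 pairs, of which 755 are positives and 208,413 are negatives.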

Machine learning methodology

Predictions succeed at prioritizing known treatments

  • disease modifying treatments
    +755, −208,413
    AUROC = 97.4%

Predictions succeed at prioritizing experimental treatments

  • treatments in clinical trials
    +5,594, −202,186
    AUROC = 70.0%
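AUROC can be read as the probability that a randomly chosen positive is ranked above a randomly chosen negative. The sketch below computes it directly from that definition on made-up scores. It is a pairwise illustration only and would be too slow for ~200 thousand pairs.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted probabilities and treatment labels.
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
print(auroc(scores, labels))
```

An AUROC of 50% corresponds to random ranking and 100% to a perfect ranking.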

1,206 compound–disease metapaths (length ≤ 4)

  1. Upper tier:
    traditional pharmacology
  2. Upper-middle tier:
    traditional in biomedicine, but newer to drug efficacy
  3. Lower-middle tier:
    genome-wide / high-throughput data sources
  4. Lower tier:
    cellular components

Browse at het.io/repurpose/metapaths.html

Project Rephetio: Does bupropion treat nicotine dependence?

  • Bupropion was first approved for depression in 1985
     
  • In 1997, bupropion was approved for smoking cessation
     
  • Can we predict this repurposing from Hetionet? The prediction was:

Compound–causes–SideEffect–causes–Compound–treats–Disease

Compound–binds–Gene–binds–Compound–treats–Disease

Compound–binds–Gene–associates–Disease

Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease
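Metapaths like those listed above are usually written in abbreviated form, for example CbGaD. The sketch below shows one way to derive the abbreviation from the full name. The lookup tables are assumptions for illustration: the C, G, D, PW, b, a and p codes follow the relationship type names used in the queries (such as BINDS_CbG), while SE, t and c are assumed here.

```python
# Assumed abbreviation tables (illustrative, not an official mapping).
NODE_ABBREV = {
    "Compound": "C",
    "Gene": "G",
    "Disease": "D",
    "Pathway": "PW",
    "SideEffect": "SE",
}
EDGE_ABBREV = {
    "binds": "b",
    "associates": "a",
    "participates": "p",
    "treats": "t",
    "causes": "c",
}

EN_DASH = "\u2013"  # separator used in the metapath names


def abbreviate(metapath: str) -> str:
    """Turn 'Compound–binds–Gene–associates–Disease' into 'CbGaD'."""
    parts = metapath.split(EN_DASH)
    if len(parts) % 2 == 0:
        raise ValueError("a metapath must start and end with a node type")
    # Even positions are node types, odd positions are relationship types.
    return "".join(
        (NODE_ABBREV if i % 2 == 0 else EDGE_ABBREV)[part]
        for i, part in enumerate(parts)
    )


print(abbreviate("Compound–binds–Gene–associates–Disease"))
print(abbreviate("Compound–binds–Gene–binds–Compound–treats–Disease"))
```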

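// Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease
MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
// Planner hint: join the two halves of the path at the pathway node
USING JOIN ON n2
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  // Exclude paths that pass through the same gene twice
  AND n1 <> n3
WITH
[
  // Degree of each node along each relationship of the path
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
RETURN
  path,
  // Path degree product: multiply the degrees, each raised to the power -0.4
  reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4) AS path_weight
// Keep the ten highest-weighted paths
ORDER BY path_weight DESC
LIMIT 10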

Cypher query to find the top CbGpPWpGaD paths
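The `path_weight` returned by the query is a product of node degrees, each raised to the power −0.4, so paths through highly connected nodes are downweighted. The sketch below mirrors that `reduce` expression in Python. The function names and the example degrees are illustrative, and summing the weights over all paths of a metapath is shown as one way to get a degree-weighted path count.

```python
import math


def path_degree_product(degrees, damping=0.4):
    """Multiply the degrees along a path, each raised to -damping."""
    return math.prod(d ** -damping for d in degrees)


def degree_weighted_path_count(paths_degrees, damping=0.4):
    """Sum the path weights over all paths (one degree list per path)."""
    return sum(path_degree_product(ds, damping) for ds in paths_degrees)


# Made-up degree lists: eight degrees per path, as in the query above.
specific_path = [2, 3, 4, 10, 10, 5, 3, 2]
hub_path = [20, 300, 40, 1000, 1000, 50, 300, 20]

print(path_degree_product(specific_path))
print(path_degree_product(hub_path))
print(degree_weighted_path_count([specific_path, hub_path]))
```

With a damping exponent of 0, every path would get weight 1 and the sum would reduce to a plain path count.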

Try at https://neo4j.het.io

Browse all predictions at het.io/repurpose. Discuss at thinklab.com/d/224

Top 100 epilepsy predictions & their chemical structure

Top 100 epilepsy predictions & their drug targets

hetmech

Kyle Kloster

@kkloste

Michael Zietz

@zietzm

https://github.com/greenelab/hetmech

the hetnet search engine for node connectivity

previously supported by

Ben Heil

@ben-heil

https://zietzm.github.io/Vagelos2017/

days to seconds

metapath_id  path_count  dwpc  p_value  source_degree  target_degree  n_dwpcs    n_nonzero_dwpcs  nonzero_mean  nonzero_sd
DaGpBPpG     435         2.8   0.0000%  373            32             29,000     29,000           2.1           0.12
DaGeAeG      6,204       2.0   0.0000%  373            28             53,000     53,000           1.9           0.02
DpSpDaG      25          4.4   0.0134%  17             6              101,000    100,994          2.4           0.45
DrDaG        3           5.1   0.2442%  5              6              181,800    32,414           3.9           0.51
DlAlDaG      42          3.7   0.7010%  33             6              20,200     20,200           2.7           0.38
DpSpDdG      5           3.4   2.7443%  17             2              1,065,000  1,009,309        1.9           0.67
DrDuGiG      1           1.1   3.2758%  5              2              2,885,400  124,913          2.0           1.16
DdGuDdG      4           3.8   4.3339%  45             2              213,000    142,687          2.9           0.56
DdGeAeG      739         1.6   6.1179%  45             28             53,000     53,000           1.5           0.05
DdGcGr>G     1           2.9   6.8995%  45             7              69,600     6,278            3.7           1.00
DaG          1           5.3   8.8886%  373            6              20,200     3,591            5.3           0.00
DlAdGcG      11          2.0   9.7265%  33             6              115,400    105,130          1.1           0.63

Most enriched types of paths connecting FTO and obesity

Future: all biomedical knowledge in a single network

https://github.com/greenelab/snorkeling

  • Teach computers how to read the literature and extract knowledge.
     
  • Continuously and automatically refine and grow the hetnet.
     
  • Free from any legal restrictions on reuse. 

David Nicholson

@danich1

Questions?

@dhimmel

0000-0002-3012-7446
