Integrating data towards a systematic understanding of drug efficacy

February 21, 2017

Pfizer, Cambridge, MA

By Daniel Himmelstein

@dhimmel

Slides at slides.com/dhimmel/pfizer

How do you teach a computer biology?

Visualizing Hetionet v1.0

  • Hetnet of biology designed for drug repurposing
  • ~50 thousand nodes of 11 types (labels)
  • ~2.25 million relationships of 24 types
  • integrates 29 public resources, covering knowledge from millions of studies
  • online at https://neo4j.het.io
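In Hetionet, every relationship type joins a specific pair of node types, and together these pairs form a metagraph. The following minimal Python sketch illustrates that idea. It assumes a hypothetical HetNet class and a hand-picked subset of four metaedges (not the full 24), and it is not Hetionet's own software.

```python
from collections import Counter

# Illustrative subset of a metagraph: each allowed edge type is a
# (source node type, edge kind, target node type) triple. Not the full Hetionet set.
METAEDGES = {
    ("Compound", "binds", "Gene"),
    ("Compound", "treats", "Disease"),
    ("Disease", "associates", "Gene"),
    ("Gene", "participates", "Pathway"),
}


class HetNet:
    """Hypothetical minimal hetnet: typed nodes, plus edges checked against a metagraph."""

    def __init__(self, metaedges):
        self.metaedges = set(metaedges)
        self.node_type = {}  # node id -> node type (label)
        self.edges = []      # (source id, edge kind, target id)

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, source, kind, target):
        # Reject edges whose (type, kind, type) triple is not in the metagraph.
        metaedge = (self.node_type[source], kind, self.node_type[target])
        if metaedge not in self.metaedges:
            raise ValueError(f"edge type not in metagraph: {metaedge}")
        self.edges.append((source, kind, target))

    def node_counts(self):
        """Number of nodes per node type."""
        return Counter(self.node_type.values())

    def edge_counts(self):
        """Number of edges per metaedge."""
        return Counter(
            (self.node_type[s], kind, self.node_type[t]) for s, kind, t in self.edges
        )


# Hypothetical toy nodes and edges (made-up IDs, not Hetionet data).
graph = HetNet(METAEDGES)
graph.add_node("C1", "Compound")
graph.add_node("D1", "Disease")
graph.add_node("G1", "Gene")
graph.add_edge("C1", "binds", "G1")
graph.add_edge("D1", "associates", "G1")
graph.add_edge("C1", "treats", "D1")
print(graph.node_counts())
print(graph.edge_counts())
```

Validating each edge against the metagraph at insertion time keeps the typed structure consistent. That consistency is what later makes it possible to enumerate paths by type.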

Hetionet v1.0

Project Rephetio: drug repurposing predictions

  • Hetionet v1.0 contains:
    • 1,538 connected compounds
    • 136 connected diseases
    • 209,168 compound–disease pairs
    • 755 treatments
  • 1,206 types of compound–disease paths (metapaths) with length ≤ 4
  • machine learning classifier
  • predict the probability of treatment for all 209,168 compound–disease pairs (het.io/repurpose)
  • Project online at thinklab.com/p/rephetio

Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
bioRxiv. 2016. DOI: 10.1101/087619

  • features = metapaths
  • observations = compound–disease pairs
  • positives = treatments
  • negatives = non-treatments

Machine learning methodology
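The framing above can be sketched in plain Python. Each compound–disease pair is one observation, metapaths supply the features, treatments are positives, and non-treatments are negatives. The sketch assumes an unregularized logistic regression fit by gradient descent, with made-up DWPC values for two metapaths. The project's actual classifier, feature transformations, and regularization may differ, so this illustrates only the framing.

```python
import itertools
import math


def build_observations(compounds, diseases, treatments):
    """One observation per compound–disease pair, labeled 1 if it is a known treatment."""
    return [
        (compound, disease, int((compound, disease) in treatments))
        for compound, disease in itertools.product(compounds, diseases)
    ]


def predict(weights, bias, x):
    """Logistic function of a linear score: the estimated probability of treatment."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Unregularized logistic regression fit by batch gradient descent on mean log-loss."""
    n = len(X)
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, label in zip(X, y):
            error = predict(weights, bias, x) - label
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data: 3 compounds x 2 diseases, 2 known treatments,
# and made-up DWPC values for two metapaths per pair.
compounds, diseases = ["C1", "C2", "C3"], ["D1", "D2"]
treatments = {("C1", "D1"), ("C3", "D2")}
features = {
    ("C1", "D1"): [0.9, 0.7], ("C1", "D2"): [0.1, 0.2],
    ("C2", "D1"): [0.2, 0.1], ("C2", "D2"): [0.0, 0.3],
    ("C3", "D1"): [0.1, 0.0], ("C3", "D2"): [0.8, 0.9],
}
observations = build_observations(compounds, diseases, treatments)
X = [features[(c, d)] for c, d, _ in observations]
y = [label for _, _, label in observations]
weights, bias = fit_logistic(X, y)
probabilities = {(c, d): predict(weights, bias, features[(c, d)]) for c, d, _ in observations}
```

Pairing every compound with every disease is what yields the slide's count: build_observations(range(1538), range(136), set()) has 209,168 rows.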

slide added after presentation

Figure: DWPC Δ AUROC for the 1,206 compound–disease metapaths (length ≤ 4)

  1. Upper tier:
    traditional pharmacology
  2. Upper-middle tier:
    established in biomedicine, but newer to drug efficacy
  3. Lower-middle tier:
    genome-wide / high-throughput data sources
  4. Lower tier:
    cellular components
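One reading of Δ AUROC is how much better a metapath's DWPC separates treatments from non-treatments on the real network than on permuted networks. The minimal sketch below works under that assumption. It uses the rank-based (Mann–Whitney) AUROC and defines Δ AUROC as the real-network AUROC minus the mean AUROC over permuted networks. All scores are invented.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def delta_auroc(real_scores, permuted_scores_list, labels):
    """Assumed definition: AUROC on the real hetnet minus mean AUROC over permuted hetnets."""
    real_auc = auroc(real_scores, labels)
    permuted = [auroc(scores, labels) for scores in permuted_scores_list]
    return real_auc - sum(permuted) / len(permuted)


labels = [1, 1, 0, 0, 0]
real_scores = [0.9, 0.8, 0.3, 0.2, 0.1]        # hypothetical DWPCs on the real hetnet
permuted_scores = [[0.5, 0.1, 0.4, 0.2, 0.3],  # hypothetical DWPCs on two permutations
                   [0.2, 0.6, 0.1, 0.7, 0.3]]
print(delta_auroc(real_scores, permuted_scores, labels))  # prints 0.5
```

Subtracting permuted-network performance discounts any signal a metapath gets from node degrees alone. Only information carried by which specific nodes are connected counts toward Δ AUROC.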

slide added after presentation

Predictions succeed at prioritizing known treatments

Project Rephetio: Does bupropion treat nicotine dependence?

  • Bupropion was first approved for depression in 1985
  • In 1997, bupropion was approved for smoking cessation
  • Can we predict this repurposing from Hetionet? The prediction was:

Compound–causes–SideEffect–causes–Compound–treats–Disease

Compound–binds–Gene–binds–Compound–treats–Disease

Compound–binds–Gene–associates–Disease

Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease
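Metapaths such as these are abbreviated by concatenating a short code for each node and edge type, which is where names like CbGpPWpGaD come from. The sketch below shows that conversion. It assumes a hand-written table covering only the types that appear in this deck, and it ignores edge direction.

```python
# Abbreviations for the node and edge types used in this deck (a subset only).
NODE_ABBREV = {
    "Anatomy": "A", "Compound": "C", "Disease": "D",
    "Gene": "G", "Pathway": "PW", "SideEffect": "SE",
}
EDGE_ABBREV = {
    "associates": "a", "binds": "b", "causes": "c", "expresses": "e",
    "localizes": "l", "participates": "p", "treats": "t",
}


def abbreviate(metapath, sep="–"):
    """Turn 'Compound–binds–Gene–...' into 'CbG...'; parts alternate node, edge, node."""
    parts = metapath.split(sep)
    if len(parts) % 2 == 0:
        raise ValueError("a metapath alternates node and edge types, so it has an odd number of parts")
    codes = []
    for i, part in enumerate(parts):
        table = NODE_ABBREV if i % 2 == 0 else EDGE_ABBREV
        codes.append(table[part])
    return "".join(codes)


def metapath_length(metapath, sep="–"):
    """Number of edges in the metapath."""
    return (len(metapath.split(sep)) - 1) // 2


print(abbreviate("Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease"))
# prints CbGpPWpGaD
```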

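// Top paths from bupropion to nicotine dependence along CbGpPWpGaD
MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
// Planner hint: join the two halves of the path at the pathway node
USING JOIN ON n2
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  AND n1 <> n3  // the two genes must be distinct
// Degree of each node on the path, for the relationship type the path uses
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
// Path weight: product of each degree raised to -0.4 (the damping exponent)
RETURN
  path,
  reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4) AS path_weight
ORDER BY path_weight DESC
LIMIT 10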

Cypher query to find the top 10 CbGpPWpGaD paths from bupropion to nicotine dependence
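The query weights each path by its path-degree product. For each node along the path, it takes the node's degree for the relationship type the path uses, raises it to −0.4, and multiplies the results together. Summing these weights over all paths gives the degree-weighted path count (DWPC). The stdlib Python sketch below applies the same arithmetic to the shorter CbGaD metapath on a hypothetical toy network. The node IDs and edges are made up, and w = 0.4 is taken from the query.

```python
from collections import defaultdict

# Hypothetical toy hetnet (made-up IDs): undirected edges grouped by relationship type,
# named after the Neo4j relationship types used in the query.
EDGES = {
    "BINDS_CbG": [("C1", "G1"), ("C1", "G2"), ("C2", "G1")],
    "ASSOCIATES_DaG": [("D1", "G1"), ("D1", "G2"), ("D2", "G2")],
}


def degree_index(edges):
    """Map (node, relationship type) -> number of edges of that type touching the node."""
    degree = defaultdict(int)
    for rel_type, pairs in edges.items():
        for a, b in pairs:
            degree[(a, rel_type)] += 1
            degree[(b, rel_type)] += 1
    return degree


def path_degree_product(degrees, w=0.4):
    """Multiply degree ** -w over every degree along one path (like the query's reduce())."""
    pdp = 1.0
    for d in degrees:
        pdp *= d ** -w
    return pdp


def dwpc_cbgad(compound, disease, edges, w=0.4):
    """Return (PC, DWPC) for Compound–binds–Gene–associates–Disease paths."""
    degree = degree_index(edges)
    bound = {g for c, g in edges["BINDS_CbG"] if c == compound}
    associated = {g for d, g in edges["ASSOCIATES_DaG"] if d == disease}
    genes = sorted(bound & associated)  # each shared gene is one path
    dwpc = 0.0
    for gene in genes:
        degrees = [
            degree[(compound, "BINDS_CbG")],    # like size((n0)-[:BINDS_CbG]-())
            degree[(gene, "BINDS_CbG")],
            degree[(gene, "ASSOCIATES_DaG")],
            degree[(disease, "ASSOCIATES_DaG")],
        ]
        dwpc += path_degree_product(degrees, w)
    return len(genes), dwpc


print(dwpc_cbgad("C1", "D1", EDGES))
```

Here C1 reaches D1 through G1 and G2. The two paths have degrees 2, 2, 1, 2 and 2, 1, 2, 2, so the DWPC is 2 × 2^-1.2 = 2^-0.2 ≈ 0.87. High-degree nodes shrink a path's weight, so paths through hubs count for less.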

Epilepsy predictions

(browse all predictions at het.io/repurpose)

Discuss at thinklab.com/d/224

Evaluating the top 100 epilepsy predictions

Top 100 epilepsy predictions & their chemical structure

Top 100 epilepsy predictions & their drug targets

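Tissue support requires both genes in the path to be expressed in the cardiovascular system. The query below generalizes this to any anatomy where the disease localizes, requiring both genes to be expressed in that same anatomy.

Find the tissue-supported contribution of each pathway to treating coronary artery disease (CAD) with enalapril (https://neo4j.het.io)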

// CbGpPWpGaD paths from enalapril to coronary artery disease
MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
// Tissue support: an anatomy where the disease localizes...
MATCH (n4)-[:LOCALIZES_DlA]-(anatomy)
// ...must express both genes on the path
MATCH (n1)-[:EXPRESSES_AeG]-(anatomy)-[:EXPRESSES_AeG]-(n3)
WHERE n0.name = 'Enalapril'
  AND n4.name = 'coronary artery disease'
  AND n1 <> n3
// DISTINCT keeps one row per path, even if several anatomies support it
WITH
  DISTINCT path,
  n2 AS pathway,
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees
// Per pathway: path count (PC) and degree-weighted path count (DWPC)
RETURN
  pathway.identifier AS pathway_id,
  pathway.name AS pathway_name,
  count(*) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC
ORDER BY DWPC DESC, pathway_name

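The aggregation in this query can be mirrored in Python. It keeps only paths whose two genes are both expressed in one anatomy where the disease localizes, counts each path once, and then totals by pathway. The sketch assumes the paths and their eight degrees are already enumerated, with each path listed once. The genes, pathways, degrees, and expression sets are hypothetical.

```python
from collections import defaultdict


def path_degree_product(degrees, w=0.4):
    """Product of degree ** -w over the degrees along one path."""
    pdp = 1.0
    for d in degrees:
        pdp *= d ** -w
    return pdp


def supported(gene_a, gene_b, anatomy_expression):
    """True if a single disease-localized anatomy expresses both genes."""
    return any(gene_a in genes and gene_b in genes for genes in anatomy_expression.values())


def pathway_contributions(paths, anatomy_expression, w=0.4):
    """Per pathway, the PC and DWPC of tissue-supported paths.

    paths: (gene_a, pathway, gene_b, degrees) tuples for one compound–disease pair,
           each path listed once, with the eight degrees in the query's order.
    anatomy_expression: anatomy -> set of expressed genes, restricted to anatomies
           where the disease localizes.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for gene_a, pathway, gene_b, degrees in paths:
        if gene_a == gene_b:  # mirrors `n1 <> n3`
            continue
        if supported(gene_a, gene_b, anatomy_expression):
            totals[pathway][0] += 1
            totals[pathway][1] += path_degree_product(degrees, w)
    rows = [(pathway, pc, dwpc) for pathway, (pc, dwpc) in totals.items()]
    # Like ORDER BY DWPC DESC, then by pathway
    return sorted(rows, key=lambda row: (-row[2], row[0]))


# Hypothetical paths and expression sets (made-up IDs and degrees).
paths = [
    ("G1", "PW1", "G2", [2, 3, 10, 10, 4, 5, 3, 6]),
    ("G1", "PW2", "G3", [2, 3, 8, 20, 20, 4, 2, 6]),
    ("G4", "PW1", "G2", [2, 1, 5, 10, 10, 4, 5, 6]),
]
anatomy_expression = {"A1": {"G1", "G2"}, "A2": {"G1", "G3"}}
for row in pathway_contributions(paths, anatomy_expression):
    print(row)
```

The third path is dropped because G4 is not expressed in any of the listed anatomies. Requiring both genes in the same anatomy means G2 and G3 together would not qualify, even though each is expressed somewhere.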
// SideEffect–causes–Compound–binds–Gene paths from Cushingoid to each target
MATCH path = (n0:SideEffect)-[r1:CAUSES_CcSE]
  -(n1:Compound)-[r2:BINDS_CbG]-(n2:Gene)
WHERE n0.name = 'Cushingoid'
// Degrees along each path, for the relationship type the path uses
WITH
[
  size((n0)-[:CAUSES_CcSE]-()),
  size(()-[:CAUSES_CcSE]-(n1)),
  size((n1)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n2))
] AS degrees, path, n2
// Aggregate by target gene: path count (PC) and degree-weighted path count (DWPC)
WITH
  n2,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC
RETURN
  n2.identifier AS gene_id,
  n2.name AS gene_symbol,
  n2.description AS gene_name,
  PC, DWPC
ORDER BY DWPC DESC, gene_symbol

What drug targets are responsible for the side effect of Cushingoid? https://neo4j.het.io

Query from https://thinklab.com/d/220#6
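Rather than fixing both endpoints, the Cushingoid query fixes only the side effect and groups paths by the gene they reach. This gives each drug target its own PC and DWPC. The Python sketch below shows that grouping on hypothetical toy edges (the IDs SE1, C1, G1, and so on are made up).

```python
from collections import Counter, defaultdict

# Hypothetical toy edges, not Hetionet data.
CAUSES = [("C1", "SE1"), ("C2", "SE1"), ("C2", "SE2"), ("C3", "SE2")]  # (compound, side effect)
BINDS = [("C1", "G1"), ("C2", "G1"), ("C2", "G2"), ("C3", "G2")]       # (compound, gene)


def rank_targets(side_effect, causes, binds, w=0.4):
    """PC and DWPC of SideEffect–causes–Compound–binds–Gene paths, grouped by gene."""
    se_degree = Counter(se for _, se in causes)            # size((n0)-[:CAUSES_CcSE]-())
    compound_causes_degree = Counter(c for c, _ in causes)  # size(()-[:CAUSES_CcSE]-(n1))
    compound_binds_degree = Counter(c for c, _ in binds)    # size((n1)-[:BINDS_CbG]-())
    gene_degree = Counter(g for _, g in binds)              # size(()-[:BINDS_CbG]-(n2))
    totals = defaultdict(lambda: [0, 0.0])
    for compound, se in causes:
        if se != side_effect:
            continue
        for binder, gene in binds:
            if binder != compound:
                continue
            degrees = [se_degree[se], compound_causes_degree[compound],
                       compound_binds_degree[compound], gene_degree[gene]]
            pdp = 1.0
            for d in degrees:
                pdp *= d ** -w
            totals[gene][0] += 1
            totals[gene][1] += pdp
    rows = [(gene, pc, dwpc) for gene, (pc, dwpc) in totals.items()]
    # Like ORDER BY DWPC DESC, then by gene ID
    return sorted(rows, key=lambda row: (-row[2], row[0]))


for row in rank_targets("SE1", CAUSES, BINDS):
    print(row)
```

G1 ranks first here because two compounds that cause SE1 bind it, and one of them (C1) has low degree on both of its edges.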

What drugs target NR3C1 and also cause Cushingoid? https://neo4j.het.io

MATCH path = (n0:SideEffect)-[r1:CAUSES_CcSE]-(n1:Compound)-[r2:BINDS_CbG]-(n2:Gene)
WHERE n0.name = 'Cushingoid'
  AND n2.name = 'NR3C1'
RETURN path

Questions



Presentation to Pfizer in Cambridge, MA. These slides are released under a CC BY 4.0 License.
