Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
November 20, 2015
Asilomar Grounds
Sandler Neurosciences Center
Sergio
network of pathogenesis:
type is essential when operating on hetnets
metapath-based approach
feature extraction: the DWPC
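A minimal sketch of the degree-weighted path count (DWPC), assuming the usual definition: every path that follows the metapath contributes the product of its edge weights, each edge is weighted by (source degree × target degree)^(−w) for that edge type, and the DWPC is the sum over paths. The toy edges, the node names, and the damping exponent `w = 0.4` are illustrative assumptions, not data from the talk.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index undirected typed edges as adjacency[(node, kind)] -> set of neighbors."""
    adjacency = defaultdict(set)
    for a, kind, b in edges:
        adjacency[(a, kind)].add(b)
        adjacency[(b, kind)].add(a)
    return adjacency


def dwpc(adjacency, source, target, metapath, damping=0.4):
    """Sum the degree-weighted products of all paths from source to target.

    metapath is a list of edge kinds; paths may not revisit a node.
    """
    total = 0.0
    # Depth-first search over (node, step index, visited nodes, running product).
    stack = [(source, 0, (source,), 1.0)]
    while stack:
        node, step, visited, product = stack.pop()
        if step == len(metapath):
            if node == target:
                total += product
            continue
        kind = metapath[step]
        neighbors = adjacency.get((node, kind), set())
        for nxt in neighbors:
            if nxt in visited:
                continue
            # Edge-type-specific degrees of both endpoints of this edge.
            degrees = len(neighbors) * len(adjacency[(nxt, kind)])
            weight = degrees ** -damping
            stack.append((nxt, step + 1, visited + (nxt,), product * weight))
    return total


# Hypothetical toy hetnet: two compounds, two genes, two diseases.
edges = [
    ("C1", "targets", "G1"),
    ("C1", "targets", "G2"),
    ("C2", "targets", "G1"),
    ("G1", "associates", "D1"),
    ("G2", "associates", "D1"),
    ("G1", "associates", "D2"),
]
adjacency = build_adjacency(edges)
metapath = ["targets", "associates"]  # Compound-targets-Gene-associates-Disease
result = dwpc(adjacency, "C1", "D1", metapath)
path_count = dwpc(adjacency, "C1", "D1", metapath, damping=0)
print(result, path_count)
```

With `damping=0` every path weighs 1, so the DWPC reduces to a plain path count; larger exponents increasingly discount paths through high-degree nodes.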
learning to classify:
regularized logistic regression
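A rough sketch of regularized logistic regression on metapath features. The slide does not say which penalty was used, so the L2 (ridge) penalty, the gradient-descent fit, the function names, and the toy feature values below are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, intercept, row):
    """Predicted probability that a row is a positive."""
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))


def fit_logistic_l2(X, y, penalty=0.1, learning_rate=0.5, epochs=2000):
    """Fit logistic regression by gradient descent with an L2 penalty on the weights."""
    n = len(X)
    weights = [0.0] * len(X[0])
    intercept = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            error = predict_proba(weights, intercept, row) - label
            grad_b += error / n
            for j, x in enumerate(row):
                grad_w[j] += error * x / n
        # The penalty shrinks the weights toward zero; the intercept is not penalized.
        weights = [
            w - learning_rate * (g + penalty * w) for w, g in zip(weights, grad_w)
        ]
        intercept -= learning_rate * grad_b
    return weights, intercept


# Hypothetical rows: one per pair, one column per metapath feature.
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
y = [1, 1, 1, 0, 0, 0]
weights, intercept = fit_logistic_l2(X, y)
print(weights, intercept)
```

Raising `penalty` pulls the coefficients closer to zero, trading some fit on the training pairs for stability when features are many and correlated.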
mechanisms of pathogenesis:
comparing gene set collections
mechanisms of pathogenesis:
comparing metapaths
performance and permutation
Metagraph for predicting drug repurposing
Network for drug repurposing
limited by:
fair use
originality (excludes facts)
agreement entered into to receive access to a resource
automatically granted to "original works of authorship" giving the exclusive right to:
copy
distribute
create derivatives
Resolution after months & 5000+ word discussion: mixed approach
massively collaborative open science
22 reviewers, 53 discussions, 293 comments
stats from 2015-10-20
Discuss on
LINCS L1000: ~20,000 small molecules
We combine all signatures for each DrugBank compound to get a consensus signature
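The slide does not state how signatures are combined. One plausible approach, sketched here as an assumption, is unweighted Stouffer's method: sum the per-gene z-scores across a compound's signatures and divide by the square root of the number of signatures. The function name and the toy z-scores are hypothetical.

```python
import math


def consensus_signature(signatures):
    """Combine per-gene z-scores across signatures with unweighted Stouffer's method."""
    n_signatures = len(signatures)
    n_genes = len(signatures[0])
    return [
        sum(signature[g] for signature in signatures) / math.sqrt(n_signatures)
        for g in range(n_genes)
    ]


# Two hypothetical signatures for one compound, three genes each.
signatures = [
    [1.0, 2.0, -0.5],
    [3.0, -2.0, -1.5],
]
consensus = consensus_signature(signatures)
print(consensus)
```

Genes that move consistently across signatures keep a large consensus z-score, while genes with conflicting directions cancel toward zero.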
Brueggeman
transcriptional signatures discriminate diuretics (DR) & anti-inflammatories (AA)
← genes →
aggregated 4 databases:
yielding 1,388 indications
drug | disease | ajg | csh | eq |
---|---|---|---|---|
Acetylsalicylic acid | systemic lupus erythematosus | DM | DM | 1 |
Alprazolam | systemic scleroderma | SYM | SYM | 1 |
Baclofen | multiple sclerosis | SYM | SYM | 1 |
Bupropion | panic disorder | SYM | DM | 0 |
Captopril | rheumatoid arthritis | NOT | NOT | 1 |
Cisplatin | hematologic cancer | DM | DM | 1 |
Cladribine | hematologic cancer | DM | DM | 1 |
Clopidogrel | coronary artery disease | DM | DM | 1 |
Cocaine | dental caries | NOT | SYM | 0 |
class | ajg | csh |
---|---|---|
DM | 26 | 32 |
SYM | 20 | 17 |
NOT | 4 | 1 |
curation pilot results
66% ✓
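As an illustration of how the `eq` column and the per-class tallies can be derived, here is a small sketch over only the nine example rows shown above. It assumes `eq` marks curator agreement; the full pilot contained more indications, so the numbers it prints differ from the pilot totals.

```python
from collections import Counter

# (drug, disease, ajg, csh) for the nine example rows shown above.
rows = [
    ("Acetylsalicylic acid", "systemic lupus erythematosus", "DM", "DM"),
    ("Alprazolam", "systemic scleroderma", "SYM", "SYM"),
    ("Baclofen", "multiple sclerosis", "SYM", "SYM"),
    ("Bupropion", "panic disorder", "SYM", "DM"),
    ("Captopril", "rheumatoid arthritis", "NOT", "NOT"),
    ("Cisplatin", "hematologic cancer", "DM", "DM"),
    ("Cladribine", "hematologic cancer", "DM", "DM"),
    ("Clopidogrel", "coronary artery disease", "DM", "DM"),
    ("Cocaine", "dental caries", "NOT", "SYM"),
]

# eq is 1 when both curators chose the same class, else 0.
eq = [int(ajg == csh) for _, _, ajg, csh in rows]
fraction_agree = sum(eq) / len(rows)

# Per-curator class tallies.
ajg_counts = Counter(ajg for _, _, ajg, _ in rows)
csh_counts = Counter(csh for _, _, _, csh in rows)

print(eq, fraction_agree)
print(ajg_counts, csh_counts)
```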
metapath | nonzero | auroc |
---|---|---|
CcSEcCiD | 0.590 | 0.897 |
CiDiCiD | 0.255 | 0.840 |
CiDaGaD | 0.411 | 0.820 |
CtGtCiD | 0.227 | 0.807 |
CiDpSpD | 0.410 | 0.801 |
CiDlAlD | 0.400 | 0.797 |
CtGiGaD | 0.335 | 0.710 |
CuG<kuGaD | 0.425 | 0.692 |
CtGvD | 0.009 | 0.514 |
Results for all metapaths ≤ length 3
feature performance
(including symptomatic indications)
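A minimal sketch of the two per-feature summaries in the table, assuming `nonzero` is the fraction of pairs with a nonzero feature value and `auroc` is the probability that a randomly chosen positive outranks a randomly chosen negative (ties count half). The scores and labels are made up for illustration.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def fraction_nonzero(scores):
    """Share of observations whose feature value is not zero."""
    return sum(1 for s in scores if s != 0) / len(scores)


# Hypothetical feature values for five pairs; label 1 marks a known indication.
scores = [0.9, 0.8, 0.3, 0.0, 0.0]
labels = [1, 0, 1, 0, 0]
print(auroc(scores, labels), fraction_nonzero(scores))
```

A feature can have a low nonzero fraction and still rank well, or be nonzero for many pairs and carry little signal, which is why the table reports both columns.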
common side effects
shared indications
shared gene associations
shared targets
shared symptoms
knowledge-bias free
genetics
graph database
designed for property graphs
natively supports hetnets
cypher query language
MATCH path = (source:Disease)--(:Symptom)--(target:Disease)
WHERE
  source.name = 'multiple sclerosis' AND
  target.name = 'psoriasis'
RETURN path
Common symptoms between MS and psoriasis
MATCH path =
  (source:Compound)-[:TARGET|BINDING]-
  (:Gene)-[:VARIATION]-(target:Disease)
WHERE target.name = 'multiple sclerosis'
RETURN path
Compounds that target MS-associated genes
DWPC in Cypher
By Daniel Himmelstein
My talk for the 2015 UCSF QBC retreat at the Asilomar Conference Grounds. All original content is CC0.