Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
September 24, 2015
Smilow 10-120, UPenn
Moore Lab Lunch
Jesse's Tavern
August, 2007
Sandler Neurosciences Center
Sergio
Hodgkin's lymphoma is genetically closer to autoimmune diseases than solid cancers
network of pathogenesis:
type is essential when operating on hetnets
metapath-based approach
feature extraction: the DWPC
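The degree-weighted path count (DWPC) scores a source and target node for one metapath: every path of that type contributes a weight that shrinks when the path runs through high-degree nodes, and the weights are summed. The sketch below is a minimal illustration under stated assumptions: the toy edges, the node names, the edge kinds `binds` and `associates`, and the damping exponent of 0.5 are all hypothetical and do not come from the slides.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map edge kind -> node -> set of neighbours (edges treated as undirected)."""
    adj = defaultdict(lambda: defaultdict(set))
    for kind, u, v in edges:
        adj[kind][u].add(v)
        adj[kind][v].add(u)
    return adj


def dwpc(adj, source, target, metapath, damping=0.5):
    """Sum, over all paths following `metapath`, of the damped degree product."""
    total = 0.0

    def walk(node, step, visited, degree_product):
        nonlocal total
        if step == len(metapath):
            if node == target:
                total += degree_product ** -damping
            return
        kind = metapath[step]
        for nxt in adj[kind][node]:
            if nxt in visited:  # skip paths that revisit a node
                continue
            # Degrees are counted per edge kind, so edge type matters.
            edge_degrees = len(adj[kind][node]) * len(adj[kind][nxt])
            walk(nxt, step + 1, visited | {nxt}, degree_product * edge_degrees)

    walk(source, 0, {source}, 1)
    return total


# Hypothetical toy hetnet: compounds (c*), genes (g*), diseases (d*).
edges = [
    ("binds", "c1", "g1"),
    ("binds", "c1", "g2"),
    ("binds", "c2", "g1"),
    ("associates", "g1", "d1"),
    ("associates", "g2", "d1"),
    ("associates", "g2", "d2"),
]
adj = build_adjacency(edges)
score = dwpc(adj, "c1", "d1", ["binds", "associates"])
print(score)
```

Two paths connect `c1` to `d1` here, and each has a degree product of 8, so the score is twice 8 raised to the power -0.5. Raising the damping exponent penalizes hub nodes more strongly; setting it to 0 recovers a plain path count.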
learning to classify:
regularized logistic regression
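As an illustration of regularized logistic regression, the sketch below fits an L1-penalized (lasso) model by proximal gradient descent using only the standard library. It is not the pipeline used in the talk: the function names, the learning rate, the penalty values, and the four-row toy dataset are assumptions chosen to keep the example small.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Minimize mean log loss + lam * ||w||_1 (intercept is not penalized)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        preds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        errs = [pred - yi for pred, yi in zip(preds, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(p)]
        grad_b = sum(errs) / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding gives exact zeros.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical data: the first feature is informative, the second is not.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [0, 0, 1, 1]

w_fit, b_fit = fit_lasso_logistic(X, y, lam=0.01)
w_sparse, _ = fit_lasso_logistic(X, y, lam=10.0)
print(w_fit, b_fit)
print(w_sparse)
```

With a small penalty the informative feature receives a positive weight; with a large penalty every weight is shrunk to exactly zero, which is the feature-selection behavior that motivates the lasso.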
performance and permutation
mechanisms of pathogenesis:
comparing gene set collections
mechanisms of pathogenesis:
comparing metapaths
gene extraction from the GWAS Catalog
massively collaborative open science
22 reviewers, 47 discussions, 266 comments
LINCS L1000: ~20,000 small molecules
We combine all signatures for each DrugBank compound to get a consensus signature
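The slide does not say how the signatures are combined. One plausible approach, shown here purely as an assumption, treats each signature as a vector of per-gene z-scores and merges them with unweighted Stouffer's method (sum of z-scores divided by the square root of their count). The gene names and values are made up.

```python
import math


def consensus_signature(signatures):
    """Combine per-gene z-scores across signatures (unweighted Stouffer)."""
    n = len(signatures)
    genes = signatures[0].keys()
    return {gene: sum(sig[gene] for sig in signatures) / math.sqrt(n) for gene in genes}


# Hypothetical z-scores from four signatures of the same compound.
signatures = [
    {"GENE_A": 2.0, "GENE_B": -1.0},
    {"GENE_A": 1.0, "GENE_B": -3.0},
    {"GENE_A": 0.0, "GENE_B": -2.0},
    {"GENE_A": 1.0, "GENE_B": -2.0},
]
consensus = consensus_signature(signatures)
print(consensus)
```

Genes that move consistently in one direction across signatures end up with a larger consensus magnitude than any single moderate z-score would suggest, while inconsistent genes cancel toward zero.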
Brueggeman
transcriptional signatures discriminate
diuretics (DR) & anti-inflammatories (AA)
← genes →
aggregated 4 databases:
yielding 1,388 indications
drug | disease | ajg | csh | eq |
---|---|---|---|---|
Acetylsalicylic acid | systemic lupus erythematosus | DM | DM | 1 |
Alprazolam | systemic scleroderma | SYM | SYM | 1 |
Baclofen | multiple sclerosis | SYM | SYM | 1 |
Bupropion | panic disorder | SYM | DM | 0 |
Captopril | rheumatoid arthritis | NOT | NOT | 1 |
Cisplatin | hematologic cancer | DM | DM | 1 |
Cladribine | hematologic cancer | DM | DM | 1 |
Clopidogrel | coronary artery disease | DM | DM | 1 |
Cocaine | dental caries | NOT | SYM | 0 |
class | ajg | csh |
---|---|---|
DM | 26 | 32 |
SYM | 20 | 17 |
NOT | 4 | 1 |
curation pilot results
66% ✓
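The `eq` column in the indication table appears to flag whether the two curator columns (`ajg` and `csh`) assign the same class. The sketch below recomputes that flag, the per-class counts, and the fraction of agreement for the nine rows listed on the slide. Because it uses only that excerpt, its result describes the excerpt and is not expected to match the percentage reported for the full pilot.

```python
from collections import Counter

# The nine rows shown in the table: (drug, disease, ajg, csh).
rows = [
    ("Acetylsalicylic acid", "systemic lupus erythematosus", "DM", "DM"),
    ("Alprazolam", "systemic scleroderma", "SYM", "SYM"),
    ("Baclofen", "multiple sclerosis", "SYM", "SYM"),
    ("Bupropion", "panic disorder", "SYM", "DM"),
    ("Captopril", "rheumatoid arthritis", "NOT", "NOT"),
    ("Cisplatin", "hematologic cancer", "DM", "DM"),
    ("Cladribine", "hematologic cancer", "DM", "DM"),
    ("Clopidogrel", "coronary artery disease", "DM", "DM"),
    ("Cocaine", "dental caries", "NOT", "SYM"),
]

# 1 when both curators chose the same class, else 0.
eq = [int(ajg == csh) for _, _, ajg, csh in rows]
agreement = sum(eq) / len(rows)

# Per-curator class counts, in the same shape as the summary table.
ajg_counts = Counter(ajg for _, _, ajg, _ in rows)
csh_counts = Counter(csh for _, _, _, csh in rows)

print(eq)
print(round(agreement, 3))
print(ajg_counts, csh_counts)
```

Raw percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be a natural next step if the class distribution is skewed.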
Network contains data from 28 resources:
Project is open notebook and maximally reusable
After a 5000+ word discussion:
metapath | nonzero | auroc |
---|---|---|
CcSEcCiD | 0.590 | 0.897 |
CiDiCiD | 0.255 | 0.840 |
CiDaGaD | 0.411 | 0.820 |
CtGtCiD | 0.227 | 0.807 |
CiDpSpD | 0.410 | 0.801 |
CiDlAlD | 0.400 | 0.797 |
CtGiGaD | 0.335 | 0.710 |
CuG<kuGaD | 0.425 | 0.692 |
CtGvD | 0.009 | 0.514 |
Results for all metapaths ≤ length 3
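The `auroc` column summarizes how well a single feature ranks positives above negatives: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores and labels are invented for illustration and are not taken from the talk.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical feature values and indication labels (1 = indication).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))  # 0.75
```

A value of 0.5 corresponds to a feature that ranks no better than chance, which gives a reference point for reading the table.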
feature performance
(including symptomatic indications)
model
lasso auroc = 0.967
collinearity
subset of all 261 features
>>> import phd
By Daniel Himmelstein
Delivered on 2015-09-24 to the Greene Laboratory, stationed at the University of Pennsylvania. All original content is released under a CC0 license. Follow hyperlinks for attribution of reused content.