Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
September 24, 2015
Smilow 10-120, UPenn
dhimmel on:
—Daniel Himmelstein
Moore Lab Lunch
Jesse's Tavern
August, 2007
Sandler Neurosciences Center
Sergio
Hodgkin's lymphoma is genetically closer to autoimmune diseases than solid cancers
network of pathogenesis:
type is essential when operating on hetnets
metapath-based approach
feature extraction: the DWPC
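A minimal sketch of the degree-weighted path count (DWPC), assuming the usual definition: for a given metapath, each path between source and target contributes the product of its edge-specific node degrees, each raised to the power -w, and the contributions are summed. The `Hetnet` class, the node and edge names, and the default `w=0.4` are hypothetical choices for illustration, not the project's code.

```python
from collections import defaultdict


class Hetnet:
    """Toy hetnet in which every node and edge carries a type (hypothetical)."""

    def __init__(self):
        # (source_type, edge_kind, target_type) -> node -> set of neighbors
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, source, kind, target):
        """Add an undirected typed edge; nodes are (type, name) tuples."""
        self.adj[(source[0], kind, target[0])][source].add(target)
        self.adj[(target[0], kind, source[0])][target].add(source)

    def neighbors(self, node, metaedge):
        return self.adj[metaedge].get(node, set())


def dwpc(graph, source, target, metapath, w=0.4):
    """Sum of path-degree products over all paths that follow `metapath`."""
    total = 0.0

    def walk(node, step, visited, pdp):
        nonlocal total
        if step == len(metapath):
            if node == target:
                total += pdp
            return
        metaedge = metapath[step]
        reverse = (metaedge[2], metaedge[1], metaedge[0])
        out_degree = len(graph.neighbors(node, metaedge))
        for nxt in graph.neighbors(node, metaedge):
            if nxt in visited:
                continue  # a path may not revisit a node
            in_degree = len(graph.neighbors(nxt, reverse))
            # Each edge contributes two damped degrees, one per endpoint.
            walk(nxt, step + 1, visited | {nxt},
                 pdp * out_degree ** -w * in_degree ** -w)

    walk(source, 0, {source}, 1.0)
    return total


# Hypothetical toy network: one compound, two genes, two diseases.
g = Hetnet()
c1, g1, g2 = ("Compound", "c1"), ("Gene", "g1"), ("Gene", "g2")
d1, d2 = ("Disease", "d1"), ("Disease", "d2")
g.add_edge(c1, "binds", g1)
g.add_edge(c1, "binds", g2)
g.add_edge(g1, "associates", d1)
g.add_edge(g2, "associates", d1)
g.add_edge(g1, "associates", d2)

CbGaD = [("Compound", "binds", "Gene"), ("Gene", "associates", "Disease")]
print(dwpc(g, c1, d1, CbGaD))
```

Setting `w=0` removes the damping, so the function returns the plain path count (2 for this toy network). Because neighbors are looked up by metaedge, the node and edge types drive every step of the traversal.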
learning to classify:
regularized logistic regression
performance and permutation
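A hedged sketch of one way to build a permuted network as a baseline, assuming that "permutation" here means shuffling edges of a single type while preserving every node's degree. The slide does not spell out the procedure, so the swap scheme, the function names and the toy edges below are illustrative assumptions.

```python
import random
from collections import Counter


def permute_edges(edges, n_swaps, seed=0):
    """Degree-preserving shuffle of (source, target) edges of one type.

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), skipping swaps that would duplicate an edge.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if new1 in edge_set or new2 in edge_set:
            continue  # rejected: the swap would create a duplicate edge
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def degree_counts(edges):
    """Degree of every source node and every target node."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)


# Hypothetical compound-gene edges.
original = [("C1", "G1"), ("C1", "G2"), ("C2", "G2"),
            ("C2", "G3"), ("C3", "G1"), ("C3", "G4")]
permuted = permute_edges(original, n_swaps=100, seed=1)
print(degree_counts(permuted) == degree_counts(original))
```

Features recomputed on such a network keep the degree structure but lose the specific edges, so any performance that survives reflects degree alone.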
mechanisms of pathogenesis:
comparing gene set collections
mechanisms of pathogenesis:
comparing metapaths
gene extraction from the GWAS Catalog
massively collaborative open science
22 reviewers, 47 discussions, 266 comments
Discuss on
LINCS L1000: ~20,000 small molecules
We combine all signatures for each DrugBank compound to get a consensus signature
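A minimal sketch of combining several signatures into one consensus, assuming each signature maps genes to z-scores and using an unweighted Stouffer combination. The slide does not state the combination rule, so the method, the function name and the gene names are assumptions for illustration only.

```python
import math


def consensus_signature(signatures):
    """Combine per-gene z-scores across signatures (unweighted Stouffer).

    Each signature is a dict mapping gene -> z-score; all signatures are
    assumed to cover the same genes.
    """
    n = len(signatures)
    genes = signatures[0].keys()
    return {gene: sum(sig[gene] for sig in signatures) / math.sqrt(n)
            for gene in genes}


# Two hypothetical signatures for the same compound.
signatures = [
    {"GENE_A": 1.0, "GENE_B": -2.0},
    {"GENE_A": 3.0, "GENE_B": 0.0},
]
print(consensus_signature(signatures))
```

Dividing by the square root of the number of signatures keeps the combined score on the z-score scale when the inputs are independent. A weighted variant would replace the plain sum with a weighted one.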
Brueggeman
transcriptional signatures discriminate
diuretics (DR) & anti-inflammatories (AA)
← genes →
aggregated 4 databases:
yielding 1,388 indications
| drug | disease | ajg | csh | eq |
|---|---|---|---|---|
| Acetylsalicylic acid | systemic lupus erythematosus | DM | DM | 1 |
| Alprazolam | systemic scleroderma | SYM | SYM | 1 |
| Baclofen | multiple sclerosis | SYM | SYM | 1 |
| Bupropion | panic disorder | SYM | DM | 0 |
| Captopril | rheumatoid arthritis | NOT | NOT | 1 |
| Cisplatin | hematologic cancer | DM | DM | 1 |
| Cladribine | hematologic cancer | DM | DM | 1 |
| Clopidogrel | coronary artery disease | DM | DM | 1 |
| Cocaine | dental caries | NOT | SYM | 0 |
| class | ajg | csh |
|---|---|---|
| DM | 26 | 32 |
| SYM | 20 | 17 |
| NOT | 4 | 1 |
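A small sketch of how the `eq` column and the per-curator class counts can be derived from two curators' labels. It uses only the nine excerpt rows shown above, so its numbers describe the excerpt and not the full pilot. The function names are hypothetical.

```python
from collections import Counter

# Excerpt rows copied from the table above: (drug, disease, ajg, csh).
rows = [
    ("Acetylsalicylic acid", "systemic lupus erythematosus", "DM", "DM"),
    ("Alprazolam", "systemic scleroderma", "SYM", "SYM"),
    ("Baclofen", "multiple sclerosis", "SYM", "SYM"),
    ("Bupropion", "panic disorder", "SYM", "DM"),
    ("Captopril", "rheumatoid arthritis", "NOT", "NOT"),
    ("Cisplatin", "hematologic cancer", "DM", "DM"),
    ("Cladribine", "hematologic cancer", "DM", "DM"),
    ("Clopidogrel", "coronary artery disease", "DM", "DM"),
    ("Cocaine", "dental caries", "NOT", "SYM"),
]


def agreement(rows):
    """Fraction of rows where both curators assigned the same class."""
    return sum(a == c for _, _, a, c in rows) / len(rows)


def class_counts(rows, column):
    """Tally of classes (DM, SYM, NOT) in one curator's column."""
    return Counter(row[column] for row in rows)


print(f"agreement on excerpt: {agreement(rows):.3f}")
print("ajg:", class_counts(rows, 2))
print("csh:", class_counts(rows, 3))
```

On the excerpt, 7 of 9 rows agree, matching the `eq` column.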
curation pilot results
66% ✓
Network contains data from 28 resources:
Project is open notebook and maximally reusable
After a 5000+ word discussion:
| metapath | nonzero | auroc |
|---|---|---|
| CcSEcCiD | 0.590 | 0.897 |
| CiDiCiD | 0.255 | 0.840 |
| CiDaGaD | 0.411 | 0.820 |
| CtGtCiD | 0.227 | 0.807 |
| CiDpSpD | 0.410 | 0.801 |
| CiDlAlD | 0.400 | 0.797 |
| CtGiGaD | 0.335 | 0.710 |
| CuG<kuGaD | 0.425 | 0.692 |
| CtGvD | 0.009 | 0.514 |
Results for all metapaths ≤ length 3
feature performance
(including symptomatic indications)
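A minimal sketch of the two per-feature summaries in the table, assuming `nonzero` is the fraction of compound-disease pairs with a nonzero feature value and `auroc` is the area under the ROC curve when the feature alone ranks the pairs. The scores and labels below are made up for illustration.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise (Mann-Whitney)
    formulation, quadratic in the number of pairs but easy to verify.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nonzero_fraction(scores):
    """Fraction of observations with a nonzero feature value."""
    return sum(s != 0 for s in scores) / len(scores)


# Hypothetical feature values for five compound-disease pairs.
scores = [0.0, 0.5, 0.0, 0.9, 0.3]
labels = [0, 0, 0, 1, 1]  # 1 = known indication
print(nonzero_fraction(scores), auroc(scores, labels))
```

An AUROC near 0.5 means the feature ranks indications no better than chance, which is how the last row of the table reads.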
model
lasso auroc = 0.967
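A hedged sketch of lasso (L1-regularized) logistic regression, fitted with plain proximal gradient descent so the example has no dependencies. It illustrates the technique on made-up data. The solver, the hyperparameters and the data are assumptions, and in practice an established library would be the better choice.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def fit_lasso_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient steps.

    Returns (weights, intercept). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        preds = [1 / (1 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, row)))))
                 for row in X]
        errs = [pi - yi for pi, yi in zip(preds, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(p)]
        grad_b = sum(errs) / n
        # Gradient step on the log loss, then shrinkage for the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[-0.5, 1], [-0.4, 0], [-0.3, 0], [-0.2, 1],
     [0.2, 1], [0.3, 0], [0.4, 0], [0.5, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, intercept = fit_lasso_logistic(X, y)
print(weights, intercept)
```

The L1 penalty sets the weight of the uninformative feature to exactly zero. That is the property that makes lasso useful for selecting among many correlated features.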
collinearity
subset of all 261 features
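A minimal sketch for spotting collinear features, assuming collinearity is judged by the pairwise Pearson correlation between feature columns. The feature names, the values and the 0.9 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(features, threshold=0.9):
    """Feature pairs whose absolute correlation reaches the threshold."""
    return [(a, b) for a, b in combinations(sorted(features), 2)
            if abs(pearson(features[a], features[b])) >= threshold]


# Hypothetical feature columns, one value per compound-disease pair.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.5],
    "f3": [4.0, 1.0, 3.0, 2.0],
}
print(collinear_pairs(features))
```

Highly correlated pairs carry overlapping signal, so a sparse model tends to keep only one feature from each pair.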
>>> import phd
By Daniel Himmelstein
Delivered on 2015-09-24 to the Greene Laboratory, stationed at the University of Pennsylvania. All original content is released under a CC0 license. Follow hyperlinks for attribution of reused content.