Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
Slides online at: slides.com
Friday, December 9, 2016
10:05–10:15 am (OP 18)
Precision oncology requires that we functionally categorize cancers into treatment-relevant subtypes. The predominant approach—characterizing tumors based solely on actionable mutations—struggles to detect complex changes in gene or pathway function. Alternatively, genome-wide expression profiles provide a comprehensive reflection of aberrant cellular states resulting from mutation events. Therefore, we embarked on Project Cognoma to translate between gene expression and mutation in cancer.
Cognoma is an open-source, citizen-science philanthropy being developed as a collaboration between the Greene Lab at Penn and the DataPhilly and Code for Philly meetups. This arrangement leverages the collective full-stack expertise of our diverse contributor base. To date, hundreds of individuals have attended Cognoma meetups, and more than fifty have gotten involved on GitHub. Our priorities are ensuring that everyone learns something new and putting machine learning in the hands of cancer biologists.
Our product is cognoma.org, a webapp that makes it easy to build mutation status classifiers from gene expression on 7,306 TCGA samples representing 33 cancer types. The publicly available dataset contains RNA-seq gene expression for 20,530 genes, non-silent mutation calls for 21,940 genes, and sample attributes such as the patient's disease, age, sex, and survival. Cognoma enables a cancer biologist to assign each sample a mutation status based on one or more selected genes. Next, a disciplined classifier is trained using gene expression and sample attributes as features. As output, the user receives the importance of each feature, which offers insight into the molecular effects of the chosen mutation, as well as mutation scores for samples, which may identify hidden responders to targeted pharmacotherapies.
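The mutation-status step can be sketched in a few lines. This is a minimal illustration, not Cognoma's code. It assumes a hypothetical input layout: a dict mapping each sample ID to the set of genes with non-silent mutation calls in that sample. It also assumes that a sample counts as mutated when any selected gene is mutated. The abstract does not say how multiple genes are combined, so this "any" rule is an assumption.

```python
from typing import Dict, List, Set


def assign_mutation_status(
    mutations: Dict[str, Set[str]], selected_genes: List[str]
) -> Dict[str, int]:
    """Map each sample ID to 1 if any selected gene is mutated, else 0.

    Assumed input layout (hypothetical): `mutations` maps a sample ID to
    the set of genes carrying non-silent mutation calls in that sample.
    """
    selected = set(selected_genes)
    # Assumed rule: mutated if the sample's mutated genes overlap the selection.
    return {sample: int(bool(genes & selected)) for sample, genes in mutations.items()}


if __name__ == "__main__":
    # Toy sample IDs and mutation calls, invented for illustration only.
    toy_mutations = {
        "sample-A": {"TP53", "KRAS"},
        "sample-B": {"PIK3CA"},
        "sample-C": set(),
    }
    print(assign_mutation_status(toy_mutations, ["TP53", "PIK3CA"]))
    # {'sample-A': 1, 'sample-B': 1, 'sample-C': 0}
```

Requiring all selected genes to be mutated would only change the set test (`selected <= genes`). The resulting 0/1 labels become the training targets for the classifier.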
Predicting cancer mutations from gene expression
Using public domain TCGA data (from Xena Browser)
Elastic net logistic regression
TP53 example:
- tumor protein p53
- tumor suppressor protein
- 35% of samples mutated
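The slide names elastic net logistic regression as the model. Below is a hedged, dependency-free sketch that fits that model by proximal gradient descent. It is not the project's implementation. The function names, the default hyperparameters (`alpha`, `l1_ratio`, learning rate, iteration count), and the toy data are all hypothetical. The fitted coefficients play the role of per-feature importance, and the predicted probabilities play the role of mutation scores.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|: shrinks x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_elastic_net_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    alpha: float = 0.1,
    l1_ratio: float = 0.5,
    lr: float = 0.1,
    n_iter: int = 2000,
) -> Tuple[List[float], float]:
    """Minimize mean log-loss
    + alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2)
    by proximal gradient descent. The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        for j in range(p):
            # Gradient step on the smooth part (log-loss + L2 penalty) ...
            g = grad_w[j] + alpha * (1.0 - l1_ratio) * w[j]
            # ... then the L1 proximal step, which can set w[j] exactly to zero.
            w[j] = soft_threshold(w[j] - lr * g, lr * alpha * l1_ratio)
        b -= lr * grad_b
    return w, b


def mutation_scores(X: Sequence[Sequence[float]], w: List[float], b: float) -> List[float]:
    """Predicted probability that each sample is mutated."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


if __name__ == "__main__":
    # Toy data (invented): column 0 mimics an informative gene, column 1 noise.
    X = [[2.0, 0.1], [1.5, -0.2], [1.8, 0.3], [-2.0, 0.2], [-1.7, -0.1], [-1.6, 0.0]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = fit_elastic_net_logistic(X, y)
    print("coefficients (feature importance):", w)
    print("mutation scores:", mutation_scores(X, w, b))
```

The L1 part of the penalty drives uninformative coefficients to exactly zero, which keeps the feature list readable. The L2 part stabilizes the fit when many expression features are correlated. At the scale of 20,000+ genes, a library implementation such as scikit-learn's `SGDClassifier` with an elastic net penalty would be the practical choice. The penalty parametrization above follows that convention.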
Could gene expression identify hidden responders?
- 70+ targeted cancer therapies approved by the FDA
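One way to read the "hidden responders" idea in code is as follows. Among samples labeled wild-type for the selected genes, those with high predicted mutation scores resemble mutated tumors in their expression. The sketch below lists such samples, highest score first. The function name, the 0.5 cutoff, and the toy sample IDs are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple


def rank_hidden_responders(
    scores: Dict[str, float], status: Dict[str, int], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return wild-type samples (status 0) whose predicted mutation score
    exceeds `threshold` (a hypothetical cutoff), highest score first."""
    candidates = [
        (sample, score)
        for sample, score in scores.items()
        if status.get(sample) == 0 and score > threshold
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented scores and labels for four toy samples.
    scores = {"s1": 0.9, "s2": 0.8, "s3": 0.2, "s4": 0.7}
    status = {"s1": 1, "s2": 0, "s3": 0, "s4": 0}
    print(rank_hidden_responders(scores, status))  # [('s2', 0.8), ('s4', 0.7)]
```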
Pull request model for contribution
Contribution per user by GitHub repository
Full Stack Data Science
- Cancer data
- Machine learning
Organization Members: Andrew Madonna, Benjamin Dolly, Gregory Way, Casey Greene, Karin Wolok, Jesse Prestwood-Taylor, Claire McLeod, Stephen Shank, Josh Levy, Yichuan Liu, Derek Goss, Tanner Wells & many more at https://git.io/v65k6.
The Cognoma Collaborative creates a webapp to predict cancer mutations from gene expression
By Daniel Himmelstein