Open sourceror. Digital craftsman of the biodata revolution.
2:00–3:00 PM, April 26, 2016
Genentech Hall Auditorium
University of California, San Francisco
Sandler Neurosciences Center
What can I tell about my Facebook Friends, just knowing their mutual friendships?
Disease network from GWAS
Khankhanian et al. (2016) Int J Epidemiol. DOI: bfmj
Hetio Disease Genes
Himmelstein & Baranzini (2015) PLOS Comp Bio. DOI: 98q
network of pathogenesis:
- integrates diverse data
- 40,343 nodes of 18 types
- 1,608,168 edges of 19 types
type is essential when operating on hetnets
feature extraction: the DWPC
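A minimal sketch of the DWPC (degree-weighted path count), assuming the definition in Himmelstein & Baranzini (2015): every path that follows a given metapath contributes the product, over its edges, of (source degree × target degree) raised to a negative damping exponent, and the DWPC is the sum of those contributions. The toy graph, the node and edge-type names, the function names, and the damping value of 0.5 are illustrative assumptions, not the project's actual code or data.

```python
from collections import defaultdict

def build_index(edges):
    """Map (node, edge_type) -> neighbours. Edges are treated as undirected."""
    adj = defaultdict(list)
    for source, kind, target in edges:
        adj[(source, kind)].append(target)
        adj[(target, kind)].append(source)
    return adj

def dwpc(adj, source, target, metapath, damping=0.5):
    """Sum the degree-damped weights of all paths that follow `metapath`."""
    total = 0.0

    def walk(node, step, visited, weight):
        nonlocal total
        if step == len(metapath):
            if node == target:
                total += weight
            return
        kind = metapath[step]
        neighbours = adj.get((node, kind), [])
        for nxt in neighbours:
            if nxt in visited:  # paths may not revisit a node
                continue
            # Degrees are counted per edge type, which is why type matters.
            degree_product = len(neighbours) * len(adj[(nxt, kind)])
            walk(nxt, step + 1, visited | {nxt},
                 weight * degree_product ** -damping)

    walk(source, 0, {source}, 1.0)
    return total

# Hypothetical toy hetnet: one compound, two genes, two diseases.
toy_edges = [
    ("compound_A", "binds", "gene_1"),
    ("compound_A", "binds", "gene_2"),
    ("gene_1", "associates", "disease_X"),
    ("gene_2", "associates", "disease_X"),
    ("gene_2", "associates", "disease_Y"),
]
adj = build_index(toy_edges)
score = dwpc(adj, "compound_A", "disease_X", ["binds", "associates"])
print(round(score, 4))
```

With the damping exponent set to 0, the same function reduces to a plain path count, so the exponent controls how strongly paths through high-degree nodes are down-weighted.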
mechanisms of pathogenesis:
comparing gene set collections
Himmelstein, Lizee, Khankhanian, Brueggeman, Chen, Hessler, Green Hadley & Baranzini (2015–201?) Thinklab. DOI: 993
Metagraph in 2013
Metagraph in 2016
1,552 small molecule compounds
137 complex diseases
- 47,031 nodes of 11 types
- 2,250,197 edges of 24 types
- 29 resources
- millions of studies from last half century
- 1,206 types of paths
- 209,168 potential treatments
(1,538 compounds × 136 complex diseases)
- 6 hetnets (5 permuted)
- ~60 million database queries
- predictive model of efficacy
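The permuted hetnets in the list above act as a baseline. One common way to build such a baseline is to shuffle the edges of a single type with degree-preserving swaps, so every node keeps its degree while the specific connections are randomized. The sketch below illustrates that general idea on a hypothetical compound-gene edge list. It is an assumption that this matches the procedure used for the five permuted hetnets, and the function name, number of swaps, and seed are illustrative only.

```python
import random
from collections import Counter

def permute_edges(edges, n_swaps, seed=0):
    """Randomize (source, target) edges of one type, preserving node degrees."""
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)  # swap the two targets
        # Skip swaps that would create a duplicate edge.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges

# Hypothetical compound-gene edges.
original = [
    ("c1", "g1"), ("c1", "g2"), ("c2", "g2"),
    ("c2", "g3"), ("c3", "g1"), ("c3", "g4"),
]
permuted = permute_edges(original, n_swaps=100, seed=1)

# Each node has the same degree before and after permutation.
print(Counter(s for s, _ in original) == Counter(s for s, _ in permuted))
print(Counter(t for _, t in original) == Counter(t for _, t in permuted))
```

Features computed on a network shuffled this way reflect degree alone, which gives a reference point for judging whether a feature on the real network carries signal beyond degree.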
Machine Learning Paradigm
Greene & Himmelstein (2016) Circ Cardiovasc Genet. DOI: bffr
Lung Cancer Rates, 2007–2011
Simeonov & Himmelstein (2015) PeerJ. DOI: 98p
Trouble at the Town Hall
1908 in Silverton, CO
San Juan Historical Society
San Juan County, Colorado
Time from submission to acceptance for 3,330,333 articles since 1965
Visualization by Antoine Lizee
I am a lawyer, but not your lawyer (or UC’s lawyer), and this isn’t legal advice.
Others, like myself, try to remember to rate everything that I've read.
It took me a while to figure out.
I would like to enter into the discussion the cases where there was a tough decision to be made
1504 Word Post
2776 Word Post
Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. …
I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.
One network to rule them all
We have completed an initial version of our network. …
Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.
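A minimal sketch of how a SHA256 checksum for a file such as graph.json.gz could be computed with the Python standard library. The function name, chunk size, and the temporary demo file are assumptions for illustration. The sketch covers only the checksum, not the step of recording it in a Bitcoin block.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway file; in practice the path would be graph.json.gz.
demo_bytes = b"placeholder contents, not the real network"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(demo_bytes)
    demo_path = tmp.name
demo_digest = sha256_of_file(demo_path)
os.remove(demo_path)
print(demo_digest)
```

Anyone holding a copy of the file can recompute the digest and compare it with the published value. A match shows the file is byte-for-byte identical to the one that was timestamped.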
- Hetionet integrates data from 29 resources
- 12 had an open license
- 9 had no license
Incompatibilities: Share Alike vs Non-Commercial
- Requested permission for 11 resources
- Median time to first response was 16 days
2 affirmative responses
- Removed MSigDB
- "LICENSEE agrees not to put … the DATABASE on a … server … that may be accessed by any individual other than the LICENSEE."
- "LICENSEE agrees to provide … a written evaluation of the PROGRAM and the DATABASE, including a description of its functionality or problems and areas for further improvement."
Legal barriers to data reuse
release data under an open license
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1144247. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
Creator of Thinklab
Presentation online at:
Reception & Exhibit to Follow:
- Sandler Neurosciences Center, 2nd Floor
Thesis Seminar · Daniel Himmelstein · Biological & Medical Informatics · UCSF