Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
By Daniel Himmelstein
Lecture for EPID 600
Data Science for Biomedical Informatics
1:00 – 2:30 pm, BRB 251
October 20, 2016
Slides at slides.com/dhimmel/epid600
Disappearing decades: Amazon titles by decade
McKiernan et al. (2016) eLife
From Meet the Robin Hood of Science by Simon Oxenham:
On the evening of November 9th, 1989, the Cold War came to a dramatic end with the fall of the Berlin Wall. Four years ago another wall began to crumble, a wall that arguably has as much impact on the world as the wall that divided East and West Germany. The wall in question is the network of paywalls that cuts off tens of thousands of students and researchers around the world, at institutions that can’t afford expensive journal subscriptions, from accessing scientific research.
Definition: a draft of an article that has not yet been peer reviewed for formal publication
Benefits:
Stanford's Biomedical Computation Review mentions our research and preprint 6 months before publication
⌛⌛⌛⌛⌛⌛⌛
⌛⌛⌛⌛⌛⌛⌛
⌛⌛⌛⌛⌛⌛⌛
⌛⌛⌛⌛⌛⌛⌛
March 26, 2015: my paper on Heterogeneous Network Edge Prediction is accepted to PLOS Computational Biology.
⌛⌛⌛⌛⌛⌛⌛
⌛⌛⌛⌛⌛⌛⌛
⌛⌛⌛⌛⌛⌛⌛
⌛⌛⌛⌛⌛⌛⌛
⌛⌛⌛⌛⌛⌛⌛ ⌛⌛⌛⌛⌛
68 days
Are publication delays getting shorter or longer? Kendall Powell, writing a feature for Nature News, contacted me. Her investigation had uncovered a widespread belief that delays were worsening with time. But she wanted data, and the existing data was field specific or anecdotal.
Time from submission to acceptance for 3,330,333 articles since 1965
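As a rough illustration of what "time from submission to acceptance" means computationally, the delay for one article is the number of days between its received and accepted dates. The sketch below is not the analysis behind the figure above; the function names and the grouping by acceptance year are assumptions made for illustration.

```python
from datetime import date
from statistics import median


def acceptance_delay_days(received, accepted):
    """Days between the date a manuscript was received and the date it was accepted."""
    return (accepted - received).days


def median_delay_by_year(records):
    """Group (received, accepted) date pairs by acceptance year and
    return {year: median delay in days}, ordered by year.

    Grouping by acceptance year is an assumption of this sketch.
    """
    delays_by_year = {}
    for received, accepted in records:
        delay = acceptance_delay_days(received, accepted)
        delays_by_year.setdefault(accepted.year, []).append(delay)
    return {year: median(delays) for year, delays in sorted(delays_by_year.items())}
```

Given a list of `(received, accepted)` pairs of `datetime.date` objects, `median_delay_by_year` returns one median per year, which is enough to see whether delays trend up or down over time.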
Visualizing Hetionet v1.0
Visualization by Antoine Lizee
I am a lawyer, but not your lawyer (or UC’s lawyer), and this isn’t legal advice.
Others, like myself, try to remember to rate everything that I've read.
It took me a while to figure out.
I would like to enter into the discussion the cases where there was a tough decision to be made
117 Ratings
1504 Word Post
2776 Word Post
Project Rephetio All-Stars
Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. …
I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.
One network to rule them all
We have completed an initial version of our network. …
Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.
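The checksum half of this proof is easy to reproduce. The following is a minimal sketch, assuming a local copy of `graph.json.gz`; the function name `sha256_of_file` is hypothetical, and the step of embedding the digest in a Bitcoin block is not shown here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so large files never need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("graph.json.gz")` on a downloaded copy and comparing the result with the timestamped digest would confirm that the file is byte-for-byte the one that existed when the block was mined.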
Recommendation:
release data under an open license
See also opendefinition.org
Control packages
Control OS + packages
Beaulieu-Jones & Greene (2016) bioRxiv
At the start of this class, every pupil was asked to list 3 databases / datasets / data resources that they have used in their research.
Report progress to git.io/vPQjW
Guest lecture for the course Data Science for Biomedical Informatics (EPID 600) at the University of Pennsylvania. This course was instructed by Assistant Professor Blanca Himes. This presentation is released under a CC BY 4.0 License.