#OPENSCIENCE: The Impact of Scientific Research through Open Resources

Daniel Himmelstein (@dhimmel)

Penn Libraries Workshop

Van Pelt-Dietrich Library Center

Class of 1955 Conf. Room, 2nd Floor, VPDLC

October 23, 2019 1:30pm


slides released under CC BY 4.0

Next generation scholarly communication:
advancing the conversation.

event information (website)

Event title:

#OPENSCIENCE: The Impact of Scientific Research through Open Resources


#OPENSCIENCE is a collaborative effort to improve open access to scientific research through multimodal platforms. Open scientific research is in demand not just among those who lack access, but also among creators and researchers who believe in its potential. Please join us for a series of presentations on the evolution of scientific research and open access.

Our diverse panel of speakers will feature: Theodore Satterthwaite, M.D., Assistant Professor, Department of Psychiatry, University of Pennsylvania School of Medicine; Daniel Himmelstein, Postdoctoral Fellow at the Greene Lab, Department of Systems Pharmacology & Translational Therapeutics, University of Pennsylvania; and Jennifer Stiso, PhD Candidate in Neuroscience, University of Pennsylvania School of Medicine.

Each speaker will give a short presentation, which will be followed by a Q&A session with all three speakers. We look forward to your attendance and participation!


Daniel Himmelstein is a postdoctoral fellow in the Greene Lab at the University of Pennsylvania. Previously, he received his PhD from the University of California, San Francisco. His research focuses on integrating biomedical knowledge using networks. Daniel is also a frequent contributor to open source and open data ecosystems, and explores how computational research can become more open and reproducible. He is the lead developer of Manubot, a tool for open, collaborative writing of scholarly manuscripts on GitHub.

most viewed bioRxiv preprint of 2017

33 affiliations


deep review contribution history

the questions begin

where is the conversation?

~10% of bioRxiv preprints have comments.
Source: Inglis & Sever 2016. https://asapbio.org/biorxiv

Using PeerJ's comment feature to flag an error

hypothes.is journal integration



  1. integrate conversations from multiple locations
  2. all studies should have a conversation venue
  3. incentivize public conversation
  4. all conversation enters the scientific record

online discussion contributions
(see thinklab.com/p/rephetio/leaderboard)

Visualizing Hetionet v1.0

  • Hetnet of biology for drug repurposing
  • ~50 thousand nodes
    11 types (labels)
  • ~2.25 million relationships
    24 types
  • integrates 29 public resources
    knowledge from millions of studies

Hetionet v1.0

Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. … 

I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.

One network to rule them all

We have completed an initial version of our network. …

Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.

Discussion DOIs: bfmk, bfmm, bfmn, bfmp
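
The checksum mentioned in the post above is an ordinary SHA-256 digest of the network file. A minimal sketch of how such a digest could be computed in Python follows. The function name and chunk size are illustrative assumptions, and the step of recording the digest in a Bitcoin block is not shown.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that large archives do not need
    to fit in memory. `chunk_size` is an arbitrary illustrative default.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("graph.json.gz")` on a local copy of the file would give a digest that anyone can compare against the published one.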

  • Hetionet (≤ v1.0) integrated data from 31 resources:
    • 5 United States Government works
    • 12 openly licensed
    • 4 non-commercial use only
    • 9 all rights reserved
    • 1 explicitly & contractually forbade reuse
  • Requested permission for 11 resources:
    • median time to first response was 16 days
    • 2 affirmative responses
  • Other considerations:
    • who owns data
    • incompatibilities: share alike vs non-commercial
    • copyright status of data & fair use
  • Solution: license attribute per node/relationship
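
One way to picture the per-relationship license attribute is sketched below. This is an illustration only: the class, the set of accepted licenses, and the example records are hypothetical and are not taken from Hetionet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    kind: str
    license: str  # license of the resource this relationship came from


# Hypothetical set of licenses a downstream user is willing to accept.
OPEN_LICENSES = frozenset({"CC0 1.0", "CC BY 4.0"})


def reusable_subset(relationships, allowed=OPEN_LICENSES):
    """Keep only the relationships whose license is in `allowed`."""
    return [rel for rel in relationships if rel.license in allowed]


# Made-up example records, not real Hetionet content.
relationships = [
    Relationship("Compound A", "Disease X", "treats", "CC BY 4.0"),
    Relationship("Gene B", "Disease X", "associates", "CC BY-NC 4.0"),
    Relationship("Compound A", "Gene B", "binds", "CC0 1.0"),
]
open_only = reusable_subset(relationships)
```

Because every record carries its own license, a user who cannot accept non-commercial terms can filter those records out instead of discarding the whole network.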

Legal barriers to data reuse

by default, scientific outputs are subject to copyright

sometimes universities place additional legal barriers to reuse 


  1. release data under an open license
  2. University researchers: commit to open in your resource sharing plan

citation by persistent identifier

This is a sentence with 5 citations [


  1. Reproducibility of computational workflows is automated using continuous analysis
    Brett K Beaulieu-Jones, Casey S Greene
    Nature Biotechnology (2017-03-13) https://doi.org/f9ttx6
    DOI: 10.1038/nbt.3780 · PMID: 28288103 · PMCID: PMC6103790
  2. Sci-Hub provides access to nearly all scholarly literature
    Daniel S Himmelstein, Ariel Rodriguez Romero, Jacob G Levernier, Thomas Anthony Munro, Stephen Reid McLaughlin, Bastian Greshake Tzovaras, Casey S Greene
    eLife (2018-03-01) https://www.ncbi.nlm.nih.gov/pubmed/29424689
    DOI: 10.7554/elife.32822 · PMID: 29424689 · PMCID: PMC5832410
  3. Opportunities and obstacles for deep learning in biology and medicine
    Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, … Casey S. Greene
    Journal of the Royal Society Interface (2018-04) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5938574/
    DOI: 10.1098/rsif.2017.0387 · PMID: 29618526 · PMCID: PMC5938574
  4. IPFS - Content Addressed, Versioned, P2P File System
    Juan Benet
    arXiv (2014-07-14) https://arxiv.org/abs/1407.3561v1
  5. Open collaborative writing with Manubot
    Daniel S. Himmelstein, David R. Slochower, Venkat S. Malladi, Casey S. Greene, Anthony Gitter
    (2018-08-03) https://greenelab.github.io/meta-review/
This is a sentence with 5 citations [1,2,3,4,5].
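
Citing by persistent identifier means that each reference is named by an ID such as a DOI or PMID, and the bibliographic details are derived from it. As a rough sketch, the function below maps a prefixed identifier to the kind of URL shown in the reference list above. The `prefix:value` key format and the function name are assumptions made for illustration, not Manubot's actual API, and no metadata lookup is performed.

```python
# URL patterns mirror the links in the reference list above.
URL_TEMPLATES = {
    "doi": "https://doi.org/{}",
    "pmid": "https://www.ncbi.nlm.nih.gov/pubmed/{}",
    "pmcid": "https://www.ncbi.nlm.nih.gov/pmc/articles/{}/",
    "arxiv": "https://arxiv.org/abs/{}",
    "url": "{}",
}


def citation_to_url(key):
    """Turn a hypothetical `prefix:value` citation key into a URL."""
    # Split on the first colon only, so URLs keep their own colons.
    prefix, _, value = key.partition(":")
    if prefix not in URL_TEMPLATES or not value:
        raise ValueError(f"unsupported citation key: {key!r}")
    return URL_TEMPLATES[prefix].format(value)


keys = [
    "doi:10.1038/nbt.3780",
    "pmid:29424689",
    "pmcid:PMC5938574",
    "arxiv:1407.3561v1",
    "url:https://greenelab.github.io/meta-review/",
]
urls = [citation_to_url(key) for key in keys]
```

Each key carries enough information to locate the work, so the numbered reference list can be generated rather than typed by hand.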






Next generation scholarly communication: advancing the conversation

Presentation by Daniel Himmelstein at Penn Libraries' panel on Open Science on 2019-10-23 for the event titled "#OPENSCIENCE: The Impact of Scientific Research through Open Resources". This presentation is released under a CC BY 4.0 License.
