Dartmouth Libraries Research Facilitation Lightning Talks

2025-11-20

 

Daniel Himmelstein

Trang Le

https://slides.com/dhimmel/dartlib

slides released under CC BY 4.0

Sci-Hub versus Penn Libraries

  • Penn Libraries spent $13.13 million on electronic resources in 2017
  • Average per-download cost of $1.61
  • 326 toll access articles (manually checked)
    • Penn's access: 80.7%
    • Sci-Hub's database: 94.2%
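
As a rough consistency check on the figures above, the two percentages can be converted back into article counts. This is a minimal sketch: the counts it derives are implied by rounding and are not stated on the slide, and the variable names are illustrative.

```python
# Back-of-envelope check of the coverage figures quoted above.
TOTAL_ARTICLES = 326      # toll access articles that were manually checked
PENN_ACCESS = 0.807       # share accessible through Penn Libraries
SCIHUB_COVERAGE = 0.942   # share present in Sci-Hub's database


def implied_count(total: int, share: float) -> int:
    """Return the whole number of articles closest to total * share."""
    return round(total * share)


penn_articles = implied_count(TOTAL_ARTICLES, PENN_ACCESS)
scihub_articles = implied_count(TOTAL_ARTICLES, SCIHUB_COVERAGE)

# Re-derive each percentage from its implied count to confirm it round-trips.
print(penn_articles, f"{penn_articles / TOTAL_ARTICLES:.1%}")
print(scihub_articles, f"{scihub_articles / TOTAL_ARTICLES:.1%}")
```

Under this assumption, the gap between the two services in the sample is the difference between the two implied counts.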

https://github.com/greenelab/library-access

https://manubot.org

deep review contribution history

citation by persistent identifier

This is a sentence with 5 citations [
  @doi:10.1038/nbt.3780;
  @pmid:29424689;
  @pmcid:PMC5938574;
  @arxiv:1407.3561;
  @url:https://greenelab.github.io/meta-review/
].

References

  1. Reproducibility of computational workflows is automated using continuous analysis
    Brett K Beaulieu-Jones, Casey S Greene
    Nature Biotechnology (2017-03-13) https://doi.org/f9ttx6
    DOI: 10.1038/nbt.3780 · PMID: 28288103 · PMCID: PMC6103790
     
  2. Sci-Hub provides access to nearly all scholarly literature.
    Daniel S Himmelstein, Ariel Rodriguez Romero, Jacob G Levernier, Thomas Anthony Munro, Stephen Reid McLaughlin, Bastian Greshake Tzovaras, Casey S Greene
    eLife (2018-03-01) https://www.ncbi.nlm.nih.gov/pubmed/29424689
    DOI: 10.7554/elife.32822 · PMID: 29424689 · PMCID: PMC5832410
     
  3. Opportunities and obstacles for deep learning in biology and medicine
    Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Michael Zietz, Michael M. Hoffman, … Casey S. Greene
    Journal of the Royal Society Interface (2018-04) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5938574/
    DOI: 10.1098/rsif.2017.0387 · PMID: 29618526 · PMCID: PMC5938574
     
  4. IPFS - Content Addressed, Versioned, P2P File System
    Juan Benet
    arXiv (2014-07-14) https://arxiv.org/abs/1407.3561v1
     
  5. Open collaborative writing with Manubot
    Daniel S. Himmelstein, David R. Slochower, Venkat S. Malladi, Casey S. Greene, Anthony Gitter
    (2018-08-03) https://greenelab.github.io/meta-review/

This is a sentence with 5 citations [1,2,3,4,5].
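
The idea of citing by persistent identifier can be illustrated with a short parser. The sketch below is not Manubot's implementation: the regular expression, the URL templates, and the function names are assumptions chosen for illustration. It pulls `@prefix:identifier` keys out of a sentence, numbers them in order of first appearance, and maps each to a landing URL.

```python
import re

# Assumed URL templates per identifier prefix (illustrative only).
URL_TEMPLATES = {
    "doi": "https://doi.org/{id}",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{id}/",
    "pmcid": "https://www.ncbi.nlm.nih.gov/pmc/articles/{id}/",
    "arxiv": "https://arxiv.org/abs/{id}",
    "url": "{id}",
}

# A key is "@", a known prefix, ":", then everything up to whitespace, ";" or "]".
CITATION_PATTERN = re.compile(r"@(doi|pmid|pmcid|arxiv|url):([^\s;\]]+)")


def extract_citations(text):
    """Return (prefix, identifier) pairs in the order they appear."""
    return CITATION_PATTERN.findall(text)


def number_citations(text):
    """Assign reference numbers by order of first appearance."""
    numbers = {}
    for key in extract_citations(text):
        numbers.setdefault(key, len(numbers) + 1)
    return numbers


def citation_url(prefix, identifier):
    """Build a landing URL for one citation key."""
    return URL_TEMPLATES[prefix].format(id=identifier)


SENTENCE = """This is a sentence with 5 citations [
  @doi:10.1038/nbt.3780;
  @pmid:29424689;
  @pmcid:PMC5938574;
  @arxiv:1407.3561;
  @url:https://greenelab.github.io/meta-review/
]."""

for (prefix, identifier), number in number_citations(SENTENCE).items():
    print(number, citation_url(prefix, identifier))
```

A real tool would go further and retrieve bibliographic metadata for each identifier, which is what produces the formatted reference list.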

https://manubot.org/catalog/

  • knowledge graph for drug repurposing
     
  • integrates 29 public resources
    knowledge from millions of studies
     
  • ~50 thousand nodes
    11 types (labels)
     
  • ~2.25 million relationships
    24 types

Hetionet v1.0

Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
eLife (2017) https://doi.org/cdfk

Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. … 

I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.

One network to rule them all

We have completed an initial version of our network. …

Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.
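
The checksum half of this proof is easy to reproduce. The following is a minimal sketch that computes a SHA-256 digest of a file using Python's standard library; the function name and chunk size are illustrative choices, and the step of embedding the digest in a Bitcoin transaction is not shown.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("graph.json.gz")` on a local copy and comparing the result with the timestamped digest would confirm that the file is byte-for-byte the one that existed when the block was mined.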

Discussion DOIs: bfmk, bfmm, bfmn, bfmp

  • Hetionet (≤ v1.0) integrated data from 31 resources:
    • 5 United States Government works
    • 12 openly licensed
    • 4 non-commercial use only
    • 9 all rights reserved
    • 1 explicitly & contractually forbade reuse
  • Requested permission for 11 resources:
    • median time to first response was 16 days
    • 2 affirmative responses
  • Other considerations:
    • who owns data
    • incompatibilities: share alike vs non-commercial
    • copyright status of data & fair use
  • Solution: license attribute per node/relationship
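
A per-relationship license attribute might look like the sketch below. It is not Hetionet's actual schema: the `Relationship` class, the sample identifiers, and the license labels are hypothetical. The point is that once every relationship carries its own license, a downstream user can keep only the parts of the network whose terms they can accept. The category counts are taken from the list above.

```python
from dataclasses import dataclass

# Resource counts by license category, as listed above.
CATEGORY_COUNTS = {
    "US Government work": 5,
    "openly licensed": 12,
    "non-commercial use only": 4,
    "all rights reserved": 9,
    "reuse forbidden": 1,
}


@dataclass(frozen=True)
class Relationship:
    source: str   # hypothetical node identifier
    target: str   # hypothetical node identifier
    kind: str     # relationship type
    license: str  # license inherited from the originating resource


def filter_by_license(relationships, allowed):
    """Keep only relationships whose license is in the allowed set."""
    return [rel for rel in relationships if rel.license in allowed]


# Hypothetical sample data for illustration.
RELATIONSHIPS = [
    Relationship("compound-A", "disease-X", "treats", "CC0 1.0"),
    Relationship("compound-A", "gene-G", "binds", "CC BY-NC 3.0"),
    Relationship("gene-G", "disease-X", "associates", "CC BY 4.0"),
]

commercially_usable = filter_by_license(RELATIONSHIPS, {"CC0 1.0", "CC BY 4.0"})
print(sum(CATEGORY_COUNTS.values()), len(commercially_usable))
```

Storing the license on each element, rather than on the network as a whole, avoids forcing the entire resource under its most restrictive input.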

Legal barriers to data reuse

by default, scientific outputs are subject to copyright

universities sometimes place additional legal barriers to reuse

Recommendations:

  1. release data under an open license
  2. University researchers: commit to open in your resource sharing plan

OpenAlex

a fully open catalog of the global research system

library(openalexR)

# Count Dartmouth's 2020 open access articles, grouped by primary topic field
oa_fetch(
  entity = "works",
  institutions.ror = "049s0rh22", # Dartmouth College
  type = "article",
  group_by = "primary_topic.field.id",
  publication_year = 2020,
  is_oa = TRUE
)
  • Which journals publish Dartmouth research most frequently—and what are their OA policies?
  • Which Dartmouth departments are gaining the fastest citation momentum in the last 5 years?
  • Can we auto-generate a list of NSF-funded publications for a PI's annual report?
  • How quickly is Dartmouth work entering “planetary health” or “computational social science”?
  • How does Dartmouth’s OA percentage compare to peer institutions?
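
For readers working outside R, the `oa_fetch` query above corresponds roughly to a single request against the OpenAlex REST API. The sketch below only builds the request URL and makes no network call; the helper name `works_group_by_url` is hypothetical, and the mapping from the R arguments to the `filter` and `group_by` query parameters is an assumption about how the call translates.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def works_group_by_url(filters, group_by):
    """Build an OpenAlex works URL from filter key/value pairs and a group_by field."""
    # OpenAlex filters are comma-separated "key:value" pairs in one parameter.
    filter_value = ",".join(f"{key}:{value}" for key, value in filters.items())
    # Keep ":" and "," unescaped so the URL stays readable.
    query = urlencode({"filter": filter_value, "group_by": group_by}, safe=":,")
    return f"{OPENALEX_WORKS}?{query}"


url = works_group_by_url(
    {
        "institutions.ror": "049s0rh22",  # Dartmouth College
        "type": "article",
        "publication_year": 2020,
        "is_oa": "true",
    },
    group_by="primary_topic.field.id",
)
print(url)
```

Changing the `group_by` field or the filter values is one way to start on the questions listed above, for example swapping in a different institution's ROR to compare open access shares.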

This is an excellent story on all counts but two things stood out to me. 1) the “ski-rose” is an example of a complex visualization that is so well-served by scrollytelling. 2) this scrolly at openskistats.org sits alongside a technical manuscript and beautiful interactive data table, showing how different formats can serve complementary roles in communicating insights about data.

Andrew Bray:

Parambulations

Dartmouth Libraries Research Facilitation Lightning Talks on 2025-11-20

By Daniel Himmelstein
