UROP@CECS workshop: Future Scientist Trends

Daniel Himmelstein (@dhimmel)

2022-12-09 11:00

Vin University, Hanoi, Vietnam

College of Engineering and Computer Science

TBL - C305, Building C


slides released under CC BY 4.0

How to Become a Modern Open Scientist

event information (tweet)

Event info:

Within the scope of the Undergraduate Research Opportunities Program (UROP), we are pleased to invite you to the UROP@CECS workshop Future Scientist Trends.


Date & time: Friday, 9 December 2022 | 11:00am – 12:50pm
Venue:  TBL - C305, Building C, VinUni campus

College of Engineering and Computer Science


During his PhD studies at the University of California, San Francisco, Daniel saw opportunities to increase scientific progress by adopting open and collaborative practices. He began efforts to reduce delays at scholarly journals, ensure publications are public & reusable rather than behind paywalls, and make the process of science open so that it benefits from real-time global collaboration. Daniel will discuss how open science helped his career and how students at VinUniversity can apply similar techniques so that they graduate with a public record of scientific contribution appreciated around the world!


Daniel is Head of Data Integration at Related Sciences.

leave no trace

leave negative trace

when enjoying nature

maximize your trace

when doing science


2011, began PhD

& made my code public on GitHub

GitHub contribution heatmap for @dhimmel

git log \
  --pretty=short \


deep review contribution history

33 affiliations

the questions begin


online discussion contributions
(see thinklab.com/p/rephetio/leaderboard)

Visualizing Hetionet v1.0

  • Hetnet of biology for drug repurposing
  • ~50 thousand nodes
    11 types (labels)
  • ~2.25 million relationships
    24 types
  • integrates 29 public resources
    knowledge from millions of studies

Hetionet v1.0

Sci-Hub versus Penn Libraries

  • Penn Libraries spent $13.13 million on electronic resources in 2017
  • Average per-download cost of $1.61
  • 326 toll access articles (manually checked)
    • Penn's access: 80.7%
    • Sci-Hub's database: 94.2%


Publishing Delays

Illustration by Matt Murphy.

© Nature News doi.org/f3mn4t











files & directories

ISO 8601



how to name files?
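
One possible answer, sketched in Python: prefix each file name with an ISO 8601 date (YYYY-MM-DD) so that a plain alphabetical sort is also a chronological sort. The helper name `dated_filename`, the lowercase-hyphen slug style, and the example descriptions are assumptions made for illustration, not a convention stated in the slides.

```python
import re
from datetime import date


def dated_filename(day: date, description: str, extension: str) -> str:
    """Build a name such as 2022-12-09-some-description.pdf (hypothetical helper)."""
    # Lowercase the description and collapse every run of characters that is
    # not a letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # date.isoformat() returns the ISO 8601 calendar date, YYYY-MM-DD.
    return f"{day.isoformat()}-{slug}.{extension}"


# Illustrative inputs only; the second date is arbitrary.
names = [
    dated_filename(date(2022, 12, 9), "VinUni open science talk", "pdf"),
    dated_filename(date(2011, 9, 1), "PhD notes", "md"),
]

# Zero-padded year-month-day order means sorting the strings sorts by date.
for name in sorted(names):
    print(name)
```

Names without spaces or punctuation are also easier to handle on the command line, which is the main reason for the slug step in this sketch.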


Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. … 

I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.

One network to rule them all

We have completed an initial version of our network. …

Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.

Discussion DOIs: bfmk, bfmm, bfmn, bfmp
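
A SHA256 checksum like the one mentioned above can be computed with Python's standard `hashlib` module. This is a minimal sketch: the function name `sha256_of_file` and the placeholder file contents are assumptions, and it does not cover how the checksum was recorded in the Bitcoin blockchain, which the post does not describe.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a stand-in file; these bytes are NOT the real network.
with tempfile.TemporaryDirectory() as folder:
    demo_path = Path(folder) / "graph.json.gz"
    demo_path.write_bytes(b"placeholder bytes, not the real network")
    demo_checksum = sha256_of_file(demo_path)

print(demo_checksum)
```

Anyone holding the same file can recompute the digest and compare it with the published value; a match shows the file is byte-for-byte identical to the one that was checksummed.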

  • Hetionet (≤ v1.0) integrated data from 31 resources:
    • 5 United States Government works
    • 12 openly licensed
    • 4 non-commercial use only
    • 9 all rights reserved
    • 1 explicitly & contractually forbade reuse
  • Requested permission for 11 resources:
    • median time to first response was 16 days
    • 2 affirmative responses
  • Other considerations:
    • who owns data
    • incompatibilities: share alike vs non-commercial
    • copyright status of data & fair use
  • Solution: license attribute per node/relationship
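
The idea of a license attribute per relationship can be sketched as follows. Everything here is illustrative: the `Relationship` class, the node names, the relationship kinds, and the license strings are assumptions, not Hetionet's actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One edge of the network, tagged with the license of its source data."""
    source: str
    target: str
    kind: str
    license: str


# Hypothetical edges and license labels, for illustration only.
relationships = [
    Relationship("compound-a", "disease-x", "treats", "CC0 1.0"),
    Relationship("gene-b", "disease-x", "associates", "CC BY 4.0"),
    Relationship("compound-a", "gene-b", "binds", "CC BY-NC 4.0"),
    Relationship("gene-b", "gene-c", "interacts", "all rights reserved"),
]

# Assumed set of licenses that a given downstream user is able to accept.
ALLOWED_LICENSES = {"CC0 1.0", "CC BY 4.0"}


def reusable_subset(edges, allowed=ALLOWED_LICENSES):
    """Keep only the edges whose license is in the allowed set."""
    return [edge for edge in edges if edge.license in allowed]


for edge in reusable_subset(relationships):
    print(edge.source, edge.kind, edge.target, edge.license)
```

Storing the license on each element lets users with different constraints, for example commercial users, filter the network down to the part they may legally reuse instead of rejecting the whole resource.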

Legal barriers to data reuse

by default, scientific outputs are subject to copyright

sometimes universities place additional legal barriers to reuse


  1. release data under an open license
  2. University researchers: commit to open in your resource sharing plan

VinUni: How to Become a Modern Open Scientist

By Daniel Himmelstein

Presentation by Daniel Himmelstein at VinUniversity in Hanoi, Vietnam. This presentation is released under a CC BY 4.0 License.
