UROP@CECS workshop: Future Scientist Trends

Daniel Himmelstein (@dhimmel)

2022-12-09 11:00

Vin University, Hanoi, Vietnam

College of Engineering and Computer Science

TBL - C305, Building C

slides.com/dhimmel/vinuni

slides released under CC BY 4.0

How to Become a Modern Open Scientist

event information (tweet)

Event info:

Within the scope of the Undergraduate Research Opportunities Program (UROP), we are pleased to invite you to the UROP@CECS workshop Future Scientist Trends.

 

Date & time: Friday, 9 December 2022 | 11:00am – 12:50pm
Venue:  TBL - C305, Building C, VinUni campus

College of Engineering and Computer Science


Abstract:

During his PhD studies at the University of California, San Francisco, Daniel saw opportunities to accelerate scientific progress by adopting open and collaborative practices. He began efforts to reduce delays at scholarly journals, ensure publications are public & reusable rather than behind paywalls, and make the process of science open so that it benefits from real-time global collaboration. Daniel will discuss how open science helped his career and how students at VinUniversity can apply similar techniques so that they graduate with a public record of scientific contribution appreciated around the world!

 

Bio:
Daniel is Head of Data Integration at Related Sciences.

leave no trace

leave negative trace

when enjoying nature

maximize your trace

when doing science

https://github.com/Rezmason/matrix

2011, began PhD

& made my code public on GitHub

GitHub contribution heatmap for @dhimmel

git log \
  --pretty=short \
  --abbrev-commit

https://github.com/greenelab/deep-review

deep review contribution history

33 affiliations

the questions begin

https://manubot.org/catalog/

online discussion contributions
(see thinklab.com/p/rephetio/leaderboard)

Visualizing Hetionet v1.0

  • Hetnet of biology for drug repurposing
     
  • ~50 thousand nodes
    11 types (labels)
     
  • ~2.25 million relationships
    24 types
     
  • integrates 29 public resources
    knowledge from millions of studies
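A hetnet (heterogeneous network) is a graph in which every node and every relationship carries a type. The following is a minimal Python sketch of that idea using three toy nodes and edges; the identifiers, class names, and `count_edge_types` helper are illustrative assumptions and are not taken from the Hetionet data or codebase.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # node type (label), e.g. "Compound"
    identifier: str


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    kind: str  # relationship type, e.g. "treats"


# Toy data for illustration only, not taken from Hetionet.
compound = Node("Compound", "compound-1")
disease = Node("Disease", "disease-1")
gene = Node("Gene", "gene-1")

edges = [
    Edge(compound, disease, "treats"),
    Edge(compound, gene, "binds"),
    Edge(disease, gene, "associates"),
]


def count_edge_types(edges):
    """Count relationships by (source type, relationship type, target type)."""
    return Counter((e.source.kind, e.kind, e.target.kind) for e in edges)


edge_type_counts = count_edge_types(edges)
node_types = {n.kind for e in edges for n in (e.source, e.target)}
```

Grouping by the full (source type, relationship type, target type) triple keeps relationship types distinct even when two of them share a name but connect different kinds of nodes.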

Hetionet v1.0

Sci-Hub versus Penn Libraries

  • Penn Libraries spent $13.13 million on electronic resources in 2017
  • Average per-download cost of $1.61
  • 326 toll access articles (manually checked)
    • Penn's access: 80.7%
    • Sci-Hub's database: 94.2%
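The figures above can be cross-checked with a little arithmetic. This sketch assumes the percentages are rounded shares of the 326 checked articles and that the per-download cost is simply spending divided by downloads; the article counts and download volume it derives are inferred, not stated on the slide.

```python
ARTICLES_CHECKED = 326  # toll access articles, manually checked


def implied_count(percent, total=ARTICLES_CHECKED):
    """Return the whole number of articles closest to percent of total."""
    return round(total * percent / 100)


penn_accessible = implied_count(80.7)
scihub_accessible = implied_count(94.2)

# Implied download volume behind the per-download cost (an inference).
spend_usd = 13.13e6
cost_per_download_usd = 1.61
implied_downloads = spend_usd / cost_per_download_usd
```

Under these assumptions the percentages correspond to 263 and 307 of the 326 articles, and the spending implies roughly 8.2 million downloads.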

https://github.com/greenelab/library-access

Publishing Delays

Illustration by Matt Murphy.

© Nature News doi.org/f3mn4t

submission → acceptance

acceptance → publication
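Reading the labels above as two intervals, an acceptance delay (submission to acceptance) and a publication delay (acceptance to publication), each can be computed from an article's dates. A minimal sketch with hypothetical dates; the `delays` function is an illustrative name, not code from the linked analyses.

```python
from datetime import date


def delays(submitted, accepted, published):
    """Return (acceptance delay, publication delay) in days."""
    acceptance_delay = (accepted - submitted).days
    publication_delay = (published - accepted).days
    return acceptance_delay, publication_delay


# Hypothetical dates for a single article.
acceptance_delay, publication_delay = delays(
    submitted=date(2015, 1, 10),
    accepted=date(2015, 4, 20),
    published=date(2015, 5, 15),
)
```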

https://blog.dhimmel.com/history-of-delays/

https://blog.dhimmel.com/plos-and-publishing-delays

Thanks!

@dhimmel

0000-0002-3012-7446

Slides
https://slides.com/dhimmel/vinuni

files & directories

ISO 8601

2019-08-22

2019-08-22_bgs-presentation.pdf
2019-08-23_all-of-my-secrets.txt
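One reason to prefix filenames with ISO 8601 dates is that plain alphabetical sorting then matches chronological order. A small sketch of that property; `dated_filename` is a hypothetical helper and the third file is invented for illustration.

```python
from datetime import date


def dated_filename(day, name):
    """Prefix a filename with its ISO 8601 date (YYYY-MM-DD)."""
    return f"{day.isoformat()}_{name}"


filenames = [
    dated_filename(date(2019, 8, 23), "all-of-my-secrets.txt"),
    dated_filename(date(2019, 8, 22), "bgs-presentation.pdf"),
    dated_filename(date(2018, 12, 1), "older-notes.txt"),  # hypothetical file
]

# Plain string sorting orders the files by date.
chronological = sorted(filenames)
```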

how to name files?

01.download-data.ipynb
02.process-data.ipynb
03.visualize-data.ipynb
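The zero-padded step numbers matter once a pipeline grows past nine steps, because file listings sort names as text. A short sketch, assuming a hypothetical `step_filename` helper and made-up extra filenames:

```python
def step_filename(step, name, width=2):
    """Prefix a filename with a zero-padded step number."""
    return f"{step:0{width}d}.{name}"


steps = ["download-data", "process-data", "visualize-data"]
padded = [step_filename(i, f"{s}.ipynb") for i, s in enumerate(steps, start=1)]

# Without padding, "10" sorts before "2" when compared as text.
unpadded_order = sorted(["2.b.ipynb", "10.j.ipynb"])
padded_order = sorted([step_filename(2, "b.ipynb"), step_filename(10, "j.ipynb")])
```

Here `unpadded_order` puts step 10 ahead of step 2, while `padded_order` keeps the steps in execution order.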

Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. … 

I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.

One network to rule them all

We have completed an initial version of our network. …

Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.

Discussion DOIs: bfmk, bfmm, bfmn, bfmp
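A checksum like the one mentioned above can be computed with Python's standard library. This is a generic sketch of hashing a file with SHA-256, not the project's own script; the function name is an assumption, and the step of recording the digest in a Bitcoin block is not shown.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("graph.json.gz")` on a local copy would give a digest that anyone can compare against a published value to confirm the file is unchanged.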

  • Hetionet (≤ v1.0) integrated data from 31 resources:
    • 5 United States Government works
    • 12 openly licensed
    • 4 non-commercial use only
    • 9 all rights reserved
    • 1 explicitly & contractually forbade reuse
  • Requested permission for 11 resources:
    • median time to first response was 16 days
    • 2 affirmative responses
  • Other considerations:
    • who owns data
    • incompatibilities: share alike vs non-commercial
    • copyright status of data & fair use
  • Solution: license attribute per node/relationship
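Storing a license on each node or relationship lets downstream users keep only the parts of a network they are permitted to reuse. A minimal sketch of that filtering, with hypothetical relationships and an assumed allow-list of licenses; none of this is drawn from the Hetionet data itself.

```python
# Hypothetical relationships, each carrying the license of its source.
relationships = [
    {"source": "a", "target": "b", "license": "CC0 1.0"},
    {"source": "a", "target": "c", "license": "CC BY 4.0"},
    {"source": "b", "target": "c", "license": "CC BY-NC 4.0"},
    {"source": "c", "target": "d", "license": None},  # all rights reserved
]

# Assumed allow-list; choosing it is a legal judgment, not a coding one.
OPEN_LICENSES = {"CC0 1.0", "CC BY 4.0"}


def openly_licensed(relationships, allowed=OPEN_LICENSES):
    """Keep only relationships whose license is in the allowed set."""
    return [r for r in relationships if r["license"] in allowed]


open_subset = openly_licensed(relationships)
```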

Legal barriers to data reuse

by default, scientific outputs subject to copyright

sometimes universities place additional legal barriers to reuse 

Recommendations:

  1. release data under an open license
  2. University researchers: commit to open in your resource sharing plan