Moore Lab at Dartmouth

August 2011

Jesse's Tavern

Hanover, NH

Licensing for the biodata scientist. Presentation to the Moore Lab.

Greene Lab

I'm a data scientist


Copyright

  • Protects original works of authorship
  • The university may be the owner
  • Lasts for the life of the author plus at least 70 years
  • Applies to manuscripts, (some) data, and code
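As a rough illustration of the "life plus 70 years" rule, the sketch below computes when a work by a single author would leave copyright. It assumes a 70-year term that runs through the end of the calendar year, as in the US and EU. The function name `public_domain_date` is hypothetical, and actual terms depend on jurisdiction, authorship, and publication history.

```python
from datetime import date

# Assumes a simple life-plus-70-years term for a work by one individual
# author. Real terms vary by jurisdiction and type of work.
TERM_YEARS = 70


def public_domain_date(author_death_year: int, term_years: int = TERM_YEARS) -> date:
    """Return the first day on which the work is no longer under copyright.

    Terms run through the end of the calendar year in which they expire,
    so protection lapses on January 1 of the following year.
    """
    last_protected_year = author_death_year + term_years
    return date(last_protected_year + 1, 1, 1)


if __name__ == "__main__":
    # An author who died in 1950: protected through 2020, free from 2021.
    print(public_domain_date(1950))  # 2021-01-01
```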

Open Access & the licensing of manuscripts

Does copyright support continued availability and distribution?

From "How Copyright Keeps Works Disappeared."

Disappearing decades: Amazon titles by decade

  • In the US, works published before 1923 are in the public domain

Open access citation advantage

McKiernan et al. (2016) eLife

What if your work is submitted to a subscription journal?

  • Signing either of these irreversibly forfeits your intellectual property, against the public interest:
    • Copyright Transfer Agreement
    • License to Publish
  • Workarounds:
    • Preempt by publishing a preprint under an open license
    • Attempt to modify the agreement using the following techniques.

SPARC Addendum

Modify the copyright transfer agreement


Publishing Delays





March 26, 2015: my paper on Heterogeneous Network Edge Prediction is accepted to PLOS Computational Biology.





⌛⌛⌛⌛⌛⌛⌛             ⌛⌛⌛⌛

68 days

Time from submission to acceptance for 3,330,333 articles since 1965
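The metric in this figure is simply the number of days between two dates in each article's history. A minimal sketch, assuming hypothetical `(submitted, accepted)` date pairs rather than the real data behind the figure; `review_delays` is an illustrative helper name:

```python
from datetime import date
from statistics import median

# Hypothetical (submission, acceptance) date pairs for illustration only.
# A real analysis would pull these from article history metadata.
history = [
    (date(2014, 1, 6), date(2014, 3, 20)),
    (date(2014, 2, 3), date(2014, 4, 7)),
    (date(2014, 5, 12), date(2014, 9, 30)),
]


def review_delays(pairs):
    """Return the number of days from submission to acceptance for each pair."""
    return [(accepted - submitted).days for submitted, accepted in pairs]


delays = review_delays(history)
print(delays)
print(median(delays))
```

Using the median rather than the mean keeps a few very slow articles from dominating the summary.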

The history of publishing delays

Source: ASAPbio & PrePubMed

Biology preprints per month

bioRxiv licenses by subject area

bioRxiv licenses over time

Creative Commons Licenses

Text & Data Mining

  • The total amount of literature is growing exponentially
  • Increasingly, machines are reading the literature
  • Machines are in most cases restricted to the open access subset of the literature.
  • Machines lead to inlinks and citations
  • A hetnet of biology designed for drug repurposing
  • ~50 thousand nodes of 11 types (labels)
  • ~2.25 million relationships of 24 types
  • Integrates 29 public resources, encompassing knowledge from millions of studies
  • Use at
  • Predicted probability of treatment for ~200,000 compound-disease pairs

Hetionet v1.0

Visualizing Hetionet v1.0

Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. … 

I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.

One network to rule them all

We have completed an initial version of our network. …

Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.

Discussion DOIs: bfmk, bfmm, bfmn, bfmp
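The timestamp works because a SHA-256 digest acts as a fingerprint of the exact file: anyone holding graph.json.gz can recompute the digest and compare it with the value recorded in the blockchain. The sketch below covers only the hashing step; embedding the digest in a Bitcoin transaction is not shown, and `sha256_checksum` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_checksum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Stream the file so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # "graph.json.gz" matches the filename on the slide; adjust as needed.
    path = Path("graph.json.gz")
    if path.exists():
        print(sha256_checksum(path))
```

Any change to the file, even a single byte, yields a different digest, so a matching digest shows the file existed no later than the block that recorded it.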

  • Hetionet integrates data from 29 resources
  • 12 had an open license
  • 9 had no license
  • Incompatibilities: Share Alike vs. Non-Commercial
  • Requested permission for 11 resources
  • Median time to first response was 16 days
  • 2 affirmative responses
  • Removed MSigDB
  • "LICENSEE agrees not to put … the DATABASE on a … server … that may be accessed by any individual other than the LICENSEE."
  • "LICENSEE agrees to provide … a written evaluation of the PROGRAM and the DATABASE, including a description of its functionality or problems and areas for further improvement"
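The Share Alike vs Non-Commercial incompatibility can be made concrete with a toy rule set. The sketch below assumes a handful of Creative Commons 4.0 licenses, identified by their SPDX IDs, and treats any unlisted input as unlicensed. `combined_license` is a hypothetical helper, and real compatibility questions involve details (license versions, database rights, jurisdiction) that this model ignores.

```python
# A deliberately simplified model of combining licensed inputs into one
# derived dataset. It ignores license versions, database rights, fair use,
# and jurisdiction, and treats any input not listed here as unlicensed.
LICENSES = {
    "CC0-1.0": {"share_alike": False, "non_commercial": False},
    "CC-BY-4.0": {"share_alike": False, "non_commercial": False},
    "CC-BY-SA-4.0": {"share_alike": True, "non_commercial": False},
    "CC-BY-NC-4.0": {"share_alike": False, "non_commercial": True},
    "CC-BY-NC-SA-4.0": {"share_alike": True, "non_commercial": True},
}


def combined_license(inputs):
    """Return a license the combined work could carry, or None if none exists.

    Toy rules:
    - Unlisted (unlicensed) inputs cannot be redistributed without permission.
    - A Share Alike input forces the output to carry that same license, so two
      different Share Alike licenses cannot be combined.
    - A Share Alike license without NC cannot absorb a Non-Commercial input,
      because NC adds a restriction that Share Alike does not allow.
    - Otherwise the output is NC if any input is NC, and CC0 only if every
      input is CC0.
    """
    if any(name not in LICENSES for name in inputs):
        return None
    share_alike = {name for name in inputs if LICENSES[name]["share_alike"]}
    non_commercial = any(LICENSES[name]["non_commercial"] for name in inputs)
    if len(share_alike) > 1:
        return None
    if share_alike:
        (required,) = share_alike
        if non_commercial and not LICENSES[required]["non_commercial"]:
            return None
        return required
    if all(name == "CC0-1.0" for name in inputs):
        return "CC0-1.0"
    return "CC-BY-NC-4.0" if non_commercial else "CC-BY-4.0"


print(combined_license(["CC-BY-4.0", "CC-BY-SA-4.0"]))  # CC-BY-SA-4.0
print(combined_license(["CC-BY-SA-4.0", "CC-BY-NC-4.0"]))  # None
```

Under these rules, a single unlicensed or conflicting input is enough to block redistribution of the whole integrated resource.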

Legal barriers to data reuse


release data under an open license

Open Source
the licensing of software

What happens if your software doesn't have a license?

When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can use, copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you.


  • permissive license
  • first commit
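Because an unlicensed repository defaults to exclusive copyright, it is worth confirming that a license file exists from the first commit. A minimal sketch, assuming the license lives in a top-level file with a conventional name such as `LICENSE` or `COPYING`; `find_license_files` and the name list are hypothetical, not part of any tool:

```python
from pathlib import Path

# Conventional license filenames (lowercased). This list is an assumption,
# not an exhaustive or authoritative set.
LICENSE_NAMES = {
    "license", "license.md", "license.txt",
    "copying", "copying.md", "copying.txt",
}


def find_license_files(repo_root):
    """Return the names of top-level files that look like license files."""
    root = Path(repo_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.lower() in LICENSE_NAMES
    )


if __name__ == "__main__":
    found = find_license_files(".")
    if found:
        print("License file(s):", ", ".join(found))
    else:
        print("No license file: the code defaults to all rights reserved.")
```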

Resource Sharing Plan

All outputs from this project — including code, data, figures, documentation, and manuscripts — will be made publicly available under an open license within two years of the end of the award. Code will be released under a BSD 3-Clause License, a permissive open source software license. Data will be released under the Creative Commons Public Domain Dedication (CC0, version 1.0 or later). Figures, documentation, and writing will be released under a Creative Commons Attribution License (version 4.0 or later).

In addition to the aforementioned licensing for project outputs, creators of specific project content may release any such content as CC0, at their individual discretion. The principal investigator of this project may release any project content as CC0, at his or her individual discretion.

In instances where upstream inputs are used that restrict the licensing of project outputs beyond the aforementioned guidelines, the most permissive licensing option possible will be applied. However, no inputs will be incorporated that prevent original software from being released under an Open Source Initiative approved license or prevent original non-code content from being released under an Open Definition conformant license.

Source code will be made available on a publicly accessible version control system, such as GitHub. Prior to submission of project manuscripts to a journal, all related outputs will be made publicly available under the aforementioned licensing guidelines and deposited to persistent archives. Currently, the group uses Zenodo for code repositories, figshare for datasets, and bioRxiv for preprints; however, the group may transition to alternatives if other options become more suitable during the course of the grant.
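The plan assigns a default license by type of output, which is also one way to handle a repository that mixes code, data, and content. A minimal sketch, assuming SPDX identifiers for the plan's licenses and a hypothetical extension-to-category table (the plan itself does not specify file extensions); `license_for` is an illustrative name:

```python
from pathlib import PurePosixPath

# Default licenses from the resource sharing plan, as SPDX identifiers.
PLAN_LICENSES = {
    "code": "BSD-3-Clause",
    "data": "CC0-1.0",
    "content": "CC-BY-4.0",  # figures, documentation, and writing
}

# Hypothetical rules mapping file extensions to output categories.
EXTENSION_CATEGORIES = {
    ".py": "code", ".r": "code", ".sh": "code",
    ".tsv": "data", ".csv": "data", ".json": "data", ".gz": "data",
    ".md": "content", ".png": "content", ".svg": "content", ".pdf": "content",
}


def license_for(path):
    """Return the plan's default license for a file, or None if unclassified."""
    suffix = PurePosixPath(path).suffix.lower()
    category = EXTENSION_CATEGORIES.get(suffix)
    return PLAN_LICENSES[category] if category else None


for name in ["scripts/build.py", "data/graph.json.gz", "figures/fig1.png"]:
    print(name, license_for(name))
```

Files that return None would need a human decision, which mirrors the plan's allowance for content creators to choose CC0 at their discretion.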


What about repositories that mix content, code, & data?

See also

Dual license



Licensing for the biodata scientist: Presentation to the Moore Lab

By Daniel Himmelstein


Presentation to the Moore Lab at Penn on April 10, 2017, at 12:00 pm in Richards room 309. This presentation is released under a CC BY 4.0 License.
