Open sourceror. Digital craftsman of the biodata revolution.
Moore Lab at Dartmouth
Licensing for the biodata scientist. Presentation to the Moore Lab.
- Original works of authorship
- University may be the owner
- Life of author plus at least 70 years
- applies to manuscripts, (some) data, and code
Open Access & the licensing of manuscripts
Does copyright support continued availability and distribution?
From How Copyright Keeps Works Disappeared. https://doi.org/b5gf
Disappearing decades: Amazon titles by decade
- Works published in the United States prior to 1923 are in the public domain
Open access citation advantage
McKiernan et al. (2016) eLife
What if your work is submitted to a subscription journal?
- Irreversible forfeiture of your intellectual property in dereliction of the public interest occurs when you sign the:
- Copyright Transfer Agreement
- License to Publish
- Preempt by publishing a preprint under an open license
- Attempt to modify the agreement using the following techniques.
Modify the copyright transfer agreement
March 26, 2015: my paper on Heterogeneous Network Edge Prediction is accepted to PLOS Computational Biology.
Time from submission to acceptance for 3,330,333 articles since 1965
The history of publishing delays
Source: ASAPbio & PrePubMed
Biology preprints per month
bioRxiv licenses by subject area
bioRxiv licenses over time
Creative Commons Licenses
Also see opendefinition.org
Text & Data Mining
- The total amount of literature is growing exponentially
- Increasingly machines are reading the literature
- Machines are in most cases restricted to the open access subset of the literature.
- Machines lead to inlinks and citations
Visualizing Hetionet v1.0
Nice of you to share this big network with everyone; however, I think you need to take care not to get yourself into legal trouble here. …
I am not trying to cause trouble here — just the contrary. When making a meta-resource, licenses and copyright law are not something you can afford to ignore. I regularly leave out certain data sources from my resources for legal reasons.
One network to rule them all
We have completed an initial version of our network. …
Network existence (SHA256 checksum for graph.json.gz) is proven in Bitcoin block 369,898.
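A reader who downloads graph.json.gz can recompute its SHA256 checksum and compare it with the published value. The following is a minimal sketch, assuming only the Python standard library; the function name `sha256_of_file` is hypothetical, and the sketch covers the checksum step only, not how the digest was recorded in the Bitcoin block.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that large archives do not need
    to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("graph.json.gz")` on a local copy yields a 64-character hex string; if it matches the timestamped checksum, the file is byte-for-byte identical to the one that existed when the block was mined.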
- Hetionet integrates data from 29 resources
- 12 had an open license
- 9 had no license
Incompatibilities - Share Alike vs Non-Commercial
- Requested permission for 11 resources
- Median time to first response was 16 days
- 2 affirmative responses
- Removed MSigDB
- "LICENSEE agrees not to put … the DATABASE on a … server … that may be accessed by any individual other than the LICENSEE."
- LICENSEE agrees to provide … a written evaluation of the PROGRAM and the DATABASE, including a description of its functionality or problems and areas for further improvement
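The Share Alike versus Non-Commercial incompatibility noted above can be checked mechanically when assembling a meta-resource. The sketch below encodes a simplified subset of the Creative Commons remix compatibility chart (No Derivatives licenses are left out, since they do not permit adaptations). The table `COMPATIBLE_WITH` and the function `incompatible_pairs` are hypothetical names, and the example license list is illustrative, not the actual set of Hetionet sources.

```python
from itertools import combinations

# Which licenses can be combined in a single adaptation.
# Simplified from the Creative Commons compatibility chart; an assumption
# of this sketch is that license versions and ports can be ignored.
COMPATIBLE_WITH = {
    "CC0": {"CC0", "CC BY", "CC BY-SA", "CC BY-NC", "CC BY-NC-SA"},
    "CC BY": {"CC0", "CC BY", "CC BY-SA", "CC BY-NC", "CC BY-NC-SA"},
    "CC BY-SA": {"CC0", "CC BY", "CC BY-SA"},
    "CC BY-NC": {"CC0", "CC BY", "CC BY-NC", "CC BY-NC-SA"},
    "CC BY-NC-SA": {"CC0", "CC BY", "CC BY-NC", "CC BY-NC-SA"},
}


def incompatible_pairs(licenses):
    """Return sorted pairs of licenses that cannot be remixed together."""
    unique = sorted(set(licenses))
    return [
        (first, second)
        for first, second in combinations(unique, 2)
        if second not in COMPATIBLE_WITH[first]
    ]


# Illustrative input: one Share Alike source and one Non-Commercial source
print(incompatible_pairs(["CC BY", "CC BY-SA", "CC BY-NC"]))
# → [('CC BY-NC', 'CC BY-SA')]
```

A lookup table is used instead of inferring compatibility from license clauses because the chart is small and explicit entries are easier to audit. Custom terms such as the MSigDB agreement fall outside any such table and still need to be read individually.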
Legal barriers to data reuse
release data under an open license
the licensing of software
What happens if your software doesn't have a license?
When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can use, copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you.
- permissive license
- first commit
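One way to act on the "first commit" advice is to check that a repository actually contains a license file before sharing it. This is a minimal sketch, assuming the common convention that the license lives in a top-level file named LICENSE, LICENCE, COPYING, or UNLICENSE (with or without an extension); the function `find_license_files` is a hypothetical helper, not part of any tool.

```python
from pathlib import Path

# Conventional base names for license files (an assumption of this sketch)
LICENSE_STEMS = {"license", "licence", "copying", "unlicense"}


def find_license_files(repo_dir):
    """Return the names of top-level files in `repo_dir` that look like a license."""
    return sorted(
        path.name
        for path in Path(repo_dir).iterdir()
        if path.is_file() and path.stem.lower() in LICENSE_STEMS
    )
```

An empty list from `find_license_files(".")` suggests the repository is still under exclusive copyright by default, as described above.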
Resource Sharing Plan
All outputs from this project — including code, data, figures, documentation, and manuscripts — will be made publicly available under an open license within two years of the end of the award. Code will be released under a BSD 3-Clause License, a permissive open source software license. Data will be released under the Creative Commons Public Domain Dedication (CC0, version 1.0 or later). Figures, documentation, and writing will be released under a Creative Commons Attribution License (version 4.0 or later).
In addition to the aforementioned licensing for project outputs, creators of specific project content may release any such content as CC0, at their individual discretion. The principal investigator of this project may release any project content as CC0, at his or her individual discretion.
In instances where upstream inputs are used that restrict the licensing of project outputs beyond the aforementioned guidelines, the most permissive licensing option possible will be applied. However, no inputs will be incorporated that prevent original software from being released under an Open Source Initiative (opensource.org) approved license or prevent original non-code content from being released under an Open Definition (opendefinition.org) conformant license.
Source code will be made available on a publicly accessible version control system, such as GitHub. Prior to submission of project manuscripts to a journal, all related outputs will be made publicly available under the aforementioned licensing guidelines and deposited to persistent archives. Currently, the group uses Zenodo for code repositories, figshare for datasets, and bioRxiv for preprints; however, the group may transition to alternatives if other options become more suitable during the course of the grant.
What about repositories that mix content, code, & data?
See also git.io/vS6os
By Daniel Himmelstein