Copyright versus open science: a story of data integration

November 15, 2015

Brussels, Belgium

—Daniel Himmelstein

restrictions on data

1. copyright

automatically granted to "original works of authorship" giving the exclusive right to:

  • copy

  • distribute

  • create derivatives

limited by:

  • fair use

  • originality (excludes facts)

2. contract

agreement entered into to receive access to a resource

  • can impose restrictions beyond copyright

Copyright prevents reproducible science

Chia-Jung Tsay

Samuel Mehr

Sources: WaPo & Guardian

2013: Finds people use sight over sound for scoring music competitions

Study based on 6-second clips from 10 YouTube videos

Interested researchers cannot replicate her findings using different clips

3 of the original clips are no longer online

Time: 18 months

Result: Tsay claims she cannot provide the 3 removed videos due to copyright law 

Network for drug repurposing

  • 50k nodes
    10 types
  • 3M edges
    27 types
  • 28 public resources
    DOI: 10.15363/thinklab.4
  • open for reuse & reproducibility
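
A network with typed nodes and typed edges can be sketched in a few lines. The following is a minimal illustration only, not the project's actual code: the class name `HetNet`, its methods, and the node, edge, and resource names are all hypothetical. Each edge records the resource it came from, so edges can later be counted or filtered by source when a source's terms restrict reuse.

```python
from dataclasses import dataclass, field


@dataclass
class HetNet:
    """Toy heterogeneous network: every node and every edge carries a type."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source, target, edge type, resource)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, target, edge_type, resource):
        # Refuse edges whose endpoints were never declared as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, target, edge_type, resource))

    def without_resource(self, resource):
        """Return a copy that omits every edge contributed by one resource."""
        kept = [edge for edge in self.edges if edge[3] != resource]
        return HetNet(nodes=dict(self.nodes), edges=kept)

    def summary(self):
        return {
            "nodes": len(self.nodes),
            "node_types": len(set(self.nodes.values())),
            "edges": len(self.edges),
            "edge_types": len({edge[2] for edge in self.edges}),
            "resources": len({edge[3] for edge in self.edges}),
        }


# Hypothetical toy data, far smaller than the real network.
net = HetNet()
net.add_node("compound:A", "compound")
net.add_node("disease:X", "disease")
net.add_node("gene:G1", "gene")
net.add_edge("compound:A", "disease:X", "treats", "resource-1")
net.add_edge("compound:A", "gene:G1", "binds", "resource-2")
net.add_edge("disease:X", "gene:G1", "associates", "resource-2")
print(net.summary())
```

Keeping the source on every edge is a design choice made for this sketch: it allows a redistributable subset to be built by dropping one resource at a time with `without_resource`.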

1. ∅ license

  • 9 resources
  • all rights reserved
  • upon contact:
    • 1 permission
    • 0 licenses added

2. unclear

  • 4 resources
  • clarification after laborious and slow permission requests

3. ∅ distribute

  • MSigDB: publicly funded project from the Broad
  • publication data supplements

4. standard

  • 11 resources
  • incompatibilities

5. government

  • 4 resources
  • public domain

Resolution after months & 5000+ word discussion: mixed approach


release data as CC0 (public domain)


Watch online at OpenCon. ABSTRACT: We recently created a network for drug repurposing with 3 million edges. Creating the network brought together 27 collaborators who communicated via 266 CC-BY posts. The network integrates data from 28 public resources. However, each source imposes its own (often incompatible) restrictions, implicitly by copyright or explicitly by licensing. We've contacted 10 resources, with only a single affirmative response. We're currently exposing the harms of transferring data copyright to publishers, as well as of universities that seek to profit from publicly funded databases while preventing reuse.
