Data Management as Data Publication

Data After Results Publication...and Before

  • "Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants."
    —NSF Proposal & Award Policies & Procedures Guide, C. VI D 4

  • "The final performance report must discuss the execution and any updating of the original data management plan. This discussion should describe data produced during the grant period  . . . verification that data will be available for sharing, discussion of community standards for data format, [and] the plan to disseminate the data."
    —NEH Data Management Plans for Digital Humanities Proposals and Awards, 2014

  • "Furthermore, data sharing prior to the publication of major results is encouraged in many instances, for example, when data are collected to provide a resource for the scientific community (as in the case of many large surveys)."
    —NIH Data Sharing Policy, rev. February 16, 2004

Data (Not Just Results) Sharing

  • "In the future, NSF will explore whether all data underlying published findings can be made available at the time of publication . . . All data resulting from the research funded by the award, whether or not the data support a publication, should be deposited at the appropriate repository as explained in the data management plan."
    —NSF Public Access Plan, Today's Data, Tomorrow's Discoveries, March 18, 2015

  • "Data-sharing plans should encompass all data from funded research that can be shared without compromising individual subjects' rights and privacy, regardless of whether the data have been used in a publication."
    —NIH Data Sharing Policy, rev. February 16, 2004

Beyond Results Replication: Advantages to Sharing

  • Meets the general public's surging interest in data (e.g. NYT's Upshot, Vox, FiveThirtyEight) and helps build NSF's "STEM-literate society"

  • Helps push decoupling of data and front-end visualization (often via APIs), enabling a single dataset to be deployed (that is, cited) by multiple applications and by multiple partners (even within a single lab)

  • Better datasets via versioning, sharing, reworkings...and therefore better public knowledge
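The decoupling point above can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset, its field names, and the three functions are hypothetical, and a real deployment would serve the payload over HTTP rather than pass a string around.

```python
import json

# Hypothetical dataset; the id, fields, and values are illustrative only.
DATASET = {
    "id": "example-survey-v1",
    "records": [
        {"region": "north", "respondents": 120},
        {"region": "south", "respondents": 80},
    ],
}


def api_response(dataset):
    """Serialize the dataset as JSON, as a minimal data API might."""
    return json.dumps(dataset, sort_keys=True)


def text_bar_chart(payload):
    """One front end: render a text bar chart (one '#' per 20 respondents)."""
    data = json.loads(payload)
    return [
        f"{row['region']}: {'#' * (row['respondents'] // 20)}"
        for row in data["records"]
    ]


def summary_total(payload):
    """A second front end: compute a total from the same payload."""
    data = json.loads(payload)
    return sum(row["respondents"] for row in data["records"])


# Both consumers read the same published payload; neither owns the data.
payload = api_response(DATASET)
chart = text_bar_chart(payload)
total = summary_total(payload)
```

Because both consumers depend only on the serialized payload, either can be replaced, or a third added by another partner, without touching the dataset itself.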

Publishing Data Also Involves Data Issues

Published data needs:

  • Metadata for findability, web cataloguing standards, and eventually integration with the semantic web

  • A data archive with a structure suited to the size and nature of the research data (e.g. a relational database or a distributed database)

  • The advantages offered by current forking, versioning, linking, and API technology for social data


These needs usually call for very different data decisions than those that arise during the original research.
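As a rough illustration of the first need, findability metadata can be kept as a small machine-readable record alongside the data. The sketch below assumes the schema.org Dataset vocabulary serialized as JSON-LD, which is one common choice for web cataloguing and not one the slides prescribe. The helper name make_dataset_metadata and all example values are hypothetical.

```python
import json


def make_dataset_metadata(name, description, license_url, creator, keywords=()):
    """Build a schema.org Dataset record, rejecting empty core fields."""
    fields = {
        "name": name,
        "description": description,
        "license": license_url,
        "creator": creator,
    }
    missing = [key for key, value in fields.items() if not value]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "creator": {"@type": "Person", "name": creator},
        "keywords": list(keywords),
    }


# Hypothetical example values, for illustration only.
record = make_dataset_metadata(
    name="Example Regional Survey",
    description="Respondent counts by region from a hypothetical survey.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator="A. Researcher",
    keywords=["survey", "regions"],
)
json_ld = json.dumps(record, indent=2)
```

Refusing to build a record with empty core fields is a design choice in this sketch: it forces the description and license decisions to be made at publication time rather than left for later.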

Data Management Is Publishing

Authors take control of their work:

  • The default is not accessibility and sharing. It must be positively asserted by the data creator to ensure data becomes part of common knowledge

  • Even a decision to make a dataset public raises further questions. Can derivatives be made, or only copies? Are commercial uses allowed, or only non-commercial ones? Must users also comply with your original licensing terms?

  • Findability requires strategic planning at a granular level (e.g. metadata, data documentation)
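The three licensing questions above correspond closely to the elements of the Creative Commons license family, so the choice can be written down as a small decision function. Mapping the questions onto Creative Commons is an assumption of this sketch, not something the slides specify, and the function name cc_license is hypothetical.

```python
def cc_license(allow_derivatives, allow_commercial, share_alike):
    """Map three yes/no answers to a Creative Commons license name.

    Assumption: attribution (BY) is always required. ShareAlike (SA)
    only applies when derivatives are allowed, so it is ignored when
    NoDerivatives (ND) is chosen.
    """
    parts = ["BY"]
    if not allow_commercial:
        parts.append("NC")  # NonCommercial
    if not allow_derivatives:
        parts.append("ND")  # NoDerivatives
    elif share_alike:
        parts.append("SA")  # ShareAlike
    return "CC " + "-".join(parts)


# Derivatives allowed, non-commercial only, same terms for reuse.
choice = cc_license(allow_derivatives=True, allow_commercial=False, share_alike=True)
```

Writing the decision out this way makes the point of the slide concrete: each answer has to be asserted explicitly by the data creator before a license can be named.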