THE DATAVERSE PROJECT
Mercè Crosas, Institute for Quantitative Social Science, Harvard University
@mercecrosas
RDA 10th Plenary, Montreal, September 21, 2017
Our Institute Provides a Technology Solution to Data Sharing
Institute for Quantitative Social Science, Harvard University
@IQSS
Open-source software to share, cite, and find data.
Developed at Harvard's Institute for Quantitative Social Science
with contributions from an active and growing community.
2006 (we started)
2017
dataverse.org
26 Dataverse installations serving hundreds of institutions
How Researchers Share & Use Data with Dataverse
Harvard Dataverse Repository
A public repository for research data
> 70,000 datasets total
> 49,000 datasets uploaded to Harvard Dataverse repository
200 datasets/month
> 340,000 files
4,000 files/month
> 2.5 M downloads
60,000 downloads/month
Datasets Added
Downloads
dataverse.harvard.edu
King, 1995, Replication, Replication
Altman and King, 2007, A Proposed Standard for the Scholarly Citation of Quantitative Data
Altman et al, 2001, A Digital Library for the Dissemination and Replication of Quantitative Social Science
King, 2007, An Introduction to the Dataverse Network as an Infrastructure for Data Sharing
Crosas, Honaker, King, Sweeney, 2015, Automating Open Science for Big Data
Crosas, 2012, The Dataverse Network: an open source application for sharing, discovering, and preserving research data
Altman and Crosas, 2013, The Evolution to Data Citation: from principles to implementation
Crosas, 2013, A Data Sharing Story
2014, Joint Declaration of Data Citation Principles
Pepe et al, 2014, How Do Astronomers Share Data?
Goodman et al, 2014, Ten Simple Rules for the Care and Feeding of Scientific Data
Castro et al, 2015, Achieving Human and Machine Accessibility of Cited Data
Sweeney, Crosas, Bar-Sinai, 2015, Sharing Sensitive Data with Confidence: The DataTags System
Meyer et al, 2016, Data Publication with the Structural Biology Data Grid Supports Live Analysis
Wilkinson et al, 2016, The FAIR Guiding Principles for Scientific Data Management and Stewardship
Bierer, Crosas, Pierce, 2017, Data Authorship as an Incentive to Data Sharing
Our Contributions to Enhance Data Sharing
2017
Findable
Accessible
Interoperable
Reusable
Data should be ...
Wilkinson et al, 2016, "The FAIR Guiding Principles for Scientific Data Management and Stewardship", Nature Scientific Data
FAIR Data in Dataverse
Data Files
Metadata
Data Licenses, User Agreements, Restrictions
Data Citation with Persistent Identifier
Versions
APIs
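The persistent identifier and API items above can be illustrated together. What follows is a minimal sketch, assuming the Dataverse Native API route `/api/datasets/:persistentId/` and the Search API route `/api/search`. The DOI is a made-up placeholder and the helper names `dataset_url` and `search_url` are hypothetical. The code only builds request URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Base URL of an installation; dataverse.harvard.edu is the one named in the slides.
BASE_URL = "https://dataverse.harvard.edu"


def dataset_url(persistent_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL that looks up a dataset by its persistent identifier.

    Assumes the Native API route /api/datasets/:persistentId/.
    """
    # Keep ':' and '/' readable so the DOI stays recognizable in the URL.
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url}/api/datasets/:persistentId/?{query}"


def search_url(text: str, base_url: str = BASE_URL) -> str:
    """Build a Search API URL restricted to datasets (assumed /api/search route)."""
    query = urlencode({"q": text, "type": "dataset"})
    return f"{base_url}/api/search?{query}"


if __name__ == "__main__":
    # Placeholder DOI for illustration only; it does not point to a real dataset.
    print(dataset_url("doi:10.5072/FK2/EXAMPLE"))
    print(search_url("data citation"))
```

Because the dataset is addressed by its persistent identifier rather than an internal database ID, the same lookup pattern should carry over to any installation by changing `base_url`.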
A Dataverse is a container of Datasets and a Dataset is a container of data files, documentation, and code
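The containment relationship described above can be sketched as a small data model. This is an illustration only, not the Dataverse software's actual schema: the class names, the `kind` field, and the example file names are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    kind: str  # one of "data", "documentation", "code" (labels assumed for this sketch)


@dataclass
class Dataset:
    title: str
    files: List[DataFile] = field(default_factory=list)

    def files_of_kind(self, kind: str) -> List[DataFile]:
        """Return the files in this dataset that carry the given kind label."""
        return [f for f in self.files if f.kind == kind]


@dataclass
class Dataverse:
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def file_count(self) -> int:
        """Count files across every dataset held by this Dataverse."""
        return sum(len(d.files) for d in self.datasets)


def build_example() -> Dataverse:
    """Assemble a hypothetical Dataverse holding one dataset with three files."""
    dataset = Dataset(
        title="Example Replication Data",
        files=[
            DataFile("survey.tab", "data"),
            DataFile("codebook.pdf", "documentation"),
            DataFile("analysis.R", "code"),
        ],
    )
    return Dataverse(name="Example Lab Dataverse", datasets=[dataset])


if __name__ == "__main__":
    dv = build_example()
    print(dv.name, "holds", dv.file_count(), "files")
```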
Dataverse Rich Support for Data
- Extract variable metadata from tabular data files
- Visualize geospatial files in a map
- Extract header metadata from FITS files
- Reformat
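To make the first item in the list above concrete, here is a rough sketch of what extracting variable metadata from a tabular file can look like. It is not Dataverse's ingest code: it uses only the Python standard library on a small CSV string, and the function names, the three type labels, and the sample columns are assumptions chosen for illustration.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Label a column 'integer', 'float', or 'string' from its raw text values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "string"


def variable_metadata(csv_text: str) -> List[Dict[str, object]]:
    """Return one metadata record (name, type, missing count) per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    names = reader.fieldnames or []
    records = []
    for name in names:
        # Short rows yield None for absent cells; treat those as empty strings.
        column = [(row[name] or "") for row in rows]
        records.append({
            "name": name,
            "type": infer_type(column),
            "missing": sum(1 for v in column if v == ""),
        })
    return records


if __name__ == "__main__":
    sample = "age,income,region\n34,51000.5,north\n29,,south\n41,62000,north\n"
    for record in variable_metadata(sample):
        print(record)
```

Summaries of this kind are what let a repository describe and search the variables inside a file, not just the file as a whole.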
Dataverse Customization and Branding
Dataverse Integration with Journals
Data citation from article to data, review workflow, replication code
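A data citation that an article can point to might be assembled as in the sketch below. The field order (authors, year, title, resolvable identifier, publisher, version) is an assumption loosely modeled on common data citation practice, not a statement of the exact format Dataverse generates. The function names and the sample values, including the DOI, are hypothetical.

```python
from typing import Sequence


def doi_to_url(persistent_id: str) -> str:
    """Turn an identifier of the form 'doi:10.xxxx/yyy' into a resolvable URL."""
    prefix = "doi:"
    if not persistent_id.startswith(prefix):
        raise ValueError("expected an identifier of the form 'doi:...'")
    return "https://doi.org/" + persistent_id[len(prefix):]


def format_citation(
    authors: Sequence[str],
    year: int,
    title: str,
    persistent_id: str,
    publisher: str,
    version: int,
) -> str:
    """Join the citation elements in the assumed order, separated by commas."""
    author_text = "; ".join(authors)
    return (
        f'{author_text}, {year}, "{title}", '
        f"{doi_to_url(persistent_id)}, {publisher}, V{version}"
    )


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    print(format_citation(
        authors=["Doe, Jane"],
        year=2017,
        title="Example Survey Data",
        persistent_id="doi:10.5072/FK2/EXAMPLE",
        publisher="Example Dataverse",
        version=1,
    ))
```

Including the version number alongside the persistent identifier is what lets a reader of the article retrieve the same state of the data the authors used.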
What Are We Working on Now?
Data Provenance
Track the original source of a dataset
Pasquier, Lau, Trisovic, Boose, Couturier, Crosas, Ellison, Gibson, Jones, Seltzer, 2017, If These Data Could Talk, Nature Scientific Data (Data Provenance examples from CERN and Harvard Forest)
Cloud Dataverse
Combine data repositories with Cloud computing
Structural Biology Data Bank
Data Privacy
Classify and handle datasets based on their privacy level
Harvard Data Privacy Tools Project: privacytools.seas.harvard.edu
DataTags Project: datatags.org
Integration with Tools
Dataverse as part of the data lifecycle
Dataverse Community
380 members in our community group
https://groups.google.com/forum/#!members/dataverse-community
Biweekly Community Calls
235 attendees
26 organizations/universities
11 countries
49 software contributors
Annual Community Meeting
Next: June 13, 14, 15, 2018
Thanks
@mercecrosas
scholar.harvard.edu/mercecrosas
dataverse.org
RDA 10th - Repositories Session
By Mercè Crosas