THE DATAVERSE PROJECT

Mercè Crosas, Institute for Quantitative Social Science, Harvard University
@mercecrosas

RDA 10th Plenary, Montreal, September 21, 2017

Our Institute Provides a Technology Solution to Data Sharing

Institute for Quantitative Social Science, Harvard University

@IQSS

Open-source software to share, cite, and find data.

Developed at Harvard's Institute for Quantitative Social Science

with the contribution of an active and growing community.

2006 (we started)

2017

dataverse.org

26 Dataverse installations serving hundreds of institutions

How Researchers Share & Use Data with Dataverse

Harvard Dataverse Repository

A public repository for research data

 

> 70,000 datasets total
> 49,000 datasets uploaded to Harvard Dataverse repository

200 datasets/month

 

> 340,000 files

4,000 files/month

 

> 2.5 M downloads

60,000 downloads/month

Datasets Added

Downloads

dataverse.harvard.edu

King, 1995, Replication, Replication

Altman and King, 2007, A Proposed Standard for the Scholarly Citation of Quantitative Data

Altman et al, 2001, A Digital Library for the Dissemination and Replication of Quantitative Social Science

King, 2007, An Introduction to the Dataverse Network as an Infrastructure for Data Sharing

Crosas, Honaker, King, Sweeney, 2015, Automating Open Science for Big Data

Crosas, 2012, The Dataverse Network: an open source application for sharing, discovering, and preserving research data

Altman and Crosas, 2013, The Evolution to Data Citation: from principles to implementation

Crosas, 2013, A Data Sharing Story

2014, Joint Declaration of Data Citation Principles

Pepe et al, 2014, How Do Astronomers Share Data?

Goodman et al, 2014, Ten Simple Rules for the Care and Feeding of Scientific Data

Castro et al, 2015, Achieving Human and Machine Accessibility of Cited Data

Sweeney, Crosas, Bar-Sinai, 2015, Sharing Sensitive Data with Confidence: The DataTags System

Meyer et al, 2016, Data Publication with the Structural Biology Data Grid Supports Live Analysis

Wilkinson et al, 2016, The FAIR Guiding Principles for Scientific Data Management and Stewardship

Bierer, Crosas, Pierce, 2017, Data Authorship as an Incentive to Data Sharing

Our Contributions to Enhance Data Sharing

2017

Findable
Accessible
Interoperable
Reusable

Data should be ...

Wilkinson et al., 2016, "The FAIR Guiding Principles for Scientific Data Management and Stewardship", Nature Scientific Data

FAIR Data in Dataverse

Data Files

Metadata

Data Licenses, User Agreements,

Restrictions

Data Citation with Persistent Identifier

Versions

APIs

A Dataverse is a container of Datasets and a Dataset is a container of data files, documentation, and code
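
The containment model described above can be sketched as a small data structure. This is a minimal illustration in Python, not the Dataverse software's actual schema: the class names, fields, and citation layout are assumptions chosen for readability, and the identifier in the usage lines is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    kind: str  # hypothetical label: "data", "documentation", or "code"


@dataclass
class Dataset:
    title: str
    authors: List[str]
    year: int
    persistent_id: str  # e.g. a DOI; the value used below is a placeholder
    version: str = "1.0"
    files: List[DataFile] = field(default_factory=list)

    def citation(self, repository: str) -> str:
        # Illustrative layout only: authors, year, title, identifier,
        # repository name, version.
        names = "; ".join(self.authors)
        return (f'{names}, {self.year}, "{self.title}", '
                f'{self.persistent_id}, {repository}, V{self.version}')


@dataclass
class Dataverse:
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def file_count(self) -> int:
        # Total number of files across all contained datasets.
        return sum(len(d.files) for d in self.datasets)


# Usage: one Dataverse holding one Dataset with data, documentation, and code.
dv = Dataverse(name="Example Dataverse")
ds = Dataset(title="Example Survey", authors=["Doe, Jane"], year=2017,
             persistent_id="doi:10.0000/EXAMPLE")
ds.files.extend([
    DataFile("survey.tab", "data"),
    DataFile("codebook.pdf", "documentation"),
    DataFile("analysis.R", "code"),
])
dv.datasets.append(ds)
print(ds.citation(dv.name))
```

Keeping the persistent identifier and version on the dataset, rather than on individual files, mirrors the idea that the dataset is the citable unit.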

Dataverse Rich Support for Data

  • Extract variable metadata from tabular data files
  • Visualize geospatial files in a map
  • Extract header metadata from FITS files
  • Reformat

Dataverse Customization and Branding

Dataverse Integration with Journals

Data citation from article to data, review workflow, replication code

What Are We Working on Now?

Data Provenance

Track the original source of a dataset

Pasquier, Lau, Trisovic, Boose, Couturier, Crosas, Ellison, Gibson, Jones, Seltzer, 2017, If These Data Could Talk, Nature Scientific Data (Data Provenance examples from CERN and Harvard Forest)

Cloud Dataverse

Combine data repositories with cloud computing

Structural Biology Data Bank

Data Privacy

Classify and handle datasets based on their privacy level

Harvard Data Privacy Tools Project: privacytools.seas.harvard.edu

DataTags Project: datatags.org
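
Classifying a dataset by privacy level and then choosing how to handle it can be sketched as a lookup from level to policy. The example below is a hypothetical illustration in Python: the three level names, the classification questions, and the handling rules are assumptions made for this sketch and are not the tag set or rules defined by the DataTags project.

```python
from dataclasses import dataclass
from enum import IntEnum


class PrivacyLevel(IntEnum):
    # Hypothetical levels for illustration; not the DataTags tag set.
    PUBLIC = 0
    RESTRICTED = 1
    HIGHLY_RESTRICTED = 2


@dataclass(frozen=True)
class HandlingPolicy:
    encrypt_at_rest: bool
    requires_approval: bool


# Assumed mapping from privacy level to handling requirements.
POLICIES = {
    PrivacyLevel.PUBLIC: HandlingPolicy(encrypt_at_rest=False, requires_approval=False),
    PrivacyLevel.RESTRICTED: HandlingPolicy(encrypt_at_rest=True, requires_approval=False),
    PrivacyLevel.HIGHLY_RESTRICTED: HandlingPolicy(encrypt_at_rest=True, requires_approval=True),
}


def classify(has_personal_data: bool, has_direct_identifiers: bool) -> PrivacyLevel:
    # Direct identifiers put the dataset in the strictest level;
    # other personal data in the middle level; everything else is public.
    if has_direct_identifiers:
        return PrivacyLevel.HIGHLY_RESTRICTED
    if has_personal_data:
        return PrivacyLevel.RESTRICTED
    return PrivacyLevel.PUBLIC


def policy_for(has_personal_data: bool, has_direct_identifiers: bool) -> HandlingPolicy:
    return POLICIES[classify(has_personal_data, has_direct_identifiers)]


# Usage: a dataset with personal data but no direct identifiers.
print(classify(True, False).name, policy_for(True, False))
```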

INTEGRATION WITH TOOLS

Dataverse as part of the data lifecycle

 

Dataverse Community

 

 

380 members in our community group

https://groups.google.com/forum/#!members/dataverse-community

Biweekly Community Calls

 

235 attendees
26 organizations/universities
11 countries

49 software contributors

Annual Community Meeting

Next: June 13, 14, 15, 2018

Thanks

@mercecrosas

scholar.harvard.edu/mercecrosas

dataverse.org

RDA 10th - Repositories Session

By Mercè Crosas
