From Planning to Sharing
Mercè Crosas, IQSS, Harvard University @mercecrosas
Every two years, the amount of new digitized data is equal to all of the data ever collected before. The world’s knowledge is at our fingertips, and data science allows us to effectively and efficiently make use of that knowledge. This is facilitating a societal shift as big as the Industrial Revolution.
Phil Bourne, Data Science Director, UVA, Former Director for Data Science, NIH; UVA Today, Q&A, August 21, 2017
Vice-Provost for Research
Data Governance
Authorship, data citation
Data Science
Security
Storage
Computation
Repositories
Sponsored Research
Data Agreements
Software
Tools
Privacy
Research data management concerns the organization of data, from its entry to the research cycle through the dissemination and archiving of valuable results. It aims to ensure reliable verification of results, and permits new and innovative research built on existing information.
Whyte, A., Tedds, J. (2011). ‘Making the Case for Research Data Management’. DCC Briefing Papers. Edinburgh: Digital Curation Centre
Support for the data lifecycle must accommodate differences across research domains, data types, and methodologies:
- Planning
- Data Collection, Acquisition
- Storage and Analysis
- Data Sharing and Archiving
https://dmptool.org/
First, rigorously collected, well-preserved data sets — including meaningful descriptors or metadata — will help the data owners to reach solid, meaningful results. Second, they will help future investigators to make sense of and reuse data, thereby enhancing utility and reproducibility. Preserving comprehensive data, ideally for many years, also reduces the risk of duplicating science done by others.
Data Use Agreements govern access to and treatment of data:
- provided by an outside organization to your organization for use in your organization’s research, or
- provided by your organization to an outside organization for use in its research.
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence: The DataTags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601
https://www.nature.com/articles/sdata201618
FAIR slides acknowledgement: Michel Dumontier
http://copdess.org
"Ensuring that Earth, space, and environmental science research outputs, including data, software, and samples or standard information about them, are open, FAIR, and curated in trusted domain repositories ..."
A Solution for Data Sharing and Archiving
Aligned with FAIR data principles
for finding, citing, and publishing data
for building your own data repository
which facilitates data access and data sharing around the world
35 Dataverse repository sites around the world
2006: Dataverse development starts at Harvard's Institute for Quantitative Social Science (IQSS)
2 Dataverse sites
2015: 14 Dataverse sites
2016: 18 Dataverse sites
2017: 23 Dataverse sites
2018: 35 Dataverse sites
First Annual Dataverse Community Meeting
4 developers
First release
74 contributors
30 releases
12,807 commits
In 2018, a new international consortium is formed to support and coordinate efforts across Dataverse Repositories.
http://dataversecommunity.global (coming soon!)
Download a data citation ready for use in a reference manager
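A citation that a reference manager can import could be sketched as a BibTeX record. This is a minimal illustration, not Dataverse's own exporter: the function name, the choice of fields, and the sample values (including the DOI) are all hypothetical.

```python
def dataset_bibtex(key, authors, year, title, publisher, doi, version):
    """Render a dataset citation as a BibTeX @misc entry (illustrative only)."""
    fields = [
        ("author", " and ".join(authors)),
        ("year", str(year)),
        ("title", title),
        ("publisher", publisher),
        ("doi", doi),
        ("version", version),
        # A DOI resolves through the doi.org proxy, which keeps the link persistent.
        ("url", "https://doi.org/" + doi),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


# Hypothetical dataset; none of these values refer to a real record.
entry = dataset_bibtex(
    key="doe2018example",
    authors=["Doe, Jane", "Roe, Richard"],
    year=2018,
    title="Example Survey Data",
    publisher="Example Dataverse",
    doi="10.5072/FK2/EXAMPLE",
    version="V1",
)
print(entry)
```

Including the version next to the persistent identifier lets a reader retrieve exactly the data that was cited.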
At multiple Levels:
With multiple Standards:
Download metadata in multiple formats
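The same metadata can also be requested programmatically. The sketch below only builds the request URL for the dataset export endpoint as I understand it from the Dataverse API guide; the server name and DOI are hypothetical, and the exporter names available on a given installation may differ from the ones assumed here.

```python
from urllib.parse import urlencode

# Assumed exporter names; check the API guide of the installation you use.
ASSUMED_EXPORTERS = ("ddi", "oai_dc", "dcterms", "schema.org", "dataverse_json")


def export_url(server, persistent_id, exporter):
    """Build the URL that requests one dataset's metadata in one format."""
    if exporter not in ASSUMED_EXPORTERS:
        raise ValueError(f"unknown exporter: {exporter}")
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/export?{query}"


# Hypothetical server and DOI, used only to show the shape of the request.
ddi_url = export_url("https://dataverse.example.edu/", "doi:10.5072/FK2/EXAMPLE", "ddi")
print(ddi_url)
```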
Schema.org metadata embedded in the Dataverse dataset landing page
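Landing-page metadata of this kind is usually embedded as JSON-LD so that search engines can index the dataset. A minimal sketch, assuming a small subset of schema.org Dataset properties; the helper name and all sample values are hypothetical, and the record Dataverse actually emits is richer than this.

```python
import json


def schema_org_jsonld(name, description, doi, creators, license_url):
    """Return an HTML script tag holding a schema.org Dataset record."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": "https://doi.org/" + doi,
        "identifier": "https://doi.org/" + doi,
        "name": name,
        "description": description,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "license": license_url,
    }
    return (
        '<script type="application/ld+json">'
        + json.dumps(record, indent=2)
        + "</script>"
    )


# Hypothetical dataset values for illustration only.
snippet = schema_org_jsonld(
    name="Example Survey Data",
    description="Illustrative record, not a real dataset.",
    doi="10.5072/FK2/EXAMPLE",
    creators=["Doe, Jane"],
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
print(snippet)
```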
Public
Restricted
Metadata (instrument information) is extracted automatically from the FITS file header upon data upload
Metadata from FITS Header
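To make the extraction step concrete: a FITS header is a sequence of 80-character ASCII cards, each with a keyword in the first 8 characters and, for value cards, "= " in the next two. The sketch below reads those cards with the standard library only. It is an illustration of the file format, not the extractor Dataverse uses; it keeps every value as a string and does not handle quotes escaped inside string values. The sample header and its telescope and instrument names are made up.

```python
CARD_LENGTH = 80  # every FITS header card is exactly 80 ASCII characters


def parse_fits_header(raw):
    """Return {keyword: value-as-string} for the value cards in a FITS header."""
    header = {}
    for start in range(0, len(raw), CARD_LENGTH):
        card = raw[start:start + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break  # END marks the last card of the header
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, and blank cards carry no value
        field = card[10:].lstrip()
        if field.startswith("'"):
            # String value: take the text up to the closing quote.
            value = field[1:].partition("'")[0].rstrip()
        else:
            # Numeric or logical value: drop the optional "/ comment" part.
            value = field.split("/", 1)[0].strip()
        header[keyword] = value
    return header


def make_card(text):
    """Pad one line of header text to a full 80-character card."""
    return text.ljust(CARD_LENGTH).encode("ascii")


# Hypothetical header; a real file also pads the header to 2880-byte blocks.
sample = b"".join(make_card(line) for line in [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16",
    "TELESCOP= 'EXAMPLE-SCOPE'      / hypothetical telescope name",
    "INSTRUME= 'EXAMPLE-CAM'",
    "COMMENT this card carries no value",
    "END",
])
instrument_metadata = parse_fits_header(sample)
print(instrument_metadata)
```

Keywords such as TELESCOP and INSTRUME are the kind of instrument information that can be indexed without any input from the depositor.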
http://guides.dataverse.org
Crosas, Gautier, Karcher, Kirilova, Otalora, Schwartz, 2018. Data Policies of highly-ranked social science journals
More than 50% of the top 50 journals in anthropology, economics, psychology, and political science have data policies that either encourage or require authors to share the data associated with the article.
Hosted at Harvard Dataverse repository (80 journal dataverses)
Hosted at Harvard Dataverse repository
Hosted by SBGrid Consortium, Harvard Medical School
Hosted by Texas Digital Library, a consortium of Texas higher-education institutions
Hosted by Harvard University, in collaboration with Harvard Library, HUIT, and IQSS
http://dataverse.harvard.edu
More ways to upload data
More ways to access data:
More storage options:
Funded partially by Helmsley Charitable Trust, with focus on biomedical data, in collaboration with Piotrek Sliz
Funded partially by National Science Foundation, in collaboration with Latanya Sweeney
Standardize data security and access levels
Funded by National Science Foundation, in collaboration with Harvard Privacy Tools Project
Funded by Sloan Foundation, in collaboration with CodeOcean
Funded by Sloan Foundation, in collaboration with Margo Seltzer
Funded by Sloan Foundation, in collaboration with the Odum Institute at UNC Chapel Hill
Funded by IMLS
dataverse.org
dataverse.harvard.edu
The Dataverse Team @IQSS
https://groups.google.com/forum/#!forum/dataverse-community
scholar.harvard.edu/mercecrosas
@mercecrosas