Welcome to the Dataverse Community Meeting

Dataverse Cup 2018









33 Dataverse installations since 2006

+ 10 new installations since last Community Meeting

Dataverse Google Groups Members 

+ 128 members since last Community Meeting



+ 265 topics since last Community Meeting


GitHub Dataverse Repo

65 contributors since 2013

+ 22 since last Community Meeting

632 pull requests since 2013

+ 298 since last Community Meeting


11,203 commits since 2013

+ 2,868 since last Community Meeting


Dataverse Twitter Followers

4,114 followers since 2012

+ 467 since last Community Meeting





Since last Community Meeting:

21 Community Calls

180 participants


Since last Community Meeting:

10,463 messages

358 unique users

Prior Year (June 2016-June 2017):

7,114 messages

245 unique users


Since last Community Meeting:

20 sprints

223 standup meetings

156,165 Slack messages

964 support tickets


The Future



A growing community needs to become self-organized and leverage economies of scale

What should we pay attention to?

Research Data Are Becoming More Complex: Large-Scale, Streaming, Sensitive

Local, National, International Data Platforms Are Being Built on the Cloud

  • NIH Data Commons, with AWS, Google Cloud, MS Azure:

The NIH Data Commons pilot phase explores using the cloud to access and share FAIR biomedical data


  • European Open Science Cloud, with open source clouds 

The EOSC will allow for universal access to data and a new level playing field for EU researchers


  • Massachusetts Open Cloud, built on OpenStack

It will serve as a marketplace for industry partners as well as a place for researchers and industry to innovate and expose innovation to real users.

Data Citation, Reuse, and Replication Are Growing, but Slowly

Snapshot of the current state of data citation

Garza, K., Fenner, M., DataCite Blog, June 2018

Out of the 22,000 links provided via Crossref DOIs, only 16% or 3,657 are links between literature and data.


But data citations increased by 40% (from 2,599 to 3,657) between March 2017 and March 2018.

Data Policies for Highly-Ranked Social Science Journals

Crosas M, Gautier J, Karcher S, Kirilova D, Otalora G, Schwartz A, SocArXiv, March 2018

Does Not Have Data Policy

Has Data Policy

More than half of the journals have a data policy (except in History)

Data Policies for Highly-Ranked Social Science Journals

Crosas M, Gautier J, Karcher S, Kirilova D, Otalora G, Schwartz A, SocArXiv, March 2018

No Data Policy

Encourage Data Sharing

Require Data Sharing

Economics, Political Science, and Psychology have a higher number of journals requiring data sharing

Personal Data Need to Be Protected


  • Rights of access, of rectification, to be forgotten, etc.
  • Informed consent as the basis for using personal data


  • Facebook will provide privacy-preserving data and access (through Dataverse)
  • Seven nonprofit foundations will fund the research
  • An eighth will oversee the peer review process


What does this all mean?

Dataverse must be ready to:

  • Provide more options for data deposit, storage, and access to support large, streaming, and sensitive data

  • Integrate with data enclaves, cloud storage and computing, and local and global research clouds

  • Be compliant with new data regulations

  • Build incentives to integrate with journals and connect data to literature, via curation, exploration, and replication tools

  • Ensure compliance with data citation recommendations to "make data count"


Thank You

Dataverse Community Meeting 2018

By Mercè Crosas

Introduction to the 2018 Dataverse Community Meeting. https://projects.iq.harvard.edu/dcm2018/agenda