Reproducible Computational Science:
Challenges and opportunities for research and IT
Panel at Duke TechExpo 2015
April 17, 2015
Moderator: Hilmar Lapp
Center for Genomic and Computational Biology (GCB)
You can find (and copy, edit, ...) the slides online:
All material for this panel is online (to be copied, edited, ...)

The Reproducibility Crisis
- Only 6 of 53 landmark oncology papers confirmed
- 43 of 67 drug target validation studies failed to reproduce
- Effect size overestimation is common

Computational research:
Availability and technical challenges

Collberg et al. (2015) Repeatability and Benefaction in Computer Systems Research - A Study and a Modest Proposal.
Reproducing reproducible computational science:
an experiment
- Software with many dependencies -> exponentially lower probability that all install
- Holes or errors in documentation -> harmless for experts, often fatal for "method novice"
- Software evolution & rot -> parameters that worked 1 year ago now throw an error
- Dependency hell: baseline software and packages differ depending on who is trying to reproduce
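
The first point above (many dependencies, exponentially lower odds that all install) can be made concrete with a small sketch. It assumes, purely for illustration, that each dependency installs independently and with the same success probability; the function name and the 95% figure are hypothetical and not taken from the experiment.

```python
# Illustrative sketch only. Assumes every dependency installs independently
# with the same success probability p; real installs are rarely that simple.

def prob_all_install(p: float, n: int) -> float:
    """Return the probability that all n dependencies install successfully."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Independent events multiply, so the joint probability is p to the power n.
    return p ** n


if __name__ == "__main__":
    # Hypothetical per-dependency success rate of 95%.
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>3} dependencies: {prob_all_install(0.95, n):.1%}")
```

Under these assumptions, a 95% per-package success rate leaves only about a 60% chance that 10 dependencies all install, and under 8% for 50.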
Good - Better - Best

Peng, R. D. “Reproducible Research in Computational Science” Science 334, no. 6060 (2011): 1226–1227
Lessons re: End-to-end reproducibility
Any work you do to make your analysis more reproducible pays dividends for colleagues and your future self.
Jeremy Leipzig
A bewildering tech soup
- Version control
- Distributed version control
- Git, Mercurial, Subversion
- Provenance
- SHA256
- Docker
- Docker Hub
- Container tagging
- Drone, Travis CI, CircleCI
- VM memory, storage limits
- Literate programming
- Markdown
- RMarkdown
- Knitr
- packrat
- HIPAA, protected data
- Firewalls
- DataCite DOIs
- Zenodo, Figshare
- Dryad
A huge opportunity for Research Informatics to accelerate science
Panelists
- Hilmar Lapp, Dan Leehr (Center for Genomic and Computational Biology)
- Mine Çetinkaya-Rundel (Department of Statistical Science)
- Karen Cranston (National Evolutionary Synthesis Center - NESCent)
- Mark Delong (OIT, Research Computing)
- Erich Huang (Div. of Translational Bioinformatics, Department of Biostatistics & Bioinformatics)
- Darin London (OIT, Office of Research Informatics)
Intro Talks - 5 minutes each
- Erich Huang: Provenance and metadata APIs enabling reproducible research data
- Karen Cranston: Reproducible Science Curriculum - how to make computational research more reproducible for the rest of us
Duke TechExpo2015 - Reproducible Computational Science Panel
By Hilmar Lapp