Neha Moopen
Research Data Manager
2023-09-06
tinyurl.com/repco-gsns
Best Practices in Writing Reproducible Code
@UtrechtUniversity
@NEONScience
&
Computational reproducibility is when detailed information is provided about code, software, hardware and implementation details (Victoria Stodden, 2014).
This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. The image was obtained from https://zenodo.org/record/3332808.
A study may be more or less reproducible than another depending on what data and code are made available (Peng, 2011).
The stakeholders involved:
If you need more convincing, see also: Five selfish reasons to work reproducibly (Florian Markowetz, 2015)
How confident do you feel?
We need to do more: we need to inspire trust.
ORGANIZATION
DOCUMENTATION
AUTOMATION
DISSEMINATION
ORGANIZATION
HOW CAN YOU DO IT?
FOLDER STRUCTURE
NAMING CONVENTION
VERSION CONTROL
FOLDER STRUCTURE
Contain your project in a single recognizable folder.
Distinguish folder types, name them accordingly:
Source: Wilson et al. (2017)
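As an illustration, a project skeleton could be created with a short script instead of by hand. This is a minimal sketch: the folder names below are assumptions loosely following Wilson et al. (2017), and `create_project` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Hypothetical folder types, loosely following Wilson et al. (2017);
# rename them to suit your own project.
FOLDERS = ["data/raw", "data/processed", "src", "results", "doc"]


def create_project(root: Path) -> list[Path]:
    """Create the project folder and its sub-folders; return the created paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        # parents=True also creates the project folder itself if needed.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    # A README at the top level marks the folder as one recognizable project.
    (root / "README.md").touch()
    return created
```

Calling `create_project(Path("my-project"))` would give every new project the same layout, so collaborators know where to look.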
Machine-readable and human-readable names
Names that support sorting
source: OSF's File naming Guide
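One possible way to get names that are machine-readable, human-readable, and sortable is to generate them from a fixed pattern. The sketch below assumes a hypothetical convention (date, project, description, run number); `make_filename` and the example values are illustrative only.

```python
from datetime import date


def make_filename(day: date, project: str, description: str, run: int, ext: str) -> str:
    """Build a name such as '2023-09-06_gsns_raw-survey_run-03.csv'."""
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when sorted as text.
    # Zero-padding keeps run 10 after run 9 instead of after run 1.
    # Underscores separate fields and hyphens separate words inside a field,
    # so a program can split the name back into its parts.
    return f"{day.isoformat()}_{project}_{description}_run-{run:02d}.{ext}"


names = [
    make_filename(date(2023, 9, 6), "gsns", "raw-survey", 10, "csv"),
    make_filename(date(2023, 9, 6), "gsns", "raw-survey", 2, "csv"),
    make_filename(date(2023, 8, 30), "gsns", "raw-survey", 1, "csv"),
]
# Plain alphabetical sorting now matches chronological and run order.
print(sorted(names))
```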
NAMING CONVENTION
File organization should:
source: Intro to Reproducible Science @NEONScience
File renaming tools:
Bulk Rename Utility (for Windows)
Renamer (for MacOS)
PSRenamer (for MacOS, Windows, Unix, Linux)
WildRename (for Windows)
VERSION CONTROL
Git is a version control system (VCS) that runs on your computer like any other program.
GitHub is a cloud-based hosting service that helps you manage Git repositories.
WHY DO YOU NEED GIT & GITHUB?
source: phdcomics.com
WHY DO YOU NEED GIT & GITHUB?
source: Turing Way
DO: Make commits atomic: each commit should be one self-contained 'unit' of change.
DON'T: Edit for a full day and put everything in a single commit (or worse: forget to...)
Write informative commit messages so you (and others) can trace your steps.
Track most files; list the files you don't want tracked in .gitignore.
Explore new ideas on branches; keep a stable version on the default branch (master or main).
HOW TO GIT: LAST TIPS
source: https://xkcd.com/1296/
Are there changes you can make in your file & folder naming/organization?
Has anyone used Git/GitHub to version control, share, or find code?
DOCUMENTATION
You want to understand how code you wrote some time ago works.
You want others to understand how to (re-)use your code.
COMMENTS
READMEs
Comments are annotations you write directly in the source code.
They:
- are written for anyone who deals with your source code (including yourself)
- explain parts that are not intuitive from the code itself
- do not replace readable or structured code
- can be used to directly generate documentation for users (if written in a specific structure)
Comic source: Geek & Poke
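To make the points about comments concrete, here is a small sketch in Python. The function `celsius_to_kelvin` is a made-up example: its docstring is the structured comment that documentation tools (or the built-in `help()`) can pick up, while the inline comment explains a decision the code alone does not make obvious.

```python
def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin.

    Raises ValueError for temperatures below absolute zero.
    """
    # Reject impossible values early: a silently negative kelvin value
    # would be hard to spot later in the analysis.
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15
```

Running `help(celsius_to_kelvin)` shows the docstring to users without them opening the source file.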
Best practices for commenting:
The README page is the first thing your user will see!
The contents typically include one or more of the following:
Reference: Wikipedia's README page
An example README:
Check out Make a README:
Is your code well-annotated? Would future you, a direct colleague, an external colleague/peer understand it?
Do you have README files already?
AUTOMATION
HOW DO YOU DO IT?
SCRIPTING
DRY
FUNCTIONALIZE
Everything should be done with a script. Pointing-and-clicking and copying-and-pasting are not reproducible.
Think about:
- obtaining raw data
- converting data files
- editing data
- data cleaning
- data analyses
source: Best Practices in Writing Reproducible Code @UtrechtUniversity
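A scripted workflow for the steps listed above might look like the following sketch. It assumes a hypothetical CSV file with a `value` column; the function names `clean`, `analyse`, and `main` are illustrative, not taken from the workshop materials.

```python
import csv
import statistics
from pathlib import Path


def clean(raw_path: Path, clean_path: Path) -> None:
    """Copy rows with a non-empty 'value' field from raw_path to clean_path."""
    with raw_path.open(newline="") as src, clean_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Treat a missing or blank value as "no measurement".
            if (row["value"] or "").strip():
                writer.writerow(row)


def analyse(clean_path: Path) -> float:
    """Return the mean of the 'value' column."""
    with clean_path.open(newline="") as src:
        return statistics.mean(float(row["value"]) for row in csv.DictReader(src))


def main(raw_path: Path, clean_path: Path) -> float:
    """Run every step in order; the raw file is never edited by hand."""
    clean(raw_path, clean_path)
    return analyse(clean_path)
```

Because the raw file is only read and the cleaned file is regenerated on every run, anyone with the raw data and the script can repeat the whole analysis.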
DON'T REPEAT YOURSELF (DRY) AND...
Functions are small units of code responsible for one task.
Functions are meant to be reused.
Functions accept arguments (though the argument list may also be empty!).
Which arguments a function accepts is defined by its parameters.
Functions do not necessarily make code shorter (at first)! Compare:
source: Best Practices in Writing Reproducible Code @UtrechtUniversity
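The original comparison is not reproduced here, so the following is a stand-in sketch with made-up data. It contrasts copy-pasted code with a function; `value_range` and the two example groups are hypothetical.

```python
# Without a function: the same formula is copied for every group.
control = [4.0, 5.0, 6.0]
treated = [7.0, 8.0, 9.0]
control_range = max(control) - min(control)
treated_range = max(treated) - min(treated)


# With a function: a few more lines now, but the logic lives in one place.
def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


ranges = {name: value_range(v) for name, v in [("control", control), ("treated", treated)]}
```

The second version is slightly longer for two groups, but adding a third group or changing the formula now means editing one line instead of hunting for every copy.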
It's better to think in building blocks:
source: Best Practices in Writing Reproducible Code @UtrechtUniversity
Among the many ways to avoid repeating yourself, consider:
- writing a script for each 'step'
- using GNU Make to automate the workflow/build for your project
- writing a reproducible report using R Markdown, Jupyter, or Quarto
Do you have a tendency to rewrite or repeat code?
Do you see opportunities to write more functions?
DISSEMINATION
WHY DO YOU NEED IT?
HOW DO YOU DO IT?
GITHUB x ZENODO
LICENSES
CITATION
SOFTWARE LICENSES
Copyright is implicit; others cannot use your code without your permission.
Licensing gives that permission, and its boundaries and conditions.
Choosing a license early on means being aware of your license as the project proceeds (and not creating conflicts).
There are over 80 OSI-approved licenses (and many, many others) to choose from. This is one that's often used:
Archive your project on Zenodo, and get a DOI!
GitHub & Zenodo have a great integration that makes it easy to archive a whole repository.
First, select your repository:
source: Best Practices in Writing Reproducible Code @UtrechtUniversity
Second, release your project and follow the workflow:
source: Best Practices in Writing Reproducible Code @UtrechtUniversity
Last,
source: Best Practices in Writing Reproducible Code @UtrechtUniversity
As a final touch, take your DOI and place it as a badge in your GitHub README!
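A DOI badge is a Markdown image that links to the DOI. The sketch below builds one from a DOI string; the URL pattern is the one Zenodo commonly offers on a record page, but treat it as an assumption and copy the exact snippet from your own Zenodo record. `doi_badge` is a hypothetical helper and the DOI shown is a placeholder.

```python
def doi_badge(doi: str) -> str:
    """Return a Markdown badge that links to https://doi.org/<doi>."""
    return f"[![DOI](https://zenodo.org/badge/DOI/{doi}.svg)](https://doi.org/{doi})"


# Placeholder DOI, not a real record: replace it with the one Zenodo gives you.
print(doi_badge("10.5281/zenodo.0000000"))
```

The printed line can be pasted near the top of README.md.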
Links: GitHub / Workshop Materials / Zenodo
How to cite code?
In addition to README.md & LICENSE.md, you can also include CITATION.cff in your GitHub repo.
Zenodo will read the CITATION.cff if available.
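For orientation, a minimal CITATION.cff is a small YAML file with a few required keys (`cff-version`, `message`, `title`, `authors`). The sketch below renders one from Python; `render_citation_cff` is a hypothetical helper, the author and title are placeholders, and the quoting is naive (it does not escape double quotes inside values).

```python
def render_citation_cff(title, authors, version, date_released):
    """Return the text of a minimal CITATION.cff file.

    authors is a list of (family_name, given_name) pairs;
    date_released is a string in YYYY-MM-DD form.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines.append(f'version: "{version}"')
    lines.append(f"date-released: {date_released}")
    return "\n".join(lines) + "\n"


# Placeholder metadata for illustration only.
print(render_citation_cff("My Analysis Code", [("Doe", "Jane")], "1.0.0", "2023-09-06"))
```

Saving the returned text as CITATION.cff in the repository root is where GitHub and Zenodo look for it.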
Learn more about CITATION.cff:
* Official Site:
Use Binder to make your code immediately reproducible by anyone, anywhere!
How does it work? Binder is a virtual, executable environment that runs the code in your GitHub repository.
When do you think is the best time to publish your code / place it online? Why?
Have you already published your code? Where?
What software license did/would you choose? Why?
You get more efficient, less redundant science: others can build upon your work!