Disclaimers

1. I have a bias towards the core tenets of Open Science as better scientific practices.


1. Reproducibility, what now?

2. Containers

3. Version Control Systems and Automation

4. Licences

5. Reproducibility, why?

Terminology

Replicable, Robust, Reproducible, Generalisable

The Turing Way Community, & Scriberia, 2022 (Zenodo). Illustrations from The Turing Way (CC-BY 4.0)

Guaranteeing reproducibility is important for "reusable, transparent" research.
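
One compact way to remember the four terms, following the definitions in The Turing Way: they differ only in whether the data and the analysis are the same as in the original study. The sketch below just encodes that 2x2 table; the function name `classify` is made up for illustration.

```python
def classify(same_data: bool, same_analysis: bool) -> str:
    """Name the property being tested, following The Turing Way's 2x2 table."""
    table = {
        (True, True): "reproducible",     # same data, same analysis
        (True, False): "robust",          # same data, different analysis
        (False, True): "replicable",      # different data, same analysis
        (False, False): "generalisable",  # different data, different analysis
    }
    return table[(same_data, same_analysis)]


print(classify(same_data=True, same_analysis=True))   # reproducible
print(classify(same_data=False, same_analysis=True))  # replicable
```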

(?)

1. Reproducibility

We have a problem.

2016: Survey by Nature¹:

  • ~70% of researchers failed to reproduce others' results,
  • 50%+ failed to reproduce their own

1. Baker 2016 (Nature)

Really reproducible?

Same hardware, two Freesurfer builds (different glibc version)
Difference in estimated cortical thickness.¹

Same hardware, two Freesurfer builds (two glibc versions)

Difference in estimated parcellation.²

1. Glatard, et al., 2015 (Front. Neuroinform.)   2. Ali, et al., 2021 (Gigascience)

Same hardware, same FSL version, two glibc versions
Difference in estimated tissue segmentation.²
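
One mechanism that can produce this kind of difference is floating-point arithmetic: a result can depend on the order and rounding of operations, which a different library build may change. The toy example below only illustrates the principle (it is not taken from the cited studies): the same three numbers, summed in two different orders, end up on opposite sides of a threshold.

```python
import math

values = [0.1, 0.2, 0.3]

# Same numbers, two different orders of operations.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)              # False
print(math.isclose(left_to_right, right_to_left))  # True: the gap is tiny...

# ...but a tiny gap is enough to flip a hard decision, e.g. a threshold.
threshold = 0.6
print(left_to_right > threshold, right_to_left > threshold)  # True False
```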

Really Robust?

Standard Operating Procedures

https://github.com/TheAxonLab/hcph-sops

Take home #1

Don't think that because your analysis "works", it's reproducible.

 

Make it reproducible,
share your SOPs,
and adopt better reporting standards! 

2. Containers

Containerisation

Docker vs Apptainer

Docker:

  • Targeting Laptops: better OS support (yes, you, mac/win peeps)
  • Hosts public hub to share built containers (DockerHub)
  • Docker images can be used as bases for Apptainer recipes

Apptainer:

  • Built for HPCs (Linux only), maintained by the Linux Foundation
  • Easier "recipe" syntax
  • Supports Docker images as bases

Containers in action

apptainer build --sandbox container.img docker://afni/afni_dev_base:AFNI_22.2.12

apptainer shell -f -e -w --no-home container.img


apptainer build container.sif recipe.def

apptainer exec -f -e --no-home -B /some/place:/tmp \
               -B /some/place/elsewhere:/scripts \
               -B /another/place/:/data \
               container.sif /scripts/run_batch_analysis.sh sub-001 ses-01
apptainer exec docker://ghcr.io/apptainer/lolcow cowsay "Hello $USER!"
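
If you launch containers from Python (e.g. to loop over subjects), it can help to assemble the call as a list of arguments instead of a long string. The sketch below is one possible way to do it, using only the flags shown above; `apptainer_exec` is a hypothetical helper, not part of Apptainer.

```python
import shlex


def apptainer_exec(image, command, binds=None):
    """Assemble an `apptainer exec` call as an argument list.

    `binds` maps host paths to container paths (one -B flag each).
    """
    argv = ["apptainer", "exec", "-f", "-e", "--no-home"]
    for host, target in (binds or {}).items():
        argv += ["-B", f"{host}:{target}"]
    return argv + [image] + list(command)


argv = apptainer_exec(
    "container.sif",
    ["/scripts/run_batch_analysis.sh", "sub-001", "ses-01"],
    binds={"/some/place/elsewhere": "/scripts", "/another/place/": "/data"},
)
# Print the command as it would be typed in a shell.
print(shlex.join(argv))
```

To actually run it (Apptainer must be installed), pass the list to `subprocess.run(argv, check=True)`.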

Try it now yourselves!

Practical #1

 

  1. Get the phys2cvr test data from OSF (next page)
  2. Set up an Apptainer recipe for an Ubuntu 22.04 container
  3. In that recipe, install python and pip (How? Google it! Don't forget you are installing in an Ubuntu system. And GOOGLE it, don't GEMINI it.)
  4. Via pip, install phys2cvr (How? Google it! phys2cvr has documentation!)
  5. Build the container
  6. Open an interactive session of that container (apptainer shell ...)
  7. Run phys2cvr in that container on the data you downloaded (How? Check the help with phys2cvr -h)

Practical #2: data

https://files.de-1.osf.io/v1/resources/mcr8g/providers/osfstorage/?zip=

STOP!

The next 2 slides show the recipe you should have written (plus a few more lines),
as well as the commands you should use to run the containers.
Do everything first, then compare with the solutions after.

Practical #2: recipe (should be similar to this)

Bootstrap: docker
From: ubuntu:22.04

%environment
export DEBIAN_FRONTEND=noninteractive
export TZ=Europe/Brussels

%post
# Set install variables, create tmp folder
export TMPDIR="/tmp/general_preproc_build_$( date -u +"%F_%H-%M-%S" )"
[ -d "${TMPDIR}" ] && rm -rf "${TMPDIR}"
mkdir -p ${TMPDIR}
export DEBIAN_FRONTEND=noninteractive
export TZ=Europe/Brussels
apt update -qq
apt install -y -q --no-install-recommends ca-certificates dirmngr gnupg lsb-release wget
apt install -y -q --no-install-recommends python3-distutils python3-pip python-is-python3

# Install PYTHON things.
pip3 install pip==25.0 setuptools==70.3.0 wheel==0.37.1

# Install phys2cvr.
pip3 install phys2cvr==0.18.6

# Final removal of lists and cleanup
cd /tmp || exit 1
rm -rf ${TMPDIR}
rm -rf /var/lib/apt/lists/*
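
Once the container is built, a quick sanity check is to confirm that the versions pinned in the recipe are the ones actually installed. The snippet below is a minimal sketch of such a check, meant to be run with the container's Python; `check_pins` and `PINS` are made-up names, not part of phys2cvr.

```python
from importlib import metadata

# Same pin as in the recipe above.
PINS = {"phys2cvr": "0.18.6"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not met."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    print(check_pins(PINS) or "All pinned versions match.")
```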

Practical #2: run the containers

apptainer build -f p2c.sif recipe_p2c.def

apptainer shell -f -e --no-home -B ~/mydata:/data p2c.sif

	cd /data
	phys2cvr -i func.nii.gz -o results -m mask.nii.gz -r roi.nii.gz -co2 co2.phys -dmat motpar.par motderiv.par
	

apptainer exec -f -e --no-home -B /some/place:/tmp \
               -B /some/place/elsewhere:/scripts \
               -B /another/place/:/data \
               p2c.sif phys2cvr -i /data/func.nii.gz -o /data/results \
               -m /data/mask.nii.gz -r /data/roi.nii.gz -co2 /data/co2.phys \
               -dmat /data/motpar.par /data/motderiv.par

Take home #2

Make sure what you do today
is
what you'll do tomorrow.

Use containers!

3. Version Control Systems

Do any of these situations look familiar?

I can't work on that project now because my colleague/friend/dog is working on [a different part than what I'd modify of] it at the moment...

Version Control Systems (VCS)

File history & parallel development

Attribution

Automation pt. 1: git hooks

pip install pre-commit  # Install via pip, or
# Comes installed with development extras
pip install -e /path/to/phys2cvr[dev]

cd /path/to/phys2cvr
pre-commit install
pre-commit run

(Local and remote) simple automations, e.g.:

  • Code style
  • File checks (empty lines, indent, executables)
  • Language and typos (!!!)
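
To give an idea of what such a file check does under the hood, here is a toy version of a whitespace check. It is only a sketch of the idea (`check_text` is a made-up name); the hooks you get through pre-commit are more thorough.

```python
def check_text(text):
    """Return a list of (line_number, problem) for simple whitespace issues."""
    problems = []
    lines = text.split("\n")
    for number, line in enumerate(lines, start=1):
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    # A non-empty file should end with a newline character.
    if text and not text.endswith("\n"):
        problems.append((len(lines), "no newline at end of file"))
    return problems


print(check_text("a = 1 \nb = 2\n"))  # [(1, 'trailing whitespace')]
```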

Other Version Control Systems

VCS for data

Take home #3

Version Control Systems are everywhere,
(for good reasons, including increasing trust)

Use them!

Content

Aggregation/delivery

VCS hosting services

A classic git(Hub) flow

[Diagram: "Main" and "Dev" branches. Steps shown: initialise repository ("Main" branch); create branch "dev"; commit; merge dev into main; diverging main: conflict? (bug); merge main into dev.]
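
The basic flow in the diagram (initialise, branch, commit, merge) can be tried out safely in a throwaway folder. The sketch below drives git from Python and assumes git is installed; the file names, commit messages, and the demo identity are placeholders chosen for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    # Demo identity so that commits work even without a global git config.
    options = ["-c", "user.name=Demo", "-c", "user.email=demo@example.org",
               "-c", "commit.gpgsign=false"]
    result = subprocess.run(["git", *options, *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout


def classic_flow(repo):
    """Initialise a repository, work on "dev", and merge it back into "main"."""
    git(repo, "init")
    git(repo, "symbolic-ref", "HEAD", "refs/heads/main")  # first branch: "main"
    (repo / "README.md").write_text("My project\n")
    git(repo, "add", "README.md")
    git(repo, "commit", "-m", "Initial commit")

    git(repo, "checkout", "-b", "dev")                     # create branch "dev"
    (repo / "analysis.py").write_text("print('hello')\n")
    git(repo, "add", "analysis.py")
    git(repo, "commit", "-m", "Add analysis script")

    git(repo, "checkout", "main")
    git(repo, "merge", "--no-ff", "-m", "Merge dev into main", "dev")
    # Return the commit messages now reachable from "main".
    return git(repo, "log", "--format=%s").splitlines()
```

To try it, call `classic_flow(Path(tmp))` inside a `tempfile.TemporaryDirectory()` block: the returned list should contain the three commit messages, including the merge.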

A classic git(Hub) flow

[Diagram: the same flow extended to a fork. Repositories shown: upstream, origin, and local, with "Main" and "Dev" branches. Steps shown: initialise repository ("Main" branch); create branch "dev"; commit; merge dev into main; diverging main: conflict?; merge main into dev; fork ("upstream" vs "origin"); clone (local repository); pull from upstream; merge origin/main into dev; pull request. Legend: "Pull from *", "Push to *".]

Pull requests and Reviews

Some suggestions for...

... Authors

  • Keep your contribution small and focused
  • Make your contribution as clear as possible
  • Use a review as a learning experience
  • Be patient: reviewers might ask for more work than you expected, but it's always to improve your work.

... Reviewers

  • Be kind and patient
  • Don't limit your review to the apparent changes - depending on the importance of the review, take the time to look at how the whole project might change.
  • Keep your review to what's necessary for the contribution - if it would be nice to ..., open an issue (or think about making the contribution yourself).

Automation at work

  • Style & basic checks: pre-commit
  • Tests: CircleCI & Codecov
  • Versions and Releases: Auto and GitHub
  • DOIs (citable objects): Zenodo
  • Documentation: Read the Docs

Let's not reinvent the wheel

Take advantage of the marketplace: there is a very high probability that what you are looking for is already available.

Take home #4

Working with VCS allows you to:

  1. track changes and authors
  2. work in parallel without disruptions
  3. increase attribution and responsibility
  4. access automations

Bonus: it can force a team to double check projects!

4. Licences

Disclaimer:

I am not a legal expert.
If you ever have any doubts, contact the Technology Transfer Office.

License your work

A work that is not licensed is not public (paradox!)

There are many (open source) licences to pick from, and not only for code.

www.choosealicense.com

The licence should be in the first commit you make.

Personal picks for science: Apache 2.0 and CC-BY-ND-4.0
(consider L-GPLv3.0 and CC-BY-4.0 too)

Stefano Moia, 2026

You can add a clause against LLM use!

Understand licensing and ownership

  • Check the licence of code, data, and libraries you are "borrowing".
  • Pay attention to single vs double licensing (e.g. academic vs commercial).
  • Check licence compatibility.
  • Remember that institutions might have rights to what their employees do:

    The data resulting from the research, as well as the publication rights are owned by the Host Institute [UM], unless otherwise stated below in the section “Additional arrangements”.

  • However, they can also help you with licensing and licence enforcement.

Licence compatibility

© Sebastien Adams, I WANT TO DISTRIBUTE MY SOFTWARE DEVELOPMENTS. HOW TO DEFINE AN OPEN LICENSING STRATEGY?
© Benjamin Jean (2011), Option libre. Du bon usage des licences libres.

License your work in the right way

  • Put a copy of the licence or a link to it as close as possible to "borrowed" material, if not in it.
  • If any licence requires its adoption for derivatives (e.g. GPL), you must license your work with the same licence.
  • You can ask the original authors to change their licence (e.g. GPL to L-GPL) or give you special permissions.
  • Remember to add licence disclaimers in all of your files.
[...]

if __name__ == "__main__":
    _main(sys.argv[1:])


"""
Copyright 2022, Stefano Moia & EPFL.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

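Checking by hand that every file carries its licence notice gets tedious quickly. A small script can do it for you; the one below is a minimal sketch that looks for the Apache notice shown above in every Python file of a folder. `files_missing_notice` is a made-up helper, and the text to search for should be adapted to your own licence.

```python
from pathlib import Path

# First line of the notice shown above; adapt it to your own licence.
NOTICE = "Licensed under the Apache License, Version 2.0"


def files_missing_notice(root, pattern="*.py", notice=NOTICE):
    """Return the files under `root` matching `pattern` that lack `notice`."""
    missing = []
    for path in sorted(Path(root).rglob(pattern)):
        if notice not in path.read_text(encoding="utf-8", errors="ignore"):
            missing.append(path)
    return missing
```

Calling `files_missing_notice("/path/to/phys2cvr")` would list the files to fix; a check like this could also run as a git hook.
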
License your work in the right way

MATLAB users:

  • If you include external functions/scripts/libraries, your work is considered a derivative. Report licence, authors, and origin of the code inside them and respect their licence.
  • Alternatively, don't include anything but state requirements / create install scripts.
  • If you are releasing a build, the build is considered a derivative.

Python users:

  • If you copy-paste code, your work is a derivative.
  • Imports are trickier:
    • Technically, GPL or © licences trigger on import.
    • Practically, it's a really grey area. Make those imports optional, and specify their licences as clearly as possible.
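
One common way to make an import optional is to try it, and only fail (with a clear message about the licence) when the feature that needs it is actually used. This is a sketch of the pattern, not legal advice: `hypothetical_gpl_library`, its `plot` function, and `fancy_plot` are placeholders.

```python
import importlib


def optional_import(name):
    """Return the module called `name`, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Placeholder name: not a real package.
gpl_lib = optional_import("hypothetical_gpl_library")


def fancy_plot(data):
    """Feature that relies on the optional, separately licensed library."""
    if gpl_lib is None:
        raise ImportError(
            "fancy_plot needs the optional package 'hypothetical_gpl_library' "
            "(GPL-licensed). Install it separately to use this feature."
        )
    return gpl_lib.plot(data)  # hypothetical call
```

The rest of the package keeps working without the optional library, and users are told about its licence before they install it.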

Take home #5

Licensing is as complicated as it is important.

Double check the licences of borrowed material, and report them in your own work
for licence tracking.

5. Why?

The risks
of non-replicable science

  • Erratum
  • Retraction
  • Misinformation
  • Public trust
  • Impact

We are retracting this article due to concerns with Figure 5. In Figure 5A, there is a concern that the first and second lanes of the HIF-2α panel show the same data, [...], despite all being labeled as unique data. [...] We believe that the overall conclusions of the paper remain valid, but we are retracting the work due to these underlying concerns about the figure. Confirmatory experimentation has now been performed and the results can be found in a preprint article posted on bioRxiv [...]

Last take home message:

What you do in your scientific work has an impact on society.

It's not about you.

Remember that.

Any questions [/opinions/objections/...]?

Take home messages

Stefano Moia, 2026

Find the presentation at:

slides.com/smoia/reprouob/scroll

  1. Ensure reproducibility, report transparently, share SOPs
  2. Use containers to ensure long term reproducibility, robustness, and generalisation
  3. Use VCSs: they're there for you, and they can improve trust!
  4. VCSs increase attribution and improve collaborative and parallel development
  5. License properly!

Oh, and don't forget!
