Open Science

Hurdles, gains, and opportunities
of a modern approach to doing science.

smoia
@SteMoia
s.moia.research@gmail.com

Tainan, 03.12.24

Stefano Moia, 2024

Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands;    Open Science Special Interest Group (OHBM);    physiopy (https://github.com/physiopy)

Disclaimers

1. I have a bias towards the core tenets of Open Science as better scientific practices.

2. I am biased by my own experience as a neuroscientist in a W.E.I.R.D. country.

0. Rules & Materials

This is a new chapter

Take home #0

This is a take home message

Terminology

Replicable, Robust, Reproducible, Generalisable

The Turing Way Community, & Scriberia, 2022 (Zenodo). Illustrations from The Turing Way (CC-BY 4.0)

Guaranteeing reproducibility is important for "reusable, transparent" research.

1. The Origin Story

The Origin Story

1942-1998: first concepts of "Open Science"

2010s: current concepts of OS, responding to:

  • Failed attempts to reproduce core concepts of social psychology and biomedical research
  • Studies on questionable research practices

2016: Survey by Nature¹: ~70% of researchers failed to reproduce others' results, 50%+ failed to reproduce their own

1. Baker 2016 (Nature)

Reproducible?

Same hardware, two Freesurfer builds (different glibc version)
Difference in estimated cortical thickness.¹

Same hardware, same FSL version, two glibc versions
Difference in estimated tissue segmentation.²

Same hardware, two Freesurfer builds (two glibc versions)

Difference in estimated parcellation.²

1. Glatard, et al., 2015 (Front. Neuroinform.)   2. Ali, et al., 2021 (Gigascience)
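Since results can change with the C library or the software build, it helps to store a description of the software stack next to every output. A minimal sketch, using only the Python standard library; the function name `environment_snapshot` and the package names passed to it are illustrative assumptions, not part of the cited studies:

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=()):
    """Collect details of the software stack that may affect numerical results."""
    libc_name, libc_version = platform.libc_ver()
    snapshot = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
        # Empty string if the C library cannot be identified (e.g. non-glibc systems)
        "libc": f"{libc_name} {libc_version}".strip(),
        "packages": {},
    }
    for name in packages:
        try:
            snapshot["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly rather than skipping them
            snapshot["packages"][name] = None
    return snapshot


# Example: print a snapshot to save alongside the analysis outputs
# (the package names here are only examples)
print(json.dumps(environment_snapshot(["numpy", "nibabel"]), indent=2))
```

Saving this dictionary with each run does not make an analysis reproducible by itself, but it makes differences between two runs much easier to trace.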

Reproducible?

Aarts et al. 2015 (Science)

What does failure to replicate
and generalise results tell us
about hypotheses and scientific facts?

(Some) Issues

  • Human mistakes / bugs:    "Objective" does not mean "not human"
     
  • Different (algorithmic) implementations:    Researchers' degrees of freedom
     
  • Unavailable data / code / materials / information / procedures
     
  • Novelty seeking and Publish or perish culture of academia
     
  • Null results rejection → Bias toward positive results
     
  • Bad "habits": P-hacking, data dredging, data fishing, HARKing, ..., fraud
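To see why data dredging is a problem, consider how fast false positives accumulate when many tests are run and only the significant ones are reported. A small worked sketch, assuming independent tests and that every null hypothesis is true (both simplifying assumptions); the function name is illustrative:

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# How the chance of a spurious "finding" grows with the number of tests
for n in (1, 5, 20):
    print(f"{n:>2} tests: {familywise_error_rate(n):.3f}")
```

Under these assumptions, twenty uncorrected tests at alpha = 0.05 already give roughly a two-in-three chance of at least one false positive.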

Take home #1

An irreproducible finding is
a waste of time.

Be honest about it, and take preventive measures
to improve the reproducibility
of your scientific work.

What is Open Science?

To be Open, Science should be publicly available, reusable,
and transparent¹. It should aim to achieve:

  • Open Data
  • Open Source Software
  • Open Hardware
  • Open Access (outreach)
  • Reproducible results
  • Re-usable artefacts (setting independent)
  • Complete process documentation

1. The Turing Way Community, 2019 (Zenodo). https://the-turing-way.netlify.app (CC-BY 4.0)
Newer points of view might add:

  • Two-stages submissions (preregistrations and registered reports)
  • Open and inclusive culture (environment, research)
  • Collaborative science

Why commit to Open Science?

OS makes academic research available to its real funding bodies: citizens, policy-makers, industry.

Well made OS guarantees more certain results and reduces the impact of human error.

It also reduces publication bias¹ ².

Ethics

1. Allen & Mehler, 2019 (PLoS Biol)    2. Scheel, Schijen, & Lakens, 2021 (AMPPS)

Organization & time management

It reduces the impact of laboratories' generational changes

Why commit to Open Science?

OS imposes practices for collaboration and openness that improve our procedures:

  • Traceability of changes
  • Procedural documentation
  • Reusability of artefacts
  • External revision

Your future self will be happier to collaborate with you!

The demand from policy makers is increasing¹.

You (will) have to do it.

1. McKiernan, et al., 2016 (eLife)

Why commit to Open Science?

Practices

How do we do Open Science?

  1. Read literature, formulate hypotheses, identify your biases, formulate SOPs, choose methods.
  2. Find open development projects, contribute if needed, check licences.
  3. Choose a (permissive) licence for the artefacts, start developing code in the open (share it!), use a VCS, test and document code, create releases.
  4. Submit a Registered Report (~ Introduction, hypotheses, methods, procedures, biases), get through first peer review round.
  5. Create containers.
  6. Collect data, curate them using community schemas (e.g. BIDS), upload them on public databases with embargo ("private time").
  7. Run analyses in containers, interpret results, write the second part of your Registered Report (~ Results, a posteriori analyses, discussion, conclusions)
  8. Publish open access.
  9. Remove embargoes.
  10. Rinse and repeat.

2. Data Standards & metadata

Data standards & metadata

1. Gorgolewski, et al., 2016 (Scientific Data)
2. Zwiers, Moia, Oostenveld, 2022 (Front. Neuroinform.)

Plan in advance

Take home #2

Adopt data standards
and add metadata
to improve reusability
(and shareability)!
It requires initial planning,
but it simplifies data analysis
in the long term.
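As an illustration of what adopting a standard looks like in practice, here is a minimal sketch that builds a BIDS-style file name and writes a JSON metadata sidecar for a physiological recording. The helper names and all metadata values are hypothetical, and only three entities (sub, ses, task) are handled; the BIDS specification defines many more:

```python
import json
from pathlib import Path


def bids_basename(sub, suffix, ses=None, task=None):
    """Build a BIDS-style file stem from key-value entities."""
    entities = [("sub", sub), ("ses", ses), ("task", task)]
    parts = [f"{key}-{value}" for key, value in entities if value is not None]
    return "_".join(parts + [suffix])


def write_sidecar(folder, basename, metadata):
    """Write the metadata dictionary as a JSON sidecar next to the data file."""
    path = Path(folder) / f"{basename}.json"
    path.write_text(json.dumps(metadata, indent=4))
    return path


# Hypothetical recording: subject 01, resting-state task
name = bids_basename("01", "physio", task="rest")
sidecar = {
    "SamplingFrequency": 100.0,  # Hz, made-up value
    "StartTime": 0.0,            # seconds, made-up value
    "Columns": ["cardiac", "respiratory", "trigger"],
}
print(name)
print(json.dumps(sidecar, indent=4))
```

Calling `write_sidecar(".", name, sidecar)` would store the metadata next to the data file, so that anyone (including automated tools) can interpret the recording without asking the person who collected it.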

3. Version Control Systems

Do any of these situations look familiar?

I can't work on that project now because my colleague/friend/dog is working on [a different part than what I'd modify of] it at the moment...

Version Control Systems (VCS)

VCS for data

File history & parallel development

Attribution

Automation pt. 1: git hooks

pip install pre-commit  # Install via pip, or
# it comes installed with the development extras
pip install -e "/path/to/phys2cvr[dev]"

cd /path/to/phys2cvr
pre-commit install  # Set up the git hooks (once per clone)
pre-commit run      # Run the hooks on the staged files

(Local and remote) simple automations, e.g.:

  • Code style
  • File checks (empty lines, indent, executables)
  • Language and typos (!!!)
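To give an idea of what such a file check does under the hood, here is a minimal sketch of a trailing-whitespace check written as a plain Python function. It is an illustration only, not the implementation used by pre-commit or by any existing hook; the function names are assumptions:

```python
from pathlib import Path


def find_trailing_whitespace(text):
    """Return the 1-based numbers of lines that end with spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def check_files(paths):
    """Return 1 (fail) if any file has trailing whitespace, 0 otherwise."""
    failed = False
    for path in paths:
        for number in find_trailing_whitespace(Path(path).read_text()):
            print(f"{path}:{number}: trailing whitespace")
            failed = True
    return int(failed)


# Lines 1 and 3 of this sample end with a space or a tab
print(find_trailing_whitespace("a \nb\n\tc\t"))
```

A git hook only needs to exit with a non-zero status to block a commit, so a script that passes the file names it receives to `check_files` and exits with its return value would be enough to act as one.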

4. Public Engagement

Content

Aggregation/delivery

VCS hosting services

A classic git(Hub) flow

  1. Initialise repository ("main" branch)
  2. Create branch "dev"
  3. Commit
  4. Diverging main: conflict? Merge main into dev
  5. Merge dev into main

[Diagram labels: Main, Dev, Bug]

A classic git(Hub) flow (with forks)

  1. Initialise repository ("main" branch)
  2. Fork ("upstream" vs "origin")
  3. Clone (local repository)
  4. Create branch "dev"
  5. Commit
  6. Diverging main: conflict? Pull from upstream, merge origin/main into dev (or merge main into dev)
  7. Pull Request
  8. Merge dev into main

[Diagram labels: Upstream (Main), Origin (Main, Dev), local (Main, Dev); arrows: Pull from *, Push to *]

Assign versions and release

Releases make your work easier to retrieve (and cite).

Imagine them as hard links to a certain moment in time.
(e.g. paper #1 vs paper #2).

Make your project identifiable (and citable)

{
    "license": "Apache-2.0", 
    "title": "physiopy/phys2bids: BIDS formatting of physiological recordings",
    "upload_type": "software",
    "creators": [
      [...]
        {
            "orcid": "0000-0002-7796-8795",
            "affiliation": "Florida International University", 
            "name": "Katie Bottenhorn"
        }, 
      [...]
    ], 
    "access_right": "open"
}
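Metadata like the example above can also be generated and checked programmatically, which helps keep it in sync with the project. A minimal sketch that reuses only the fields shown above; the function name is an assumption, and the title, creator and affiliation are hypothetical placeholders:

```python
import json


def zenodo_metadata(title, license_id, creators):
    """Assemble a metadata dictionary with the same fields as the example above."""
    for creator in creators:
        if "name" not in creator:
            raise ValueError("every creator needs at least a 'name'")
    return {
        "license": license_id,
        "title": title,
        "upload_type": "software",
        "creators": creators,
        "access_right": "open",
    }


# Hypothetical project and author, for illustration only
record = zenodo_metadata(
    title="my-lab/my-tool: a hypothetical analysis tool",
    license_id="Apache-2.0",
    creators=[{"name": "Jane Doe", "affiliation": "Hypothetical University"}],
)
print(json.dumps(record, indent=4))
```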

Automation at work

  • Style & basic checks: pre-commit
  • Tests: CircleCI & Codecov
  • Versions and Releases: Auto and GitHub
  • DOIs (citable objects): Zenodo
  • Documentation: Read the Docs

Take home #3

Working with VCS allows you to:

  1. track changes and authors
  2. work in parallel without disruptions
  3. engage the public more easily
  4. access automations

Bonus: it can force a team to double check projects!

5. Licensing

License your work

A work that is not licensed is not public (paradox!)

There are n+1 (open source) licences to choose from.

www.choosealicense.com

The licence should be the first commit you make in a project.

Personal picks for science:
Apache 2.0 and CC-BY-ND-4.0
(consider L-GPLv3.0, and CC-BY-4.0 too)

License your work in the right way

  • Put a copy of the licence or a link to it as close as possible to "borrowed" material, if not in it.
  • If a licence requires its adoption for derivatives (e.g. GPL), you must license your work under the same licence.
  • You can ask the original authors to change their licence (e.g. GPL to L-GPL) or give you special permissions.
  • Remember to add licence disclaimers in all of your files.
[...]

if __name__ == "__main__":
    _main(sys.argv[1:])


"""
Copyright 2022, Stefano Moia & EPFL.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
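Checking that every file carries its disclaimer is easy to automate. A minimal sketch, assuming the Apache 2.0 notice shown above is the expected disclaimer and that only `*.py` files need it; the function name and defaults are illustrative:

```python
from pathlib import Path

# Sentence taken from the Apache 2.0 notice shown above
HEADER_MARKER = "Licensed under the Apache License, Version 2.0"


def files_missing_header(root, pattern="*.py", marker=HEADER_MARKER):
    """List files under `root` matching `pattern` that do not contain `marker`."""
    root = Path(root)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob(pattern)
        if marker not in path.read_text(errors="ignore")
    )
```

Running `files_missing_header("/path/to/project")` returns the relative paths of the files still lacking the notice, which makes it a natural candidate for one of the automated checks discussed earlier.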

Licence compatibility

© Sebastien Adams, I WANT TO DISTRIBUTE MY SOFTWARE DEVELOPMENTS. HOW TO DEFINE AN OPEN LICENSING STRATEGY?
© Benjamin Jean (2011), Option libre. Du bon usage des licences libres.

License your work in the right way

MATLAB users:

  • If you include external functions/scripts/libraries, your work is considered a derivative. Report licence, authors, and origin of the code inside them and respect their licence.
  • Alternatively, don't include anything but state requirements / create install scripts.
  • If you are releasing a build, the build is considered a derivative.

Python users:

  • If you copy-paste code, your work is a derivative.
  • Imports are trickier:
    • Technically, GPL or © licences trigger on import.
    • Practically, it's a really grey area. Make those imports optional, and specify their licences as clearly as possible.

Take home #4

Licensing is as complicated as it is important.

Double-check the licences of borrowed material, report them in your own work
for licence tracking.

6. SOPs and Containers

Standard Operating Procedures

https://github.com/TheAxonLab/hcph-sops

Containers

Bootstrap: docker
From: python:3.8.13-slim-buster

%environment
export DEBIAN_FRONTEND=noninteractive
export TZ=Europe/Brussels

%post
# Set install variables
export DEBIAN_FRONTEND=noninteractive
export TZ=Europe/Brussels
# Prepare repos and install dependencies
pip3 install nigsp[all]
# Final removal of lists and cleanup
rm -rf /var/lib/apt/lists/*
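A definition file like the one above is typically turned into an image once, and analyses are then run inside that image. A minimal sketch that assembles the corresponding Apptainer commands from Python, assuming the definition is saved as `nigsp.def` (a hypothetical file name) and that `apptainer` is installed; the helper names are illustrative:

```python
import shlex


def build_command(definition, image):
    """Command that builds a container image from a definition file."""
    return ["apptainer", "build", image, definition]


def exec_command(image, *command):
    """Command that runs `command` inside an existing container image."""
    return ["apptainer", "exec", image, *command]


# Hypothetical file names; print the commands instead of running them
print(shlex.join(build_command("nigsp.def", "nigsp.sif")))
print(shlex.join(exec_command("nigsp.sif", "python3", "--version")))
```

The lists can be passed as they are to `subprocess.run(..., check=True)` to execute the commands, which keeps the exact container invocation recorded in the analysis code.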

Docker vs Apptainer

Docker:

  • Targeting Laptops: better OS support
  • Offers public hub to share built containers
  • Docker containers can be built in Singularity

Apptainer:

  • Built for HPCs (Unix only), maintained by the Linux Foundation
  • Easier syntax
  • Supports Docker containers

Easy containers

BIDSapps: containers for BIDS pipelines

1. Gorgolewski, et al., 2017 (PLoS Comp. Biol.)

Take home #5

Adopt SOPs and containers
to guarantee reusability.

7. Registered Reports

Registered Reports

Images courtesy of Oscar Esteban (CC-BY-4.0)

Ethics committee,
internal project assessment, ...

Re-learn scientific process

Currently not adapted to short projects

Guarantees a publication!

Pre-registrations & Registered reports

Pre-registration

  • Upload of hypotheses (and methods/protocols) on a public server (e.g. Open Science Framework)
  • Embargoed until paper publication
  • No peer review
  • No certainty of publication
  • Weaker version of open publishing

Registered report

  • Submission in two stages of a manuscript in a journal
  • Public once accepted
  • Two-stage peer review
  • Higher certainty of publication (depending on journal)
  • Stronger version of open publishing

Registered Reports

Take home #6

The next time you start a (long) project, consider registered reports.

Last take home message:

What you do in your scientific work has an impact on society.

Scientific outcome is not about you.

Open science can help you with that.

Thanks to...

That's all folks!

...you for the (sustained) attention!

...MR-Methods @UM, Physiopy, OSSIG

...the organisers, for having me here!

Find the presentation at:

slides.com/ephraim24/open-science-ncku-2024/scroll

Take home messages

1. Be honest about generalisation of results, take preventive measures to improve it.

2. Adopt data standards and add metadata to improve reusability.

3. Working with VCS allows you to track changes, work in parallel, and implement automation.

4. Licences!

5. Adopt Standard Operating Procedures and containers to guarantee reusability and reproducibility.

6. Consider Registered Reports in your next long project.

Any questions [/opinions/objections/...]?

Oh, and don't forget!
