Open Science
Hurdles, gains, and opportunities
of a modern approach to doing science.
smoia
@SteMoia
s.moia.research@gmail.com
Tainan, 03.12.24
Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands; Open Science Special Interest Group (OHBM); physiopy (https://github.com/physiopy)
Disclaimers
1. I have a bias towards the core tenets of Open Science as better scientific practices.
2. I am biased by my own experience as a neuroscientist in a W.E.I.R.D. country.
0. Rules & Materials
This is a new chapter
Take home #0
This is a take home message
Terminology
Replicable, Robust, Reproducible, Generalisable
The Turing Way Community, & Scriberia, 2022 (Zenodo). Illustrations from The Turing Way (CC-BY 4.0)
Guaranteeing reproducibility is important for "reusable, transparent" research.
1. The Origin Story
The Origin Story
1942-1998: first concepts of "Open Science"
2010s: current concepts of OS, responding to:
- Failed attempts to reproduce core concepts of social psychology and biomedical research
- Studies on questionable research practices
2016: Survey by Nature¹: ~70% of researchers failed to reproduce others' results, 50%+ failed to reproduce their own
1. Baker 2016 (Nature)
Reproducible?
Same hardware, two Freesurfer builds (different glibc versions)
Difference in estimated cortical thickness.¹
Same hardware, same FSL version, two glibc versions
Difference in estimated tissue segmentation.²
Same hardware, two Freesurfer builds (two glibc versions)
Difference in estimated parcellation.²
1. Glatard, et al., 2015 (Front. Neuroinform.) 2. Ali, et al., 2021 (Gigascience)
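As a toy illustration of how numbers can change without any change in the science, the sketch below sums the same three values in two different orders. It is only an analogy, assuming IEEE 754 double precision (what Python floats use): it does not reproduce the mechanism examined in the studies cited above, but it shows how a different order or implementation of numerical operations can alter the last digits of a result, which a long pipeline may then amplify.

```python
import math

# Floating-point addition is not associative: grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Exact comparison versus comparison within a tolerance.
bitwise_identical = left_to_right == right_to_left
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(f"(0.1 + 0.2) + 0.3 = {left_to_right!r}")
print(f"0.1 + (0.2 + 0.3) = {right_to_left!r}")
print(f"Bitwise identical: {bitwise_identical}")
print(f"Equal within tolerance: {close_enough}")
```

Comparing results within a stated tolerance, rather than bit by bit, is one way to decide whether two runs count as "the same".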
Reproducible?
Aarts et al. 2015 (Science)
What does failure to replicate
and generalise results tell us
about hypotheses and scientific facts?
(Some) Issues
- Human mistakes / bugs: "objective" does not mean "not human"
- Different (algorithmic) implementations: researcher degrees of freedom
- Unavailable data / code / materials / information / procedures
- Novelty seeking and "publish or perish" culture of academia
- Null results rejection → Bias toward positive results
- Bad "habits": P-hacking, data dredging, data fishing, HARKing, ..., fraud
Take home #1
An irreproducible finding is
a waste of time.
Be honest about it, and take preventive measures
to improve the reproducibility
of your scientific work.
What is Open Science?
To be Open, Science should be publicly available, reusable, and transparent¹. It should aim at reaching:
- Open Data
- Open Source Software
- Open Hardware
- Open Access (outreach)
- Reproducible results
- Re-usable artefacts (setting independent)
- Complete process documentation
1. The Turing Way Community, 2019 (Zenodo). https://the-turing-way.netlify.app (CC-BY 4.0)
Newer points of view might add:
- Two-stages submissions (preregistrations and registered reports)
- Open and inclusive culture (environment, research)
- Collaborative science
Why commit to Open Science?
OS makes academic research available to its real funding bodies: citizens, policy-makers, industry.
Well-executed OS yields more reliable results and reduces the impact of human error.
It also reduces publication bias¹,².
Ethics
1. Allen & Mehler, 2019 (PLoS Biol) 2. Scheel, Schijen, & Lakens, 2021 (AMPPS)
Organization & time management
It reduces the impact of laboratories' generational changes
Why commit to Open Science?
OS imposes practices for collaboration and openness that improve our procedures:
- Traceability of changes
- Procedural documentation
- Reusability of artefacts
- External revision
Your future self will be happier to collaborate with you!
The demand from policy makers is increasing¹.
You (will) have to do it.
1. McKiernan, et al., 2016 (eLife)
Why commit to Open Science?
Practices
How do we do Open Science?
- Read literature, formulate hypotheses, identify your biases, formulate SOPs, choose methods.
- Find open development projects, contribute if needed, check licences.
- Choose a (permissive) licence for the artefacts, start developing code in the open (share it!), use VCS, test and document code, create releases.
- Submit a Registered Report (~ Introduction, hypotheses, methods, procedures, biases), get through first peer review round.
- Create containers.
- Collect data, curate them using community schemas (e.g. BIDS), upload them to public databases under embargo ("private time").
- Run analyses in containers, interpret results, write the second part of your Registered Report (~ Results, a posteriori analyses, discussion, conclusions)
- Publish open access.
- Remove embargoes.
- Rinse and repeat.
2. Data Standards & metadata
Data standards & metadata
1. Gorgolewski, et al., 2016 (Scientific Data)
2. Zwiers, Moia, & Oostenveld, 2022 (Front. Neuroinform.)
Plan in advance
Take home #2
Adopt data standards
and add metadata
to improve reusability
(and shareability)!
It requires initial planning,
but it simplifies data analysis
in the long term.
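As a minimal sketch of what "adding metadata" can look like, the snippet below writes a JSON sidecar for a physiological recording, following the BIDS file naming pattern. The subject and task labels, the sampling frequency, and the column names are made-up placeholders, and `write_physio_sidecar` is a hypothetical helper; `SamplingFrequency`, `StartTime`, and `Columns` are the fields BIDS uses to describe physiological recordings.

```python
import json
import tempfile
from pathlib import Path


def write_physio_sidecar(root, subject, task, sampling_frequency, start_time, columns):
    """Write a BIDS-style JSON sidecar for a physiological recording."""
    func_dir = Path(root) / f"sub-{subject}" / "func"
    func_dir.mkdir(parents=True, exist_ok=True)
    sidecar = func_dir / f"sub-{subject}_task-{task}_physio.json"
    metadata = {
        "SamplingFrequency": sampling_frequency,  # in Hz
        "StartTime": start_time,  # in seconds, relative to the neural recording
        "Columns": list(columns),  # one name per column of the data file
    }
    sidecar.write_text(json.dumps(metadata, indent=4))
    return sidecar


# Hypothetical example values, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    sidecar_path = write_physio_sidecar(
        tmp, "01", "rest", 1000.0, -12.5, ["cardiac", "respiratory", "trigger"]
    )
    sidecar_name = sidecar_path.name
    loaded = json.loads(sidecar_path.read_text())

print(sidecar_name)
print(loaded)
```

Because the file name and the field names follow a shared convention, anyone (including a script) can find the sampling frequency without asking the person who collected the data.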
3. Version Control Systems
Do any of these situations look familiar?
I can't work on that project now because my colleague/friend/dog is working on [a different part than what I'd modify of] it at the moment...
Version Control Systems (VCS)
VCS for data
File history & parallel development
Attribution
Automation pt. 1: git hooks
pip install pre-commit # Install via pip, or
# it comes installed with the development extras
pip install -e /path/to/phys2cvr[dev]
cd /path/to/phys2cvr
pre-commit install # Set up the git hooks
pre-commit run
(Local and remote) simple automations, e.g.:
- Code style
- File checks (empty lines, indent, executables)
- Language and typos (!!!)
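To give an idea of what a "file check" does, here is a minimal sketch of one written in plain Python. It is not how pre-commit's own hooks are implemented: `check_text` is a hypothetical helper that flags trailing whitespace and a missing final newline, two of the issues such checks typically catch.

```python
def check_text(text):
    """Return a list of human-readable problems found in ``text``."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Trailing spaces or tabs at the end of a line.
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    # A non-empty file should end with a newline.
    if text and not text.endswith("\n"):
        problems.append("missing newline at end of file")
    return problems


# Two made-up file contents: one clean, one with both problems.
clean = "import os\nprint(os.getcwd())\n"
messy = "import os  \nprint(os.getcwd())"

print(check_text(clean))
print(check_text(messy))
```

A real hook would run a check like this on the files staged for a commit and stop the commit when the list of problems is not empty.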
4. Public Engagement
Content
Aggregation/delivery
VCS hosting services
A classic git(Hub) flow
[Diagram: initialise repository ("Main" branch), create branch "dev", commit, merge dev into main; if main diverges (conflict?), merge main into dev. Branches shown: Main, Dev, Bug.]
A classic git(Hub) flow
[Diagram: the same flow extended with a fork ("upstream" vs "origin") and a clone (local repository): pull from upstream, merge origin/main into dev, push to and pull from the remotes, and open a Pull Request. Repositories shown: Upstream, Origin, and local, with Main and Dev branches.]
Assign versions and release
Releases make your work easier to retrieve (and cite).
Imagine them as hard links to a certain moment in time.
(e.g. paper #1 vs paper #2).
Make your project identifiable (and citable)
{
"license": "Apache-2.0",
"title": "physiopy/phys2bids: BIDS formatting of physiological recordings",
"upload_type": "software",
"creators": [
[...]
{
"orcid": "0000-0002-7796-8795",
"affiliation": "Florida International University",
"name": "Katie Bottenhorn"
},
[...]
],
"access_right": "open"
}
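Metadata like the excerpt above can be checked automatically before a release. The sketch below is one possible check, not part of Zenodo's tooling: `orcid_is_valid` and `check_creators` are hypothetical helpers, and the check digit is computed with the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_is_valid(orcid):
    """Check the format and the check digit (ISO 7064 MOD 11-2) of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for character in digits[:-1]:
        total = (total + int(character)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def check_creators(metadata):
    """Return the names of creators whose ORCID iD is missing or invalid."""
    return [
        creator.get("name", "<unnamed>")
        for creator in metadata.get("creators", [])
        if not orcid_is_valid(creator.get("orcid", ""))
    ]


# The first creator comes from the excerpt above; the second is a made-up
# entry with a deliberately wrong check digit.
metadata = {
    "creators": [
        {"name": "Katie Bottenhorn", "orcid": "0000-0002-7796-8795"},
        {"name": "Hypothetical Author", "orcid": "0000-0002-7796-8796"},
    ]
}

print(check_creators(metadata))
```

A check like this could run alongside the other automations, so that a typo in an identifier is caught before the release is archived.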
Automation at work
- Style & basic checks: pre-commit
- Tests: CircleCI & Codecov
- Versions and Releases: Auto and GitHub
- DOIs (citable objects): Zenodo
- Documentation: Read the Docs
Take home #3
Working with VCS allows you to:
- track changes and authors
- work in parallel without disruptions
- engage the public more easily
- access automations
Bonus: it can force a team to double check projects!
5. Licensing
License your work
A work that is not licensed is not public (paradox!)
There are n+1 (open source) licences to pick from.
www.choosealicense.org
The licence should be the first commit you make in a project.
Personal picks for science:
Apache 2.0 and CC-BY-ND-4.0
(consider L-GPLv3.0, and CC-BY-4.0 too)
License your work in the right way
- Put a copy of the licence or a link to it as close as possible to "borrowed" material, if not in it.
- If any licence requires its adoption for derivatives (e.g. GPL), you must license your work with the same licence.
- You can ask the original authors to change their licence (e.g. GPL to L-GPL) or give you special permissions.
- Remember to add licence disclaimers in all of your files.
[...]
if __name__ == "__main__":
    _main(sys.argv[1:])
"""
Copyright 2022, Stefano Moia & EPFL.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
Licence compatibility
© Sebastien Adams, I WANT TO DISTRIBUTE MY SOFTWARE DEVELOPMENTS. HOW TO DEFINE AN OPEN LICENSING STRATEGY?
© Benjamin Jean (2011), Option libre. Du bon usage des licences libres.
License your work in the right way
MATLAB users:
- If you include external functions/scripts/libraries, your work is considered a derivative. Report licence, authors, and origin of the code inside them and respect their licence.
- Alternatively, don't include anything but state requirements / create install scripts.
- If you are releasing a build, the build is considered a derivative.
Python users:
- If you copy-paste code, your work is a derivative.
- Imports are trickier:
  - Technically, GPL or © licences are triggered on import.
  - Practically, it's a really grey area. Make those imports optional, and specify their licences as clearly as possible.
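One common way to make an import optional in Python is to attempt it, remember whether it worked, and fail only when the feature that needs it is actually used. In this sketch, `hypothetical_gpl_package`, `optional_import`, and `smooth` are made-up names standing in for a separately licensed dependency and for the code that relies on it.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# "hypothetical_gpl_package" is a made-up, separately licensed dependency.
gpl_package = optional_import("hypothetical_gpl_package")


def smooth(values):
    """Feature that depends on the optional package."""
    if gpl_package is None:
        raise ImportError(
            "smooth() requires 'hypothetical_gpl_package', which is distributed "
            "under its own licence; install it separately to use this feature."
        )
    return gpl_package.smooth(values)


def mean(values):
    """Core feature with no optional dependency."""
    return sum(values) / len(values)


print(mean([1, 2, 3]))
```

The core of the package keeps working without the optional dependency, and the error message is a natural place to state its licence. Whether this is enough from a licensing point of view remains the grey area mentioned above.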
Take home #4
Licensing is as complicated as it is important.
Double-check the licences of borrowed material, and report them in your own work
for licence tracking.
6. SOPs and Containers
Standard Operating Procedures
https://github.com/TheAxonLab/hcph-sops
Containers
Bootstrap: docker
From: python:3.8.13-slim-buster
%environment
    export DEBIAN_FRONTEND=noninteractive
    export TZ=Europe/Brussels
%post
    # Set install variables
    export DEBIAN_FRONTEND=noninteractive
    export TZ=Europe/Brussels
    # Prepare repos and install dependencies
    pip3 install nigsp[all]
    # Final removal of lists and cleanup
    rm -rf /var/lib/apt/lists/*
Docker vs Apptainer
Docker:
- Targeting Laptops: better OS support
- Offers public hub to share built containers
- Docker containers can be built in Apptainer (formerly Singularity)
Apptainer:
- Built for HPCs (Unix only), maintained by the Linux Foundation
- Easier syntax
- Supports Docker containers
Easy containers
BIDSapps: containers for BIDS pipelines
1. Gorgolewski, et al., 2017 (PLoS Comp. Biol.)
Take home #5
Adopt SOPs and containers
to guarantee reusability.
7. Registered Reports
Registered Reports
Images courtesy of Oscar Esteban (CC-BY-4.0)
Ethics committee,
internal project assessment, ...
Re-learn scientific process
Currently not adapted to short projects
Guarantees a publication!
Pre-registrations & Registered reports
Pre-registration
- Upload of hypotheses (and methods/protocols) on a public server (e.g. Open Science Framework)
- Embargoed until paper publication
- No peer review
- No certainty of publication
- Weaker version of open publishing
Registered report
- Two-stage submission of a manuscript to a journal
- Public once accepted
- Two-stage peer review
- Higher certainty of publication (depending on journal)
- Stronger version of open publishing
Registered Reports
Take home #6
The next time you start a (long) project, consider registered reports.
Last take home message:
What you do in your scientific work has an impact on society.
Scientific outcome is not about you.
Open science can help you with that.
That's all folks!
Thanks to...
...you for the (sustained) attention!
...MR-Methods @UM, Physiopy, OSSIG
...the organisers, for having me here!
Find the presentation at:
slides.com/ephraim24/open-science-ncku-2024/scroll
Take home messages
1. Be honest about the generalisation of results, and take preventive measures to improve it.
2. Adopt data standards and add metadata to improve reusability.
3. Working with VCS allows you to track changes, work in parallel, and implement automation.
4. Licenses!
5. Adopt Standard Operating Procedures and containers to guarantee reusability and reproducibility.
6. Consider Registered Reports in your next long project.
Any questions [/opinions/objections/...]?
Oh, and don't forget!
Open Science: hurdles, gains, and opportunities [Psych Forum NCKU]
By Stefano Moia
CC-BY 4.0 Stefano Moia, 2024. Images are property of the original authors and should be shared following their respective licences. This presentation is otherwise licensed under CC BY 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/