Bootstrap FAIRlyBig
- a trivially reproducible HTC workflow


Dr. Felix Hoffstaedter
Cognitive and Affective Biopsychology Group
Institute of Neuroscience and Medicine (INM-7), Research Centre Jülich
Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Germany
Brainage prediction in Parkinson's Disease

Distributed Data Management
- Clone data from a repository (built-in backup)
- Inspect the data structure without the file content present
- Automatic download of single files or all data
- Multiple representations (e.g. folders & file names) of the same data
- Easy collaboration on data via common platforms (e.g. OSF)

- git init dataset
- git clone URL  # code repo
- git add file/s *  # text
  git commit -m "info"
- git remote add name URL
- git push / fetch / pull
- git branch / switch branch-name
- git submodule add URL
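A minimal local sketch of the git commands listed above, assuming git ≥ 2.23 (for `git switch`). A throwaway directory is used, a local bare repository stands in for the remote URL, and all file, branch and remote names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the git basics; all names (my-dataset, analysis.py, feature, backup)
# are hypothetical placeholders.
set -euo pipefail

# Commit identity, only needed if none is configured globally.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

BASE="$(mktemp -d)"
git init -q "$BASE/my-dataset"             # git init dataset
cd "$BASE/my-dataset"

echo 'print("hello")' > analysis.py        # a small text file (code)
git add analysis.py                        # git add file/s
git commit -q -m "Add analysis script"     # git commit -m "info"

git switch -q -c feature                   # create and switch to a new branch
echo '# TODO' >> analysis.py
git commit -q -am "Extend analysis script"
git switch -q -                            # back to the previous branch

# A local bare repository stands in for a remote URL.
git init -q --bare "$BASE/backup.git"
git remote add backup "$BASE/backup.git"   # git remote add name URL
git push -q --all backup                   # push all local branches
```

Git versions code (text) well, but large binary data is what the DataLad column adds on top.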


- datalad create dataset
- datalad clone URL  # code & DATA
- datalad save -m "info" \
  file/s *  # ANYTHING
- datalad siblings add -s name --url URL
- datalad push / update (--merge)
- datalad get / drop file/s *  # content
>>> datalad run -m "info" command <<<

[Figure: DATA PROCESSING (Data, Code, Pipeline, Results) vs. Reproducible DATA PROCESSING (two components marked "archived")]

Containerization
- Singularity
- Upload your code, share data





tracking changes in any set of files



Data + Containerized Code
* Clone everything, everywhere, without file content *
Reproducible DATA PROCESSING
datalad run -m "info" command
- saves (add+commit) all produced data/changes in the dataset
- captures the executed command in a machine-readable format

datalad run: a wrapper for trivially reproducible processing
datalad rerun <commit hash>
- reruns the identical command on the same data using the same software (given all inputs are available)

Distributed data management
Transparent data processing


- DataLad - distributed data management tool
- Singularity - containerized software environment
- con/duct - lightweight compute resource monitoring

Reproducible DATA PROCESSING

Execute FAIRly Big Workflow
[Figure: input_store, output_store, temporary workdir]

Brainage prediction in Parkinson's Disease
Brain Age prediction: Why & How
Brain Age prediction in Parkinson's Disease
- Ageing: the biggest & most reliable effect on brain structure measured with MRI
Brain Ageing


Grey Matter | White Matter | Subcortical GM | Ventricles
Brainage prediction



Brainage prediction in Parkinson's Disease






Features: 4mm smoothing + 4mm resolution + PCA
Model: Gaussian process regression (GPR)
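For reference, the generic form of this feature/model combination: PCA projection of the voxel-wise features followed by the standard GPR posterior mean. The kernel k, the number of components d and the noise variance are assumptions here; the slide does not specify them.

```latex
% PCA: project the smoothed, 4mm-resampled features x_i onto the first d components W_d
\mathbf{z}_i = W_d^{\top}\left(\mathbf{x}_i - \bar{\mathbf{x}}\right)

% GPR posterior mean (predicted age) for a new subject *, kernel k, noise variance \sigma_n^2
\hat{y}_{*} = \bar{y} + \mathbf{k}_{*}^{\top}\left(K + \sigma_n^{2} I\right)^{-1}\left(\mathbf{y} - \bar{y}\,\mathbf{1}\right),
\qquad K_{ij} = k(\mathbf{z}_i, \mathbf{z}_j), \quad (\mathbf{k}_{*})_i = k(\mathbf{z}_{*}, \mathbf{z}_i)
```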





- Disease progression modeled using LTJMM (latent time joint mixed-effects model)
  Input: UPDRS, PIGH, MoCA & SCOPA
- Parkinson's Disease subtyping via VaDER (variational deep embedding with recurrence)
  Input: LTJMM outcome trajectories


- Segmentation with CAT12.8.1 for VBM
- GLM: differences between PD subtypes & controls






- No significant difference between PD subtypes


- Brain age workflow selection on PPMI HC
using the 3 best models from Moore et al. 2022 + 2 bias corrections

- 4mm resolution LASSO regression model trained on PPMI HC + Beheshti bias correction
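The brain age gap (BAG) and the Beheshti bias correction, written out as they are commonly described (regressing BAG on chronological age in the training sample); the exact fitting details used here are an assumption.

```latex
% Brain age gap: predicted brain age minus chronological age (CA)
\mathrm{BAG}_i = \widehat{\mathrm{BA}}_i - \mathrm{CA}_i

% Fit on the training sample (here PPMI HC) by linear regression of BAG on CA
\mathrm{BAG}_i \approx \alpha\,\mathrm{CA}_i + \beta

% Apply the fitted offset to every subject
\widehat{\mathrm{BA}}^{\mathrm{corr}}_i = \widehat{\mathrm{BA}}_i - \left(\alpha\,\mathrm{CA}_i + \beta\right),
\qquad
\mathrm{BAG}^{\mathrm{corr}}_i = \mathrm{BAG}_i - \left(\alpha\,\mathrm{CA}_i + \beta\right)
```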




- Increased brain ageing is associated with increased cognitive, not motor, decline

- Each additional year of baseline BAG was associated with:
  - 2% increased hazard for cognitive decline
  - 4% increased hazard for mild cognitive impairment

Scale: 1) normal cognition, 2) subjective cognitive impairment,
3) mild cognitive impairment, and 4) dementia
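A worked example of what the per-year figures imply for larger gaps, assuming they come from a proportional-hazards model with BAG entering the log-hazard linearly; the 5-year gap is purely illustrative.

```latex
% Per-year hazard ratios from the slide
\mathrm{HR}_{\text{decline}} = 1.02, \qquad \mathrm{HR}_{\text{MCI}} = 1.04

% Log-linear effect: a BAG difference of \Delta years compounds multiplicatively
\mathrm{HR}(\Delta) = \mathrm{HR}_{1}^{\,\Delta}

% Illustrative \Delta = 5 years: about 10% and 22% increased hazard
1.02^{5} \approx 1.104, \qquad 1.04^{5} \approx 1.217
```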

- Power & sample size estimation for a clinical trial assuming an
effect of 30% on the progression of cognitive decline over 2 years
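To show how an assumed 30% effect translates into sample size, here is a generic normal-approximation formula for comparing mean 2-year change between two arms. This is a textbook approximation, not necessarily the method used for the estimate above; μ and σ are unspecified placeholders.

```latex
% Per-arm sample size for a two-arm comparison of mean 2-year change
n = \frac{2\,\sigma^{2}\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\delta^{2}},
\qquad \delta = 0.3\,\mu

% \mu: expected 2-year decline without treatment, \sigma: SD of the 2-year change
% Example: \alpha = 0.05, power 1-\beta = 0.8, so z_{1-\alpha/2} + z_{1-\beta} = 1.960 + 0.842 = 2.802
n \approx \frac{2 \cdot 2.802^{2}}{0.09}\left(\frac{\sigma}{\mu}\right)^{2} \approx 174\left(\frac{\sigma}{\mu}\right)^{2}
```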

ORIGAMI_talk_bootstrapFAIRlyBig_PD
By Felix Hoffstaedter

