Bootstrap FAIRlyBig

- a trivially reproducible HTC workflow

Dr. Felix Hoffstaedter

Cognitive and Affective Biopsychology Group

Institute for Neuroscience and Medicine, INM-7 Research Centre Jülich

Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Germany

Brain age prediction in Parkinson's Disease

Distributed Data Management

  • Clone data from a repository (built-in backup)
  • Look at the data structure without the content being present
  • Automatic download of single files or all data
  • Multiple representations, i.e. folders & names, of the same data
  • Easy collaboration on data via common platforms (e.g. OSF)

  • git init dataset
  • git clone URL                      # code repo
  • git add file/s                     # text
    git commit -m "info"
  • git remote add NAME URL
  • git push / fetch / pull
  • git branch NAME / git switch NAME
  • git submodule add URL
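
The Git commands above can be tried in a throwaway directory. This is a minimal sketch: the repository name, the file, the branch name and the remote URL are hypothetical placeholders, and the remote is only registered, never contacted.

```shell
#!/bin/sh
# Sketch of the Git commands listed above, run in a temporary directory.
# demo-repo, notes.txt, "feature" and the remote URL are made-up examples.
set -e

workdir=$(mktemp -d)
cd "$workdir"

git init -q demo-repo                      # git init dataset
cd demo-repo
# Local identity so that the commit also works on a fresh machine.
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "analysis notes" > notes.txt
git add notes.txt                          # git add file/s (text files)
git commit -q -m "info"                    # git commit -m "info"

# Create and switch to a new branch (fallback for Git older than 2.23).
git switch -q -c feature 2>/dev/null || git checkout -q -b feature

# Register a remote; placeholder URL, no network access happens here.
git remote add origin https://example.org/demo-repo.git

n_commits=$(git rev-list --count HEAD)
current_branch=$(git rev-parse --abbrev-ref HEAD)
echo "commits: $n_commits, branch: $current_branch"
```

Pushing, fetching and pulling are left out because they need a reachable remote.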

  • datalad create dataset
  • datalad clone URL                  # code & DATA
  • datalad save -m "info" file/s      # ANYTHING
  • datalad siblings add -s NAME --url URL
  • datalad push / update (--merge)
    - - - - - - - - - - - - - - - - - -
  • datalad get / drop file/s          # file content
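
The DataLad counterpart can be sketched the same way. The example assumes that datalad and git-annex are installed and skips itself otherwise; the dataset names and the file data.bin are hypothetical. It shows the point made above: a clone carries the file names first, and content is fetched or dropped on demand.

```shell
#!/bin/sh
# Sketch of the DataLad commands listed above; needs datalad and git-annex.
# demo-ds, demo-clone and data.bin are made-up example names.
set -e

# Identity for the commits DataLad creates (placeholder values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

demo_status="skipped"
if command -v datalad >/dev/null 2>&1 && command -v git-annex >/dev/null 2>&1; then
  workdir=$(mktemp -d)
  cd "$workdir"

  datalad create demo-ds                   # datalad create dataset
  cd demo-ds
  head -c 1024 /dev/zero > data.bin        # stand-in for a large binary file
  datalad save -m "info" data.bin          # save handles text and binary alike

  cd "$workdir"
  datalad clone demo-ds demo-clone         # clone: names arrive, content does not
  cd demo-clone
  ls -l                                    # data.bin is listed without its content
  datalad get data.bin                     # fetch the content on demand
  datalad drop data.bin                    # free the space; demo-ds keeps a copy
  demo_status="done"
fi
echo "datalad demo: $demo_status"
```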

>>> datalad run -m "info" command <<<

DATA PROCESSING

[Diagram: Data, Code, Pipeline, Results]

Reproducible DATA PROCESSING

[Diagram: Data, Code, Pipeline, Results; two elements marked "archived"; Containerization (Singularity); upload your code; share data]

DATA PROCESSING

[Diagram: Data, Code, Pipeline, Results; tracking changes in any set of files]

DATA PROCESSING

[Diagram: Data, Code, Pipeline, Results; Data + containerized Code]

* Clone everything, everywhere, without file content *

Reproducible DATA PROCESSING

datalad run -m "info" command

  • saves (add + commit) all data produced or changed in the dataset
  • captures the executed command in machine-readable format

datalad run: a wrapper for trivially reproducible processing

datalad rerun <commit hash>

  • reruns the identical command on the same data using the same software (given that all inputs are available)
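
A minimal sketch of the run/rerun cycle, assuming datalad and git-annex are installed (the example skips itself otherwise). The dataset name, the output file and the toy command are hypothetical.

```shell
#!/bin/sh
# Sketch of "datalad run" followed by "datalad rerun" in a scratch dataset.
# run-demo, result.txt and the echo command are made-up examples.
set -e

# Identity for the commits DataLad creates (placeholder values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

run_status="skipped"
if command -v datalad >/dev/null 2>&1 && command -v git-annex >/dev/null 2>&1; then
  workdir=$(mktemp -d)
  cd "$workdir"
  datalad create run-demo
  cd run-demo

  # Execute the command and save its output together with a run record.
  datalad run -m "info" --output result.txt "echo 42 > result.txt"

  # The run record is stored as JSON in the commit message.
  git log -1 --format=%B

  # Re-execute the recorded command, identified by its commit hash.
  datalad rerun "$(git rev-parse HEAD)"
  run_status="done"
fi
echo "datalad run demo: $run_status"
```

Declaring outputs with --output lets DataLad unlock those files before the command is executed again.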

Bootstrap FAIRlyBig

- a trivially reproducible HTC workflow

Distributed data management

Transparent data processing

  1. DataLad        - distributed data management tool

  2. Singularity  - software hosting environment

  3. CON/duct    - lightweight compute resource monitoring

  4. HTCondor & SLURM    - job scheduling systems

Reproducible DATA PROCESSING

Execute FAIRly Big Workflow

input_store

output_store

temporary workdir
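
The three elements above (input_store, output_store, temporary workdir) can be sketched as one job script. This is a simplified illustration, not the actual bootstrap script: both store paths, the sibling name, the subject ID, the inputs/ and results/ layout and code/pipeline.sh are hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a single job in a FAIRly big style workflow.
# Hypothetical placeholders: both store paths, the sibling name "output_store",
# the subject ID, the inputs/ and results/ layout, and code/pipeline.sh.
set -e

DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands; 0: execute them

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The body runs in a subshell, so "cd" does not leak into the caller.
fairly_big_job() (
  input_store=$1      # dataset location to clone from
  output_store=$2     # location that collects the job branches
  subject=$3          # unit of work handled by this job

  workdir=$(mktemp -d)                      # temporary workdir for this job only
  run datalad clone "$input_store" "$workdir/ds"
  run cd "$workdir/ds"
  run datalad siblings add -s output_store --url "$output_store"
  run git checkout -b "job-$subject"        # one branch per job keeps pushes separate
  run datalad run -m "Process $subject" \
      --input "inputs/$subject" --output "results/$subject" \
      "sh code/pipeline.sh $subject"
  run datalad push --to output_store        # ship the job branch and its results
  cd /
  chmod -R u+w "$workdir"                   # annexed content is write-protected
  rm -rf "$workdir"                         # discard the temporary workdir
)

job_log=$(fairly_big_job /path/to/input_store /path/to/output_store sub-01)
echo "$job_log"
```

A scheduler such as HTCondor or SLURM would start one such job per subject. Because every job works in its own clone and on its own branch, jobs do not interfere with each other.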

Brain Age prediction: Why & How

Brain Age prediction in Parkinson's Disease

  • Biggest & most reliable effect on brain structure measured with MRI

Brain Ageing

Grey Matter   |   White Matter   |   Subcortical GM   |   Ventricles

Brainage prediction

Brainage prediction in Parkinson's Disease

Features:   4 mm smoothing + 4 mm resolution + PCA
Model:      Gaussian process regression (GPR)

  • Disease Progression modeled using LTJMM  (latent time joint mixed-effects model)
    Input:   UPDRS, PIGH, MoCA & SCOPA
  • Parkinson's Disease subtyping via VaDER (variational deep embedding with recurrence)
    Input:   LTJMM outcome trajectories
  • Segmentation with CAT12.8.1 for VBM
  • GLM difference: PD subtypes & Controls
  • No significant difference between PD subtypes
  • Brain age workflow selection on PPMI HC
    using 3 best models from Moore et al. 2022 + 2 bias corrections
  • 4 mm resolution LASSO regression model trained on PPMI HC + Beheshti bias correction
  • Increased brain ageing is associated with increased cognitive, but not motor, decline
  • Each one-year increase in baseline brain age gap (BAG) was associated with:
    • 2% increased hazard for cognitive decline
    • 4% increased hazard for mild cognitive impairment
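
To get a feeling for the per-year hazard figures above, they can be scaled to a larger gap. The sketch assumes that the effects compound multiplicatively per year, as in a Cox-type proportional hazards model; that model choice and the 5-year gap are assumptions for illustration, not taken from the slides.

```shell
#!/bin/sh
# Hazard ratio implied by a given baseline BAG, assuming the per-year
# effects (2% and 4%) compound multiplicatively. bag_years is a made-up value.
bag_years=5

hr_decline=$(LC_ALL=C awk -v d="$bag_years" 'BEGIN { printf "%.3f", 1.02 ^ d }')
hr_mci=$(LC_ALL=C awk -v d="$bag_years" 'BEGIN { printf "%.3f", 1.04 ^ d }')

echo "cognitive decline, BAG of $bag_years years: HR = $hr_decline"
echo "mild cognitive impairment, BAG of $bag_years years: HR = $hr_mci"
```

Under this assumption a 5-year gap corresponds to roughly 10% and 22% increased hazard, respectively.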

Scale: 1) normal cognition, 2) subjective cognitive impairment,
3) mild cognitive impairment, and 4) dementia

  • Power & sample size estimation for a clinical trial assuming an
    effect of 30% on the progression of cognitive decline over 2 years