Score learning and inference for diffusion processes on shape spaces

Stefan Sommer, University of Copenhagen

Faculty of Science, University of Copenhagen

UMR CRIStAL, November 2025

Shape stochastics, conditional processes, scores and inference

w/ Frank v.d. Meulen, Rasmus Nielsen, Christy Hipsley, Sofia Stoustrup, Libby Baker, Gefan Yang, Michael Severinsen, Jingchao Zhou

Villum Foundation

Novo Nordisk Foundation

University of Copenhagen

Center for Computational Evolutionary Morphometrics

w/ Rasmus Nielsen

Brownian motion model of trait evolution

Brown. motion

branch (independent children)

incorporate leaf observations $x_{V_T}$ into probabilistic model:
$p(X_t|x_{V_T})$
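
A minimal sketch of the forward model on this slide, assuming a scalar trait and a tree given as plain dictionaries (the names `children`, `branch_length` and `simulate_bm_on_tree` are illustrative, not from the talk). Each child starts at its parent's value and adds an independent Gaussian increment with variance proportional to its branch length:

```python
import random

def simulate_bm_on_tree(children, branch_length, root, x_root, sigma, rng):
    """Brownian motion on a rooted tree: children evolve independently given the parent."""
    values = {root: x_root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # Increment over an edge of length tau has variance sigma^2 * tau.
            std = sigma * branch_length[child] ** 0.5
            values[child] = values[node] + rng.gauss(0.0, std)
            stack.append(child)
    return values

# Hypothetical toy tree: root -> (a, b), a -> (a1, a2).
children = {"root": ["a", "b"], "a": ["a1", "a2"]}
branch_length = {"a": 1.0, "b": 2.0, "a1": 0.5, "a2": 0.5}
rng = random.Random(0)
traits = simulate_bm_on_tree(children, branch_length, "root", 0.0, 1.0, rng)
# Leaves are the nodes without children; these play the role of x_{V_T}.
leaves = {k: v for k, v in traits.items() if k not in children}
```

Inference then runs in the opposite direction: the leaf values are observed and the interior values are to be recovered.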

Felsenstein's pruning algorithm for shapes

Brown. motion

1) What is a shape Brownian motion?

2) How do we condition the nonlinear process on shape observations?

3) How do we perform inference in the full model?

Score learning in generative AI diffusion models

Stochastics of shapes

Stochastic processes that

apply to landmarks, curves, surfaces and images
are independent of discretization
preserve shape structure
equivariant to acting groups
can be recovered from discretizations
$\Large\Rightarrow$
model correlations between points
are nonlinear

Shapes, deformations and nonlinearity

\[E_{s_0,s_1}(\phi)=R(\phi)+\frac1\lambda S(\phi.s_0,s_1)\]

action: $\phi.s=\phi\circ s$ (shapes)
$\phi.s=s\circ\phi^{-1}$ (images)

$\phi$: warp of the domain $\Omega$ (2D or 3D space)

landmarks: $s=(x_1,\ldots,x_n)$

curves: $s: \mathbb S^1\to\mathbb R^2$

surfaces: $s: \mathbb S^2\to\mathbb R^3$

Figure labels: $s_0$, $s_1$

Geometric + metric view

\[R(\phi_t)=\int_0^T\|\partial_t \phi_t\|_{\phi_t}^2dt\]

$ \phi_t:[0,T]\to\mathrm{Diff}(\Omega) $ path of diffeomorphisms (parameter t)

Figure labels: $\mathrm{Diff}(\Omega)$, $\mathrm{Id}_{\mathrm{Diff}(\Omega)}$, $\phi_t$

LDDMM: Grenander, Miller, Trouve, Younes, Christensen, Joshi, et al.

Figure labels: $\partial_t \phi_t$, $\phi$

Evolution with noise

\[\partial_t \phi_t = F(\phi_t)\ \to\ d\phi_t=F(\phi_t)dt\color{blue}{+\sigma(\phi_t) dW_t}\]

Figure labels: $\mathrm{Diff}(\Omega)$, $\mathrm{Id}_{\mathrm{Diff}(\Omega)}$, $\phi_t$

Markussen, CVIU'07; Budhiraja, Dupuis, Maroulas, Bernoulli'10
Trouve, Vialard, QAM'12; Vialard, SPA'13; Marsland/Shardlow, SIIMS'17
Arnaudon, Holm, Sommer, IPMI'17; FoCM'18; JMIV'19
Arnaudon, v.d. Meulen, Schauer, Sommer'21

geodesic ODE

perturbed SDE

Eulerian shape process / Kunita flow

Diffeomorphism:
\[\phi_t(x)=x+X_t(x)\]
Infinite noise Kunita flow:

\[dX_t = Q^{1/2}(X_t) \circ dW_t\]

\[Q^{1/2}(X_t)v(x) = \int_{D} k^{Q^{1/2}}(x+X_t(x),y)\, v(y) \, dy\]

Landmark shape process:

\[dX_t=\sqrt{K(X_t)}\circ dW_t\]

Kernel matrix

\[K(X_t)^i_j=k(x_i,x_j)\]
encodes landmark covariance

$X_t$ landmarks at time $t$:

\[X_t=\begin{pmatrix}x_{1,t}\\y_{1,t}\\\vdots\\x_{n,t}\\y_{n,t}\end{pmatrix}\]
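
The landmark process can be sketched in a few lines. The following is an illustration under stated assumptions, not the implementation used in the talk: the kernel $k$ is taken to be Gaussian with a hypothetical `width` parameter, the square root $\sqrt{K}$ is taken to be a Cholesky factor, and the Stratonovich integral is discretized with a stochastic Heun (predictor-corrector) step. The same $n\times n$ factor is applied to each coordinate, so the noise is correlated between landmarks.

```python
import math
import random

def kernel_matrix(points, width):
    """Gaussian kernel matrix K_ij = exp(-|x_i - x_j|^2 / (2 width^2)) (assumed kernel)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * width ** 2))
             for q in points] for p in points]

def cholesky(K, jitter=1e-9):
    """Lower-triangular L with L L^T = K + jitter * I."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s + jitter) if i == j else s / L[j][j]
    return L

def correlated_increment(points, width, dW):
    """Apply a square root of K(X) to the noise dW, coordinate by coordinate."""
    L = cholesky(kernel_matrix(points, width))
    n, d = len(points), len(points[0])
    return [[sum(L[i][k] * dW[k][c] for k in range(i + 1)) for c in range(d)]
            for i in range(n)]

def heun_step(points, width, dt, rng):
    """One stochastic Heun step for dX = sqrt(K(X)) o dW (Stratonovich)."""
    n, d = len(points), len(points[0])
    dW = [[rng.gauss(0.0, math.sqrt(dt)) for _ in range(d)] for _ in range(n)]
    inc0 = correlated_increment(points, width, dW)
    predictor = [[p[c] + inc0[i][c] for c in range(d)] for i, p in enumerate(points)]
    inc1 = correlated_increment(predictor, width, dW)
    # Average the diffusion coefficient at the current point and at the predictor.
    return [[p[c] + 0.5 * (inc0[i][c] + inc1[i][c]) for c in range(d)]
            for i, p in enumerate(points)]

# Eight landmarks on the unit circle, evolved for 20 steps.
rng = random.Random(0)
landmarks = [[math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)] for i in range(8)]
path = [landmarks]
for _ in range(20):
    path.append(heun_step(path[-1], width=0.5, dt=0.01, rng=rng))
```

A larger `width` makes neighbouring landmarks move more coherently; as `width` shrinks the landmarks approach independent Brownian motions.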

Inference: Data conditional processes

$X_t$ generates the distribution $p(\cdot|X_0)$
given an observed shape $v$ and a time $T>0$, we wish to sample $X_t|X_T=v$: a bridge

$X_t$ (no conditioning)

$X_t|X_T=v$ (conditioned)

Bridges - conditional processes

forward SDE:
\[dx_t = b(t,x_t)\,dt + \sigma(t,x_t)\,dW_t\]
conditioning to hit $v$ at time $T$ gives
new measure $\mathbb{P}^*$:
\[\frac{d\mathbb{P}^*}{d\mathbb{P}}\Big|_{\mathcal{F}_t}= \frac{p(v,T\mid x_t,t)}{p(v,T\mid X_0,0)}\]
Doob's $h$-function: $h(t,x) = p(v,T \mid x,t)$
SDE of $\mathbb{P}^*$:
\[dx_t = \big[b(t,x_t) +a(t,x_t)\nabla_x \log \rho(x_t,t)\big]dt + \sigma(t,x_t)\,dW_t\]
$a(t,x)=\sigma\sigma^\top(t,x)$, $\rho(x,t)=h(t,x)=p(v,T\mid x,t)$
the score $\nabla_x \log \rho(x,t)$ steers trajectories to hit $v$ at $T$
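
As a worked special case (an illustration, not taken from the slides), take $b=0$ and a constant scalar $\sigma$, i.e. scaled Brownian motion in one dimension. The transition density is Gaussian, so the $h$-function and its score are explicit:
\[h(t,x)=p(v,T\mid x,t)=\frac{1}{\sqrt{2\pi\sigma^2(T-t)}}\exp\Big(-\frac{(v-x)^2}{2\sigma^2(T-t)}\Big),\qquad \nabla_x\log h(t,x)=\frac{v-x}{\sigma^2(T-t)}\]
With $a=\sigma^2$ the conditioned SDE becomes
\[dx_t=\frac{v-x_t}{T-t}\,dt+\sigma\,dW_t\]
which is the classical Brownian bridge: the pull toward $v$ grows like $1/(T-t)$ as $t\to T$.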

Score learning

learn the score $ \nabla_x \log p(x,t) $ of the forward diffusion
fit neural network $ s_\theta $ to minimize
\[\mathcal{L}(\theta)= \frac{1}{2} \sum_{m=1}^M \int_{t_{m-1}}^{t_m}\mathbb{E}\left[\big\| s_\theta(t, x_t)-\nabla\log p(x_{t},t\mid x_{t_{m-1}},t_{m-1})\big\|_{a(t,x_t)}^2\right] dt\]
plug $ s_\theta $ into the SDE to sample:
\[ dx_t = b(t,x_t)dt +a(t,x_t)s_\theta(t,x_t) dt + \sigma(t,x_t)\,dW_t \]
+: scalable training from unconditioned samples; no bridge likelihoods needed
-: can be hard to train in practice; rare-event bridges are poorly covered; the learned score depends on the SDE parameters
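
A toy instance of the loss above, under simplifying assumptions that are not from the slides: the forward process is one-dimensional Brownian motion $x_t=x_0+\sigma W_t$, the "network" is the one-parameter model $s_\theta(t,x)=-\theta\,(x-x_0)/t$, and each time interval is evaluated at its right endpoint only. Regressing the model on the one-step transition scores should recover the marginal score, whose scale is $\theta=1/\sigma^2$. Since $a=\sigma^2$ is constant here, the weighting of the norm does not change the minimizer.

```python
import random

def fit_score_scale(sigma, x0, T, M, n_paths, rng):
    """Least-squares fit of theta in s_theta(t, x) = -theta * (x - x0) / t."""
    dt = T / M
    num = den = 0.0
    for _ in range(n_paths):
        x_prev = x0
        for m in range(1, M + 1):
            t = m * dt
            x = x_prev + sigma * rng.gauss(0.0, dt ** 0.5)
            # Regression target: score of the Gaussian transition from x_prev to x.
            target = -(x - x_prev) / (sigma ** 2 * dt)
            # Model feature: s_theta(t, x) = theta * feature.
            feature = -(x - x0) / t
            num += feature * target
            den += feature * feature
            x_prev = x
    # Closed-form minimizer of sum (theta * feature - target)^2.
    return num / den

rng = random.Random(1)
sigma = 0.5
theta_hat = fit_score_scale(sigma, x0=0.0, T=1.0, M=10, n_paths=10000, rng=rng)
```

With enough sample paths `theta_hat` should approach $1/\sigma^2$. The individual targets are noisy, which is one reason training can be difficult in practice.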

Infinite dimensions and score learning

Train a neural network to learn the score in the bridge SDE in inf. dim.

\[dx_t=b(t,x_t)dt+a(t,x_t)\nabla_x\log \rho(t,x_t)dt+\sigma(t,x_t)dW_t\]
particularly for shape Kunita flows

Score learning for curve, surface processes

Zhou,Yang,Sommer,GSI'25

Guiding: explicit score approximations

\[dx_t = b(t,x_t)dt +\sigma(t,x_t)dW_t\]

Delyon/Hu 2006:

$\sigma$ invertible:

guided bridge proposal:
\[dy_t = b(t,y_t)dt - \frac{y_t-v}{T-t}dt + \sigma(t,y_t)dW_t\]
$y_T=v$ a.s.
the law of $x_t|x_T=v$ is absolutely continuous wrt. the law of $y_t$
$\mathbb E_{x_t|x_T=v}[f(x_t)]\propto \mathbb E_{y_t}[f(y_t)\varphi(y_t)]$
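
The guided proposal is straightforward to simulate with an Euler-Maruyama scheme. The sketch below assumes a scalar state and user-supplied callables `b` and `sigma` (hypothetical names); it only generates proposal paths and does not compute the weight $\varphi$.

```python
import math
import random

def guided_proposal(b, sigma, x0, v, T, n_steps, rng):
    """Euler-Maruyama for dy = b dt - (y - v) / (T - t) dt + sigma dW."""
    dt = T / n_steps
    y, path = x0, [x0]
    for k in range(n_steps):
        t = k * dt
        # The pulling term grows as t approaches T and forces the path toward v.
        drift = b(t, y) - (y - v) / (T - t)
        y = y + drift * dt + sigma(t, y) * rng.gauss(0.0, math.sqrt(dt))
        path.append(y)
    return path

rng = random.Random(3)
path = guided_proposal(
    b=lambda t, y: math.sin(y),   # example nonlinear drift (assumption)
    sigma=lambda t, y: 0.5,       # constant, invertible diffusion coefficient
    x0=0.0, v=1.5, T=1.0, n_steps=1000, rng=rng,
)
```

In the discretized scheme the last step lands within one noise increment of $v$ rather than exactly on it; the exact hit holds in the continuous-time limit.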

Figure labels: $v$, $x_0$, $x_t$

Conditioned shape process

Conditioning on hitting target $v$ at time $T>0$:

\[X_t|X_T=v\]

Ito stochastic process:

\[dx_t=b(t,x_t)dt+\sigma(t,x_t)dW_t\]

True bridge:

\[dx^*_t=b(t,x^*_t)dt+a(t,x^*_t)\nabla_x\log \rho_t(x^*_t)dt+\sigma(t,x^*_t)dW_t\]

Score $\nabla_x\log \rho_t$ intractable:

\[\rho_t(x)=p_{T-t}(v;x)\]

\[a(t,x)=\sigma(t,x)\sigma(t,x)^T\]

black: $X_0$, red: $v$

Auxiliary process:

\[d\tilde{x}_t=\tilde{b}(t,\tilde{x}_t)dt+\tilde{\sigma}(t,\tilde{x}_t)dW_t\]

Approximate bridge:

\[dx_t^\circ=b(t,x_t^\circ)dt+a(t,x_t^\circ)\nabla_x\log \tilde{\rho}_t(x_t^\circ)dt+\sigma(t,x_t^\circ)dW_t\]

for e.g. linear processes, the score $\nabla_x\log \tilde{\rho}_t$ is known in closed form

(almost) explicitly computable likelihood ratio:

\[\frac{d\mathbb P^*}{d\mathbb P^\circ}=\frac{\tilde{\rho}_T(v)}{\rho_T(v)}\Psi(x_t^\circ)\]
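
One simple choice of auxiliary process, stated here as an assumption for illustration: no drift and a constant diffusion coefficient, $\tilde b=0$ and $\tilde\sigma$ constant with $\tilde a=\tilde\sigma\tilde\sigma^T$ invertible. Its transition density is Gaussian, so
\[\tilde\rho_t(x)=\mathcal N\big(v;\,x,\,(T-t)\tilde a\big),\qquad \nabla_x\log\tilde\rho_t(x)=\tilde a^{-1}\frac{v-x}{T-t}\]
and the approximate bridge reads
\[dx_t^\circ=b(t,x_t^\circ)dt+a(t,x_t^\circ)\,\tilde a^{-1}\frac{v-x_t^\circ}{T-t}dt+\sigma(t,x_t^\circ)dW_t\]
When $a(t,x_t^\circ)=\tilde a$ the guiding term reduces to the pulling term $-\frac{x_t^\circ-v}{T-t}$ of the Delyon/Hu proposal.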

Backward filtering, forward guiding: van der Meulen, Schauer et al.

Ito stochastic process:

\[dx_t=b(t,x_t)dt+\sigma(t,x_t)dW_t\]

Bridge process:

\[dx^*_t=b(t,x^*_t)dt+a(t,x^*_t)\nabla_x\log\rho_t(x^*_t)dt+\sigma(t,x^*_t)dW_t\]

Score $\nabla_x\log \rho_t$ intractable, but ...

Backwards filtering, forward guiding bridges

v.d. Meulen,Schauer,Arnaudon,Sommer,SIIMS'22

From single edges to trees

Bridge (figure labels: $x_0$, $v$)

Leaf conditioning (figure labels: $x_0$, $h$, $v_1$, $v_2$)

Backwards filter: recursive, leaves to root

Forward guiding: root to leaves

van der Meulen, Schauer'20; van der Meulen'22
Stoustrup, Nielsen, van der Meulen, Sommer

Felsenstein's pruning algorithm for shapes

Brown. motion

branch (independent children)

incorporate leaf observations $x_{V_T}$ into probabilistic model:
$p(X_t|x_{V_T})$

Doob’s h-transform

$h_s(x)=\prod_{t\in\mathrm{ch}(s)}h_{s\to t}(x)$

conditioned process $X^*_t$

approximations $\tilde{h}$

guided process $X^\circ_t$

Upwards message passing

Messages:

approximation of the h-transform (Doob's h function):
\[h(x,t)=e^{c+Fx+x^THx}\]

Up:

propagate c,F,H

Fuse:

sum c,F,H
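
The "Up" and "Fuse" steps can be sketched for the simplest case in which the exponential-quadratic form is exact: a scalar Brownian motion, where an edge of length $\tau$ contributes variance $s=\sigma^2\tau$. This is an illustration under that assumption (function names are hypothetical); for nonlinear shape processes the triple $(c,F,H)$ only parametrizes an approximation $\tilde h$.

```python
import math

def leaf_message(v, s):
    """(c, F, H) of h(x) = N(v; x, s): an observed leaf v at the end of an edge."""
    return (-v * v / (2 * s) - 0.5 * math.log(2 * math.pi * s), v / s, -1.0 / (2 * s))

def propagate_up(msg, s):
    """Up: integrate exp(c + F y + H y^2) against N(y; x, s) along an edge."""
    c, F, H = msg
    D = 1.0 - 2.0 * s * H
    return (c + s * F * F / (2 * D) - 0.5 * math.log(D), F / D, H / D)

def fuse(*msgs):
    """Fuse: a product of h-functions at a branch point is a sum of (c, F, H)."""
    return tuple(sum(parts) for parts in zip(*msgs))

def log_h(msg, x):
    c, F, H = msg
    return c + F * x + H * x * x

# Toy tree: root --s0--> internal node --s1, s2--> leaves v1, v2.
s0, s1, s2 = 0.5, 1.0, 2.0
v1, v2 = 0.3, -1.2
internal = fuse(leaf_message(v1, s1), leaf_message(v2, s2))
root = propagate_up(internal, s0)
```

Here `log_h(root, x)` is the log-likelihood of both leaf observations given the root value `x`, which is the quantity Felsenstein's pruning algorithm computes at the root.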

Backwards filtering, forward guiding butterflies

v.d. Meulen,Schauer,Sommer,'25

Parameter inference with MCMC

sample parameters (e.g. kernel width, amplitude)

v.d. Meulen,Schauer,Arnaudon,Sommer,SIIMS'22
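
A heavily simplified sketch of the parameter-sampling idea, not the sampler of the cited paper: a random-walk Metropolis chain for a single amplitude parameter $\sigma$ of a fully observed scalar Brownian motion, with a flat prior on $\log\sigma$ (all of these are assumptions for illustration). In the shape setting the likelihood is not available in closed form, and the chain also has to update the unobserved paths.

```python
import math
import random

def log_likelihood(log_sigma, increments, dt):
    """Gaussian log-likelihood of Brownian increments with variance sigma^2 * dt."""
    var = math.exp(2 * log_sigma) * dt
    return sum(-0.5 * math.log(2 * math.pi * var) - dx * dx / (2 * var) for dx in increments)

def random_walk_metropolis(increments, dt, n_iter, step, rng):
    """Sample sigma with Gaussian random-walk proposals on log(sigma)."""
    current = 0.0
    current_ll = log_likelihood(current, increments, dt)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_ll = log_likelihood(proposal, increments, dt)
        # Symmetric proposal and flat prior: accept with probability min(1, likelihood ratio).
        if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
        samples.append(math.exp(current))
    return samples

rng = random.Random(2)
true_sigma, dt, n_obs = 0.7, 0.1, 200
increments = [true_sigma * rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_obs)]
samples = random_walk_metropolis(increments, dt, n_iter=5000, step=0.1, rng=rng)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
sigma_mle = math.sqrt(sum(dx * dx for dx in increments) / (n_obs * dt))
```

After burn-in the posterior mean should sit close to the maximum-likelihood estimate `sigma_mle`, with the spread of `kept` quantifying the remaining uncertainty.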

Parameter inference with MCMC

Canidae skulls

Severinsen, Hipsley, Nielsen, Sommer

Felsenstein's pruning algorithm for shapes

Brown. motion

1) What is a shape Brownian motion?

2) How do we condition the nonlinear process on shape observations?

3) How do we perform inference in the full model?

Neural guided diffusion bridges

Yang,van der Meulen,Sommer,ICML'25

in addition to the guiding term $a(t,x_t)\tilde r(t,x_t)$ added to the drift $b(t,x_t)$, inject a learnable control $\sigma(t,x_t)\vartheta_\theta(t,x_t)$:
\[dx_t = \big[b(t,x_t) + a(t,x_t)\tilde r(t,x_t) + \sigma(t,x_t)\vartheta_\theta(t,x_t)\big]dt + \sigma(t,x_t)\,dW_t\]
$\vartheta_\theta$ is a bounded neural net that approximates the missing score-like drift $r^\star - \tilde r$, steering the guided process toward the exact bridge
training minimizes the loss
\[L(\theta) = \mathbb{E}\!\int_0^T \left[\tfrac12\|\vartheta_\theta(t,x_t)\|^2 - G(t,x_t)\right]dt,\]
where $G$ encodes drift/diffusion mismatches
once $L(\theta)$ is minimized, independent bridge samples can be drawn as cheaply as samples of the forward SDE, with no MCMC-based guiding or unconditional score learning

Other takes:
Upwards LDDMM: recursive shape matching

Severinsen, Hipsley, Nielsen, Sommer

Most probable flows

Geometric statistics

Diffusion mean

Most probable paths

Eltzner, Huckemann, Grong, Corstanje,van der Meulen,Schauer,Sommer et al.

Manifold bridges

Software

Jax magic... in milliseconds:

Trees with millions of nodes
shapes with millions of landmarks

Geometry, stochastics, geometric statistics

JaxGeometry: https://github.com/computationalevolutionarymorphometry/jaxgeometry
Hyperiax: https://github.com/computationalevolutionarymorphometry/hyperiax
CCEM: http://www.ccem.dk
slides: https://slides.com/stefansommer

References:

Grong, Sommer: Most probable paths for developed processes, https://arxiv.org/abs/2211.15168
Grong, Sommer: Most probable flows for Kunita SDEs, https://arxiv.org/abs/2209.03868
Sommer, Schauer, v. d. Meulen: Stochastic flows and shape bridges, Oberwolfach, 2021
Baker, Besnier, Sommer: A function space perspective on stochastic shape evolution, https://arxiv.org/abs/2302.05382
Yang, Baker, Severinsen, Hipsley, Sommer: Simulating infinite-dimensional nonlinear diffusion bridges, https://arxiv.org/abs/2405.18353
Baker, Yang, Severinsen, Hipsley, Sommer: Conditioning non-linear and infinite-dimensional diffusion processes, https://arxiv.org/abs/2402.01434
Hansen, Eltzner, Huckemann, Sommer: Diffusion Means in Geometric Spaces, Bernoulli, 2023, arXiv:2105.12061
Grong, Sommer: Most probable paths for anisotropic Brownian motions on manifolds, FoCM 2022, arXiv:2110.15634
Harms, Michor, Pennec, Sommer: Geometry of sample spaces, Diff. Geom. and its Appl., 2023, arXiv:2010.08039
Arnaudon, v.d. Meulen, Schauer, Sommer: Diffusion bridges for stochastic Hamiltonian systems and shape evolutions, SIIMS, 2022, arXiv:2002.00885
Højgaard Jensen, Sommer: Simulation of Conditioned Diffusions on Riemannian Manifolds, 2021, arXiv:2105.13190
Arnaudon, Holm, Sommer: A Geometric Framework for Stochastic Shape Analysis, Foundations of Computational Mathematics, 2019, arXiv:1703.09971
Sommer, Svane: Modelling Anisotropic Covariance using Stochastic Development and Sub-Riemannian Frame Bundle Geometry, JoGM, 2017, arXiv:1512.08544
Arnaudon, Holm, Sommer: A Stochastic Large Deformation Model for Computational Anatomy, IPMI 2017, arXiv:1612.05323
