Wasserstein-Otto geometry

Klas Modin

slides.com/kmodin

Riemannian principal bundles

E/H\simeq B

\mathrm{Hor}

\hookleftarrow H

\downarrow

\pi

Invariant Riemannian metric on \(E\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G/H\simeq B

\mathrm{Hor}

\hookleftarrow H

\downarrow

\pi

left co-sets \([g] = g\cdot H \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G/G_{b_0}\simeq B

\mathrm{Hor}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

G_{b_0}
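A small numerical illustration of the quotient \(G/G_{b_0}\simeq B\) may help here. The sketch below assumes a concrete example that is not taken from the slides: \(G = SO(3)\) acting on \(B = S^2\), with \(b_0\) the north pole, so that \(G_{b_0}\) consists of rotations about \(b_0\). The helper names `rotation` and `pi` are illustrative.

```python
import numpy as np

def rotation(axis, angle):
    """Rotation matrix about a given axis (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

b0 = np.array([0.0, 0.0, 1.0])       # base point b_0 (assumed: north pole)

def pi(g):
    """Projection pi(g) = g . b0 from G = SO(3) onto B = S^2."""
    return g @ b0

g = rotation([1.0, 2.0, 0.5], 0.9)   # an arbitrary group element
h = rotation(b0, 1.7)                # an element of the stabilizer G_{b0}

fixes_b0 = np.allclose(h @ b0, b0)            # h . b0 = b0
same_fibre = np.allclose(pi(g @ h), pi(g))    # pi is constant on the coset g . G_{b0}
```

Both flags should come out true: multiplying \(g\) from the right by a stabilizer element moves along the fibre and leaves the base point \(\pi(g)\) unchanged.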

Riemannian principal bundles

G/G_{b_0}\simeq B

\text{horizontal flow}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

\text{vertical flow}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

g\in G

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

\gamma(1)\in G

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = \gamma(1)(\gamma(1)^{-1}g)

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

\gamma(1)^{-1}g \in G_{b_0}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = \gamma(1)h

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = k h

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

polar cone

Optimal mass transport (OMT)

\mu_0

\mu_1

\varphi_*\mu_0

\displaystyle\min_{\varphi_*\mu_0=\mu_1} \int_{M} d_M^2(\varphi(x),x ) \mu_0

Monge problem, \(L^2\) version

Optimal mass transport (OMT)

\mu_0

\mu_1

\varphi_*\mu_0

\displaystyle\min_{\varphi_*\mu_0=\mu_1} \int_{\mathbb{R}^n} \lvert \varphi(x)-x \rvert^2 \mu_0

\mathbb{R}^n

Monge problem, \(L^2\) version

J(\varphi) = \int_{\mathbb{R}^n} \lvert \varphi(x)-x \rvert^2 \mu_0
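To make the Monge problem concrete, here is a minimal sketch of a discrete one-dimensional analogue. It assumes, beyond what the slides state, that \(\mu_0\) and \(\mu_1\) are replaced by equally weighted point clouds `x` and `y`, so that transport maps become permutations. In this setting the monotone (sorted) matching is known to minimize the quadratic cost, which the brute-force search confirms for a small example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=6)             # atoms of mu_0 (equal weights, assumed)
y = rng.normal(loc=1.0, size=6)    # atoms of mu_1 (equal weights, assumed)

def cost(perm):
    """Discrete J: mean squared displacement when x[i] is sent to y[perm[i]]."""
    return np.mean((y[list(perm)] - x) ** 2)

# Monotone rearrangement: the k-th smallest x is sent to the k-th smallest y.
monotone = np.empty(6, dtype=int)
monotone[np.argsort(x)] = np.argsort(y)

# Brute force over all 6! = 720 permutations for comparison.
best = min(itertools.permutations(range(6)), key=cost)
```

Here `cost(monotone)` and `cost(best)` should agree up to rounding.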

Riemannian structure of OMT

\mathrm{Diff}(M)

\mathrm{Dens}(M)\simeq \mathrm{Diff}(M)/\mathrm{Diff}_{\mu_0}(M)

\mathrm{Id}

\mu_0

\mu_1

\pi(\varphi)=\varphi_*\mu_0

Riemannian metric

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \int_{M}\left\vert \dot\varphi \right\vert^2 \mu_0

Induced metric

\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_M \lvert \nabla \theta\rvert^2 \mu

[Benamou & Brenier (2000), Otto (2001)]

Invariance: \(\eta\in\mathrm{Diff}_{\mu_0}(M)\)

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \mathcal{G}_{\varphi\circ\eta}(\dot\varphi\circ\eta,\dot\varphi\circ\eta)

\dot\rho + \operatorname{div}(\rho \nabla\theta) = 0, \; \rho = \mu/dx

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0

\mathrm{Id}

\mu_0

\mu_1

\mathrm{Hor}_{\mathrm{Id}} = \nabla C^\infty(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) is a section of the principal bundle

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0

\mathrm{Id}

\mu_0

\mu_1

\varphi = \nabla\phi\circ\eta, \; \eta\in \operatorname{Diff}_{\mu_0}(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) is a section of the principal bundle

Brenier's decomposition of transport maps

Geodesic distance on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic curve:

\varphi(t) = (1-t)\varphi_0 + t \varphi_1

\mathrm{Id}

\mu_0

\mu_1

\mathrm{dist}(\varphi_0,\varphi_1)^2 = \int_0^1 \mathcal{G}_{\varphi(t)}(\dot\varphi(t),\dot\varphi(t)) dt

= \int_0^1\int_{\mathbb{R}^n} \lvert \dot\varphi(t)\rvert^2\mu_0 dt

= \int_{\mathbb{R}^n} \lvert \varphi_1(x)-\varphi_0(x) \rvert^2 \mu_0

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2

Monge-Ampere equation on \(\mathbb{R}^n\)

Geodesic curve:

\varphi(t) = \nabla( |x|^2/2 + t f )

\mathrm{Id}

\mu_0

\mu_1

(\nabla\phi)_*\mu_0 = \mu_1

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2

\varphi(t) = \nabla\phi, \quad \phi = |x|^2/2 + t f

\displaystyle \Rightarrow \operatorname{det}(\nabla^2\phi) = \frac{\rho_0}{\rho_1\circ \nabla\phi}
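The Monge-Ampere equation can be checked by hand in the simplest case. The sketch below assumes a one-dimensional example not taken from the slides: zero-mean Gaussians with standard deviations `s0` and `s1` (illustrative values) and the convex potential \(\phi(x) = (s_1/s_0)\,x^2/2\), whose gradient is the linear map pushing the first Gaussian to the second.

```python
import numpy as np

s0, s1 = 0.7, 1.9                     # standard deviations (illustrative values)

def gauss(x, s):
    """Density of a zero-mean Gaussian with standard deviation s."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

x = np.linspace(-2.0, 2.0, 9)
grad_phi = (s1 / s0) * x              # phi(x) = (s1/s0) x^2 / 2
hess_phi = s1 / s0                    # phi''(x), constant in x

rhs = gauss(x, s0) / gauss(grad_phi, s1)    # rho_0 / (rho_1 o grad phi)
residual = np.max(np.abs(hess_phi - rhs))   # should vanish up to rounding
```

In one dimension \(\det(\nabla^2\phi)\) is just \(\phi''\), so the residual compares the two sides of the equation pointwise.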

Linear optimal mass transport

Trivial observation: \(\varphi_0(x) = A_0 x\), \(\varphi_1(x) = A_1 x\) linear diffeomorphisms \(\Rightarrow\) geodesic consists of linear diffeomorphisms

Consequence: \(GL(n)\) is totally geodesic subgroup of \(\operatorname{Diff}(\mathbb{R}^n)\)

Corresponding subspace of densities (statistical submanifold): multivariate Gaussians with zero mean

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

\Sigma \in P(n)
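For linear maps the transport cost reduces to a trace formula, which is the metric \(\mathrm{tr}(\Sigma_0 \dot A^\top \dot A)\) on \(GL(n)\). The following sketch checks this on samples. It assumes that \(\mu_0\) is replaced by an empirical measure with second-moment matrix `Sigma0`; the matrices `A0`, `A1` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2000
X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))   # samples x_k, one per row
Sigma0 = X.T @ X / m                                    # empirical second moment

A0 = np.eye(n) + 0.1 * rng.normal(size=(n, n))
A1 = np.eye(n) + 0.1 * rng.normal(size=(n, n))
D = A1 - A0            # constant velocity of the line (1 - t) A0 + t A1

# Sample average of |A1 x - A0 x|^2 ...
dist2_samples = np.mean(np.sum((X @ D.T) ** 2, axis=1))
# ... versus the trace expression tr(Sigma0 D^T D).
dist2_trace = np.trace(Sigma0 @ D.T @ D)
```

The two numbers agree up to rounding because the average of \(\lvert Dx\rvert^2\) over the samples equals \(\mathrm{tr}(D^\top D\,\Sigma_0)\) for the empirical \(\Sigma_0\).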

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(A)=A\Sigma_0 A^\top

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(A)=A\Sigma_0 A^\top

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(P)=P\Sigma_0 P = \Sigma_1

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Monge-Ampere equation: \(P\Sigma_0 P = \Sigma_1\)
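In the Gaussian case the equation \(P\Sigma_0 P = \Sigma_1\) for \(P\in P(n)\) has a well-known closed-form solution, \(P = \Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}\). This formula is not written on the slides; the sketch below simply verifies it numerically, with `sqrtm_spd` and `transport_matrix` as illustrative helper names and randomly generated covariances.

```python
import numpy as np

def sqrtm_spd(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def transport_matrix(Sigma0, Sigma1):
    """Symmetric positive definite P with P Sigma0 P = Sigma1."""
    R = sqrtm_spd(Sigma0)
    R_inv = np.linalg.inv(R)
    return R_inv @ sqrtm_spd(R @ Sigma1 @ R) @ R_inv

rng = np.random.default_rng(2)
n = 4
M0, M1 = rng.normal(size=(2, n, n))
Sigma0 = M0 @ M0.T + np.eye(n)        # random covariance matrices (illustrative)
Sigma1 = M1 @ M1.T + np.eye(n)

P = transport_matrix(Sigma0, Sigma1)
residual = np.linalg.norm(P @ Sigma0 @ P - Sigma1)
min_eig = np.linalg.eigvalsh((P + P.T) / 2).min()
```

A small `residual` and a positive `min_eig` indicate that `P` lies in \(K = P(n)\) and projects to \(\Sigma_1\).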

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

A = PQ

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Factorization theorem: \(A = PQ\) with \(P\in K = P(n)\), \(Q\in O(n,\Sigma_0)\)
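The factorization \(A = PQ\) can be computed explicitly for matrices. The sketch below assumes the closed-form expression for \(P\) solving \(P\Sigma_0 P = A\Sigma_0 A^\top\) (a standard formula, not stated on the slides) and then sets \(Q = P^{-1}A\). The function name `factorize` and the test matrices are illustrative.

```python
import numpy as np

def sqrtm_spd(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def factorize(A, Sigma0):
    """Split A = P Q with P symmetric positive definite and Q Sigma0 Q^T = Sigma0."""
    Sigma1 = A @ Sigma0 @ A.T                        # pi(A)
    R = sqrtm_spd(Sigma0)
    R_inv = np.linalg.inv(R)
    P = R_inv @ sqrtm_spd(R @ Sigma1 @ R) @ R_inv    # solves P Sigma0 P = Sigma1
    Q = np.linalg.solve(P, A)                        # Q = P^{-1} A
    return P, Q

rng = np.random.default_rng(3)
n = 3
A = np.eye(n) + 0.1 * rng.normal(size=(n, n))   # small perturbation of the identity
M = rng.normal(size=(n, n))
Sigma0 = M @ M.T + np.eye(n)

P, Q = factorize(A, Sigma0)
```

By construction \(Q\Sigma_0 Q^\top = P^{-1}\Sigma_1 P^{-1} = \Sigma_0\), so `Q` lies in the fibre direction while `P` carries all of the transport.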

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\dot B = -\mathrm{Pr}\nabla_{\mathcal G}J(B), \; B(0) = A

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Vertical gradient flow:

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\dot B = \Omega B, \; B(0) = A

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Vertical gradient flow:

\Sigma_1 \Omega + \Omega\Sigma_1 = 2\Sigma_1 (B^{-1}-B^{-\top})

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot \Sigma = \mathrm{Pr}\nabla_{\bar{\mathcal G}}H_{\Sigma_1}(\Sigma)

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot \Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)
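The flow \(\dot\Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}\) is easy to integrate numerically. The sketch below uses explicit Euler steps with a hand-picked step size; the integrator, the \(2\times 2\) target `Sigma1`, and the initial value are all assumptions for illustration. With the sign convention for \(H_{\Sigma_1}\) used on this slide, \(H\) should increase along the flow toward its maximum \(-n/2\), attained at \(\Sigma = \Sigma_1\).

```python
import numpy as np

def H(Sigma, Sigma1):
    """H_{Sigma1}(Sigma) = -tr(Sigma1^{-1} Sigma)/2 + log det(Sigma1^{-1} Sigma)/2."""
    X = np.linalg.solve(Sigma1, Sigma)
    _, logdet = np.linalg.slogdet(X)
    return -0.5 * np.trace(X) + 0.5 * logdet

def rhs(Sigma, Sigma1_inv):
    """Right-hand side 2I - Sigma1^{-1} Sigma - Sigma Sigma1^{-1}."""
    n = Sigma.shape[0]
    return 2.0 * np.eye(n) - Sigma1_inv @ Sigma - Sigma @ Sigma1_inv

Sigma1 = np.array([[2.0, 0.5], [0.5, 1.0]])   # target covariance (illustrative)
Sigma1_inv = np.linalg.inv(Sigma1)
Sigma = np.eye(2)                             # initial covariance Sigma_0
dt, steps = 0.01, 4000                        # explicit Euler, step chosen by hand

H_values = [H(Sigma, Sigma1)]
for _ in range(steps):
    Sigma = Sigma + dt * rhs(Sigma, Sigma1_inv)
    H_values.append(H(Sigma, Sigma1))
```

After the loop, `Sigma` should be close to `Sigma1` and `H_values` should be non-decreasing up to rounding.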

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

P(t)

Lifted gradient flow on \(K\) for

\displaystyle F(P) = H_{\Sigma_1}(P\Sigma_0 P)

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

P(t)

Lifted gradient flow on \(K\) for

\displaystyle F(P) = H_{\Sigma_1}(P\Sigma_0 P)

Hessian of \(F(P)\) strictly positive on \(K\) \(\Rightarrow\) unique limit!

Wasserstein-Otto vs. Fisher-Rao

\mathrm{Dens}(M)

T_\mu\mathrm{Dens}(M)\simeq C^\infty_0(M)

Wasserstein

\displaystyle\overline{\mathcal{G}}_{\rho dx}(\dot\rho dx,\dot\rho dx) = \int_{M} |\nabla\theta|^2\rho

\displaystyle \dot\rho + \mathrm{div}(\rho \nabla\theta) = 0

Dependent on Riemannian structure of \(M\)

Fisher-Rao

\displaystyle\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_{M} \frac{\dot\mu}{\mu}\frac{\dot\mu}{\mu}\mu

Independent of Riemannian structure of \(M \Rightarrow \mathrm{Diff}(M)\)-invariance

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

\displaystyle \rho(x) = \sqrt{\frac{\mathrm{det}(W)}{(2\pi)^n}}\mathrm{exp}(-\frac{1}{2}x^\top W x)

Brockett flow

P(n)

W_1

D(n)

\(H_N(W)\) relative entropy functional

Functional \(F(Q) = H_N(Q^\top W_1 Q)\) on \(O(n)\)

\mathrm{Orb}(W_1)

\displaystyle H_{N}(W) = -\frac{1}{2}\mathrm{tr}(N W^{-1}) + \frac{1}{2}\log\det(N W^{-1})

Relative entropy

Heat flow

Wasserstein-Otto metric

\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_M \lvert \nabla \theta\rvert^2 \mu

\dot\rho + \operatorname{div}(\rho \nabla\theta) = 0, \; \rho = \mu/dx

\(\Rightarrow\) Riemannian gradient flow \(\dot\rho = -\nabla_{\overline{\mathcal G}}F(\rho)\)

\dot\rho = \operatorname{div}(\rho \nabla\frac{\delta F}{\delta\rho})

Take \(F(\rho) = \int_M \log(\rho) \rho \Rightarrow \frac{\delta F}{\delta\rho} = \log(\rho)+1\)

\Rightarrow \; \dot\rho = \Delta\rho
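The last implication uses \(\rho\nabla\log\rho = \nabla\rho\), so that \(\operatorname{div}(\rho\nabla(\log\rho + 1)) = \Delta\rho\). A quick finite-difference check is sketched below. It assumes a periodic one-dimensional grid and an illustrative positive density; the discretization is not part of the talk.

```python
import numpy as np

n = 256
h = 2.0 * np.pi / n
x = h * np.arange(n)                  # periodic grid on the circle (assumed domain)
rho = 1.0 + 0.5 * np.cos(x)           # smooth positive density (illustrative)

def ddx(f):
    """Second-order central difference with periodic boundary conditions."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)

dF = np.log(rho) + 1.0                # delta F / delta rho for F = int rho log(rho)
flow_rhs = ddx(rho * ddx(dF))         # div(rho grad(delta F / delta rho))
laplacian = -0.5 * np.cos(x)          # exact Laplacian of rho

max_error = np.max(np.abs(flow_rhs - laplacian))
```

The difference `max_error` is at the level of the discretization error, which shrinks as `n` grows.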

IPM and Toda

same potential, different Riemannian metrics:

IPM: \(L^2\) on velocity (\(H^{1}\) on stream function)

Toda: \(H^{-1}\) on velocity (\(L^2\) on stream function)

gradient flows on \(\mathrm{Diff}_\mu(S^2)\)

gravity

low density

(light particles)

high density

(heavy particles)

\mathrm{Diff}_\mu(M^2)

\mathrm{Id}

\rho_0

\rho_1

IPM results

Toda results

IPM vs Toda results

Summary

IPM and Toda \(\Rightarrow\) Riemannian gradient flows on \(\mathrm{Diff}_\mu(M)\) (or quantized on \(\mathrm{SO}(n)\))
same potential function
different (right-invariant) Riemannian metrics
IPM: \(L^2\); Toda: \(H^{-1}\)
Stronger metric \(\Rightarrow\) more regular flow
IPM: an ODE on \(\mathrm{Diff}_\mu^s(M)\) (for \(s>2\))
Toda: not an ODE on \(\mathrm{Diff}_\mu^s(M)\)

Tutorial talk given 2023-11 in Banff.

Klas Modin

Mathematician at Chalmers University of Technology and the University of Gothenburg

klasmodin.github.io