Wasserstein-Otto geometry

Klas Modin

Riemannian principal bundles

E
E/H\simeq B
\mathrm{Hor}
E
\hookleftarrow H
\downarrow
B
\pi

Invariant Riemannian metric on \(E\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G
G/H\simeq B
e
\mathrm{Hor}
G
\hookleftarrow H
\downarrow
B
\pi

left co-sets \([g] = g\cdot H  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\mathrm{Hor}
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
G_{b_0}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\text{horizontal flow}
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\text{vertical flow}
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g\in G
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\gamma(1)\in G
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g = \gamma(1)(\gamma(1)^{-1}g)
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
\gamma'(t)\in \mathrm{Hor}
\gamma(1)^{-1}g \in G_{b_0}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g = \gamma(1)h
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g = k h
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
K

polar cone

Optimal mass transport (OMT)

\mu_0
\mu_1
\varphi_*\mu_0
\displaystyle\min_{\varphi_*\mu_0=\mu_1} \int_{M} d_M^2(\varphi(x),x ) \mu_0
M

Monge problem, \(L^2\) version

Optimal mass transport (OMT)

\mu_0
\mu_1
\varphi_*\mu_0
\displaystyle\min_{\varphi_*\mu_0=\mu_1} J(\varphi), \qquad J(\varphi) = \int_{\mathbb{R}^n} \lvert \varphi(x)-x \rvert^2 \mu_0
\mathbb{R}^n

Monge problem, \(L^2\) version
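
A discrete analogue of the Monge problem can be checked by brute force. The sketch below assumes that \(\mu_0\) and \(\mu_1\) are uniform measures on four points of the real line (illustrative data, not from the talk), so that admissible maps are bijections between the supports. The helper names `transport_cost`, `brute_force_monge` and `monotone_matching` are hypothetical.

```python
from itertools import permutations


def transport_cost(source, target, assignment):
    """Discrete J: mean of |target[assignment[i]] - source[i]|^2 over all points."""
    n = len(source)
    return sum((target[j] - source[i]) ** 2 for i, j in enumerate(assignment)) / n


def brute_force_monge(source, target):
    """Try every bijection source -> target and return the cheapest one."""
    indices = range(len(source))
    return min(permutations(indices), key=lambda p: transport_cost(source, target, p))


def monotone_matching(source, target):
    """Send the k-th smallest source point to the k-th smallest target point."""
    by_source = sorted(range(len(source)), key=lambda i: source[i])
    by_target = sorted(range(len(target)), key=lambda j: target[j])
    assignment = [0] * len(source)
    for i, j in zip(by_source, by_target):
        assignment[i] = j
    return tuple(assignment)


source = [0.0, 1.0, 3.0, 4.5]   # support of mu_0 (assumed example data)
target = [2.0, -1.0, 6.0, 3.5]  # support of mu_1 (assumed example data)

best = brute_force_monge(source, target)
print(best, transport_cost(source, target, best))
print(monotone_matching(source, target))
```

In one dimension the minimiser found by exhaustive search coincides with the monotone matching, which is the discrete counterpart of a map given by the derivative of a convex function.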

Riemannian structure of OMT

\mathrm{Diff}(M)
\mathrm{Dens}(M)\simeq \mathrm{Diff}(M)/\mathrm{Diff}_{\mu_0}(M)
\mathrm{Id}
\mu_0
\mu_1
\pi(\varphi)=\varphi_*\mu_0

Riemannian metric

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \int_{M}\left\vert \dot\varphi \right\vert^2 \mu_0

Induced metric

\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_M \lvert \nabla \theta\rvert^2 \mu
\dot\rho + \operatorname{div}(\rho \nabla\theta) = 0, \; \rho = \mu/dx

[Benamou & Brenier (2000), Otto (2001)]

Invariance: \(\eta\in\mathrm{Diff}_{\mu_0}(M)\)

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \mathcal{G}_{\varphi\circ\eta}(\dot\varphi\circ\eta,\dot\varphi\circ\eta)

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0
\mathrm{Id}
\mu_0
K
\mu_1
\mathrm{Hor}_{\mathrm{Id}} = \nabla C^\infty(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) a section of principal bundle

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0
\mathrm{Id}
\mu_0
K
\mu_1
\varphi = \nabla\phi\circ\eta, \; \eta\in \operatorname{Diff}_{\mu_0}(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) a section of principal bundle

Brenier's decomposition of transport maps

Geodesic distance on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic curve:

\varphi(t) = (1-t)\varphi_0 + t \varphi_1
\mathrm{Id}
\mu_0
K
\mu_1
\mathrm{dist}(\varphi_0,\varphi_1)^2 = \int_0^1 \mathcal{G}_{\varphi(t)}(\dot\varphi(t),\dot\varphi(t)) dt
= \int_0^1\int_{\mathbb{R}^n} \lvert \dot\varphi(t)\rvert^2\mu_0 dt
= \int_{\mathbb{R}^n} \lvert \varphi_1(x)-\varphi_0(x) \rvert^2 \mu_0

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2
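
The identity between the path energy and \(J\) can be checked numerically. The sketch below is an illustration under stated assumptions: \(\mu_0\) is replaced by 2000 standard normal samples in \(\mathbb{R}^2\), the target map `phi1` is a hypothetical choice, and the time integral is approximated by finite differences. It also compares the straight-line path with a perturbed path that has the same endpoints.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 2))   # samples standing in for mu_0 (assumption)


def phi0(x):
    return x                         # the identity map


def phi1(x):
    return x + 0.5 * np.tanh(x)      # hypothetical target map


def energy(path, steps=400):
    """Approximate int_0^1 int |d/dt path(t)|^2 mu_0 dt by finite differences."""
    t = np.linspace(0.0, 1.0, steps + 1)
    total = 0.0
    for a, b in zip(t[:-1], t[1:]):
        velocity = (path(b) - path(a)) / (b - a)
        total += np.mean(np.sum(velocity ** 2, axis=1)) * (b - a)
    return total


def straight(t):
    """The geodesic curve (1 - t) phi0 + t phi1, evaluated on the samples."""
    return (1.0 - t) * phi0(x) + t * phi1(x)


def detour(t):
    """Same endpoints, plus a perturbation that vanishes at t = 0 and t = 1."""
    return straight(t) + np.sin(np.pi * t) * np.cos(x)


J = np.mean(np.sum((phi1(x) - phi0(x)) ** 2, axis=1))
print(J, energy(straight), energy(detour))
```

The straight path reproduces \(J\) up to rounding, while the detour has strictly larger energy, as expected for a curve that is not a geodesic.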

Monge-Ampere equation on \(\mathbb{R}^n\)

Geodesic curve:

\varphi(t) = \nabla( |x|^2/2 + t f )
\mathrm{Id}
\mu_0
K
\mu_1
(\nabla\phi)_*\mu_0 = \mu_1

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2
\phi = |x|^2/2 + t f
\displaystyle \Rightarrow \operatorname{det}(\nabla^2\phi) = \frac{\rho_0}{\rho_1\circ \nabla\phi}
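
As a sanity check of the Monge-Ampere equation, consider the one-dimensional case with two centred Gaussians. The sketch below assumes standard deviations `sigma0` and `sigma1` (example values, not from the talk) and the convex potential \(\phi(x) = (\sigma_1/\sigma_0)\,x^2/2\), whose derivative pushes \(\rho_0\) to \(\rho_1\).

```python
import math


def gaussian_density(x, sigma):
    """Density of a centred Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


sigma0, sigma1 = 0.7, 1.9        # assumed example values


def grad_phi(x):
    """Derivative of phi(x) = (sigma1 / sigma0) * x**2 / 2."""
    return (sigma1 / sigma0) * x


hessian_phi = sigma1 / sigma0    # second derivative of phi, constant in x

# Compare det(Hessian phi) with rho_0 / (rho_1 o grad phi) at a few points.
residuals = []
for x in (-2.0, -0.3, 0.0, 1.1, 2.5):
    rhs = gaussian_density(x, sigma0) / gaussian_density(grad_phi(x), sigma1)
    residuals.append(abs(hessian_phi - rhs))

print(max(residuals))
```

The residual is at rounding level because the exponentials cancel exactly, leaving the ratio of normalising constants \(\sigma_1/\sigma_0\).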

Linear optimal mass transport

Trivial observation:   \(\varphi_0(x) = A_0 x\), \(\varphi_1(x) = A_1 x\)   linear diffeomorphisms \(\Rightarrow\) geodesic consists of linear diffeomorphisms

Consequence: \(GL(n)\) is a totally geodesic subgroup of \(\operatorname{Diff}(\mathbb{R}^n)\)

Corresponding subspace of densities (statistical submanifold): multivariate Gaussians with zero mean

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)
\Sigma \in P(n)

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\pi(A)=A\Sigma_0 A^\top
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K
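
The pieces of this bundle structure can be verified numerically. The sketch below uses random test matrices (assumed data) and takes horizontal tangent vectors at \(A\) to be of the form \(\dot A = SA\) with \(S\) symmetric, which is my reading of \(\dot\Sigma = S\Sigma + \Sigma S\) rather than something stated explicitly on the slide. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3


def spd_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, U = np.linalg.eigh(S)
    return (U * np.sqrt(w)) @ U.T


M = rng.standard_normal((n, n))
Sigma0 = M @ M.T + n * np.eye(n)                   # reference covariance (assumed data)
A = rng.standard_normal((n, n)) + 3.0 * np.eye(n)  # a generic test matrix


def projection(A):
    """pi(A) = A Sigma0 A^T."""
    return A @ Sigma0 @ A.T


def metric(A_dot):
    """G_A(A_dot, A_dot) = tr(Sigma0 A_dot^T A_dot)."""
    return np.trace(Sigma0 @ A_dot.T @ A_dot)


# An element Q of O(n, Sigma0), i.e. Q Sigma0 Q^T = Sigma0, built by
# conjugating an orthogonal matrix R with Sigma0^{1/2}.
R, _ = np.linalg.qr(rng.standard_normal((n, n)))
root = spd_sqrt(Sigma0)
Q = root @ R @ np.linalg.inv(root)

# Horizontal tangent vector at A (assumed form): A_dot = S A with S symmetric.
S = rng.standard_normal((n, n))
S = 0.5 * (S + S.T)
A_dot = S @ A

Sigma = projection(A)
Sigma_dot = A_dot @ Sigma0 @ A.T + A @ Sigma0 @ A_dot.T   # derivative of pi along A_dot

same_fibre = np.allclose(projection(A @ Q), Sigma)
invariant = np.isclose(metric(A_dot @ Q), metric(A_dot))
velocity_ok = np.allclose(Sigma_dot, S @ Sigma + Sigma @ S)
induced_ok = np.isclose(metric(A_dot), np.trace(Sigma @ S @ S))
print(same_fibre, invariant, velocity_ok, induced_ok)
```

The four checks correspond to: right multiplication by \(O(n,\Sigma_0)\) stays in the fibre, the metric is invariant under that action, the projected velocity is \(S\Sigma + \Sigma S\), and the horizontal norm equals \(\mathrm{tr}(\Sigma SS)\).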

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\pi(A)=A\Sigma_0 A^\top
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\pi(P)=P\Sigma_0 P = \Sigma_1
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Monge-Ampere equation:

P

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
A = PQ
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Factorization theorem:

A
P
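
A minimal sketch of the factorization \(A = PQ\), assuming the standard closed form for the optimal map between centred Gaussians, \(P = \Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}\) with \(\Sigma_1 = A\Sigma_0A^\top\). This formula is not given on the slides, and the function names `spd_sqrt` and `factorize` are hypothetical.

```python
import numpy as np


def spd_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, U = np.linalg.eigh(S)
    return (U * np.sqrt(w)) @ U.T


def factorize(A, Sigma0):
    """Split A into P Q with P symmetric positive definite and Q in O(n, Sigma0)."""
    Sigma1 = A @ Sigma0 @ A.T
    root = spd_sqrt(Sigma0)
    root_inv = np.linalg.inv(root)
    # P solves the linear Monge-Ampere equation P Sigma0 P = Sigma1.
    P = root_inv @ spd_sqrt(root @ Sigma1 @ root) @ root_inv
    # Q = P^{-1} A is the remaining factor in the fibre direction.
    Q = np.linalg.solve(P, A)
    return P, Q


rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n))
Sigma0 = M @ M.T + n * np.eye(n)                   # assumed example data
A = rng.standard_normal((n, n)) + 3.0 * np.eye(n)  # assumed example data

P, Q = factorize(A, Sigma0)
print(np.allclose(P @ Q, A))
print(np.allclose(P @ Sigma0 @ P, A @ Sigma0 @ A.T))
print(np.allclose(Q @ Sigma0 @ Q.T, Sigma0))
```

For \(\Sigma_0 = I\) this reduces to the polar decomposition of \(A\) into a symmetric positive definite factor and an orthogonal factor.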

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\dot B = -\mathrm{Pr}\nabla_{\mathcal G}J(B), \; B(0) = A
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Vertical gradient flow:

A

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\dot B = \Omega B, \; B(0) = A
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Vertical gradient flow:

A
\Sigma_1 \Omega + \Omega\Sigma_1 = 2\Sigma_1 (B^{-1}-B^{-\top})

Bundle structure

GL(n)
P(n)
I
\Sigma_0
\Sigma_1
\dot \Sigma = \mathrm{Pr}\nabla_{\bar{\mathcal G}}H_{\Sigma_1}(\Sigma)
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)
\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)

Bundle structure

GL(n)
P(n)
I
\Sigma_0
\Sigma_1
\dot \Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)
\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)
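
The flow \(\dot\Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}\) can be integrated directly to see it approach \(\Sigma_1\). The sketch below uses explicit Euler steps with a \(2\times 2\) target covariance, an initial condition, and a step size chosen purely for illustration.

```python
import numpy as np

Sigma1 = np.array([[2.0, 0.5], [0.5, 1.0]])   # target covariance (assumed example)
Sigma1_inv = np.linalg.inv(Sigma1)


def H(Sigma):
    """H_{Sigma1}(Sigma) = -tr(Sigma1^{-1} Sigma)/2 + log det(Sigma1^{-1} Sigma)/2."""
    M = Sigma1_inv @ Sigma
    return -0.5 * np.trace(M) + 0.5 * np.log(np.linalg.det(M))


def rhs(Sigma):
    """Right-hand side 2I - Sigma1^{-1} Sigma - Sigma Sigma1^{-1}."""
    return 2.0 * np.eye(2) - Sigma1_inv @ Sigma - Sigma @ Sigma1_inv


Sigma = np.eye(2)          # initial covariance Sigma_0 (assumed example)
H_start = H(Sigma)
dt = 0.01                  # step size, small enough for stability here
for _ in range(3000):
    Sigma = Sigma + dt * rhs(Sigma)

print(Sigma)
print(H_start, H(Sigma))
```

The iterate converges to \(\Sigma_1\) and \(H_{\Sigma_1}\) increases along the way, consistent with the sign convention in which \(H_{\Sigma_1}\) is the negative of the Kullback-Leibler divergence up to an additive constant.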

Bundle structure

GL(n)
P(n)
I
\Sigma_0
\Sigma_1
\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Horizontal gradient (heat) flow:

P(t)
\displaystyle F(P) = H_{\Sigma_1}(P\Sigma_0 P)

Lifted gradient flow on \(K\) for

Bundle structure

GL(n)
P(n)
I
\Sigma_0
\Sigma_1
\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Horizontal gradient (heat) flow:

P(t)
\displaystyle F(P) = H_{\Sigma_1}(P\Sigma_0 P)

Lifted gradient flow on \(K\) for

Hessian of \(F(P)\) strictly positive on \(K\) \(\Rightarrow\) unique limit!

Wasserstein-Otto vs. Fisher-Rao

\mathrm{Dens}(M)
T_\mu\mathrm{Dens}(M)\simeq C^\infty_0(M)

Wasserstein

\displaystyle\overline{\mathcal{G}}_{\rho dx}(\dot\rho dx,\dot\rho dx) = \int_{M} |\nabla\theta|^2\rho
\displaystyle \dot\rho + \mathrm{div}(\rho \nabla\theta) = 0

Dependent on Riemannian structure of \(M\)

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

Fisher-Rao

\displaystyle\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_{M} \frac{\dot\mu}{\mu}\frac{\dot\mu}{\mu}\mu

Independent of Riemannian structure of \(M \Rightarrow \mathrm{Diff}(M)\)-invariance

\displaystyle \rho(x) = \sqrt{\frac{\mathrm{det}(W)}{(2\pi)^n}}\mathrm{exp}(-\frac{1}{2}x^\top W x)

Brockett flow

P(n)
N
W_1
D(n)

\(H_N(W)\) relative entropy functional

\displaystyle H_{N}(W) = -\frac{1}{2}\mathrm{tr}(N W^{-1}) + \frac{1}{2}\log\det(N W^{-1})

Functional \(F(Q) = H_N(Q^\top W_1 Q)\) on \(O(n)\)

\mathrm{Orb}(W_1)

Heat flow

Wasserstein-Otto metric

\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_M \lvert \nabla \theta\rvert^2 \mu
\dot\rho + \operatorname{div}(\rho \nabla\theta) = 0, \; \rho = \mu/dx

\(\Rightarrow\) Riemannian gradient flow \(\dot\rho = -\nabla_{\overline{\mathcal G}}F(\rho)\)

\dot\rho = \operatorname{div}(\rho \nabla\frac{\delta F}{\delta\rho})

Take \(F(\rho) = \int_M \log(\rho) \rho \Rightarrow \frac{\delta F}{\delta\rho} = \log(\rho)+1\)

\Rightarrow \; \dot\rho = \Delta\rho
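
The step from the entropy functional to the heat equation rests on \(\rho\nabla(\log\rho + 1) = \nabla\rho\). The sketch below checks this numerically in one dimension, assuming a standard Gaussian profile on a uniform grid (illustrative choices) and using finite differences for the derivatives.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 1001)
rho = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)   # test density (assumption)

variation = np.log(rho) + 1.0              # delta F / delta rho for F = int rho log rho
flux = rho * np.gradient(variation, x)     # rho * grad(delta F / delta rho)
gradient_flow_rhs = np.gradient(flux, x)   # div(rho grad(delta F / delta rho))

laplacian = (x ** 2 - 1.0) * rho           # exact second derivative of the Gaussian

max_error = np.max(np.abs(gradient_flow_rhs - laplacian))
print(max_error)
```

The discrepancy is of the size of the finite-difference truncation error, so the right-hand side of the Wasserstein-Otto gradient flow agrees with \(\Delta\rho\) for this profile.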

IPM and Toda

same potential, different Riemannian metrics:

IPM: \(L^2\) on velocity (\(H^{1}\) on stream function)

Toda: \(H^{-1}\) on velocity (\(L^2\) on stream function)

gradient flows on \(\mathrm{Diff}_\mu(S^2)\)

gravity

low density

(light particles)

high density

(heavy particles)

\mathrm{Diff}_\mu(M^2)
\mathrm{Id}
\rho_0
\rho_1

IPM results

Toda results

IPM vs Toda results

Summary

  • IPM and Toda \(\Rightarrow\) Riemannian gradient flows on \(\mathrm{Diff}_\mu(M)\) (or quantized on \(\mathrm{SO}(n)\))
  • same potential function
  • different (right-invariant) Riemannian metrics
    IPM: \(L^2\)         Toda: \( H^{-1} \)
  • Stronger metric \(\Rightarrow\) more regular flow
    IPM: ODE on \(\mathrm{Diff}_\mu^s(M)\) (for \(s>2\))
    Toda: not ODE on \(\mathrm{Diff}_\mu^s(M)\)

Tutorial talk given 2023-11 in Banff.
