The geometry of optimal transport

\mu_0
\mu_1

Théo Dumont

\operatorname{Dens}(\mathbb R^n)

Optimal Transport

Gaspard Monge
1746-1818

Leonid Kantorovitch
1912-1986

Yann Brenier
1957-

OT problem (Monge)

For \(\mu_0,\mu_1\in\mathcal P(\mathbb R^n)\):

\displaystyle \text{OT}(\mu_0,\mu_1)= \inf_{\varphi} \int_{\mathbb R^n} \|x-\varphi(x)\|^2\,\mathrm d\mu_0(x)\qquad\text{over }\varphi:\mathbb R^n\to\mathbb R^n\text{ such that } \varphi_*\mu_0=\mu_1.

OT problem (Kantorovitch)

\displaystyle \text{OT}_K(\mu_0,\mu_1)= \inf_{\pi} \int_{\mathbb R^n\times\mathbb R^n} \|x-y\|^2\,\mathrm d\pi(x,y)\quad\text{over }\pi\in\mathcal P(\mathbb R^n\times \mathbb R^n)\text{ such that } P^i_*\pi=\mu_i\text{ for }i=0,1,

where \(P^0(x,y)=x\) and \(P^1(x,y)=y\) are the coordinate projections.

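Example: \(\mu_0=\delta_x\) and \(\mu_1\) a combination of \(\delta_{y_1}\) and \(\delta_{y_2}\) with \(y_1\neq y_2\). The mass at \(x\) has to be split: not feasible by a map!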
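The mass-splitting obstruction can be checked on a toy case. The sketch below assumes equal weights \(\tfrac12\delta_{y_1}+\tfrac12\delta_{y_2}\) and the coordinates \(x=0\), \(y_1=-1\), \(y_2=2\); these values are illustrative and not taken from the slides. A map sends \(x\) to a single point, so its pushforward is one Dirac mass, while the plan \(\delta_x\otimes\mu_1\) has the right marginals.

```python
# Source mu_0 = delta_x and target mu_1 = 1/2 delta_{y1} + 1/2 delta_{y2}.
# The points and weights below are illustrative assumptions.
x = 0.0
mu0 = {x: 1.0}
mu1 = {-1.0: 0.5, 2.0: 0.5}

def pushforward(phi, mu):
    """Push a discrete measure {point: mass} forward by the map phi."""
    out = {}
    for point, mass in mu.items():
        image = phi(point)
        out[image] = out.get(image, 0.0) + mass
    return out

# Any map sends x to one point, so phi_* mu_0 is a single Dirac mass.
candidates = [lambda p, y=y: y for y in mu1]
map_is_feasible = any(pushforward(phi, mu0) == mu1 for phi in candidates)

# The only admissible plan is the product delta_x (x) mu_1, which splits the mass at x.
plan = {(x, y): mass for y, mass in mu1.items()}
first_marginal = {x: sum(plan.values())}
second_marginal = {y: mass for (_, y), mass in plan.items()}
kantorovich_cost = sum(mass * (a - b) ** 2 for (a, b), mass in plan.items())

print(map_is_feasible)   # False
print(kantorovich_cost)  # 0.5 * 1 + 0.5 * 4 = 2.5
```

The Monge problem has no admissible competitor here, while the Kantorovich problem has exactly one plan.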

\(\pi\) is induced by a transport map \(\varphi\)

\(\pi\) is a transport plan

relaxation

[Figure: the coupling \(\pi\) between \(\mu_0\) and \(\mu_1\), drawn once for a transport map and once for a transport plan]

\(|\!\det d\varphi^{-1}|\mu_0 \circ \varphi^{-1}\)

[Monge, 1781], [Kantorovitch, 1942]

Can we say that the solution of (K) is a map?

?

Brenier's theorem

If \(\mu_0\) has a density (and \(\mu_0,\mu_1\) have finite second moments), then there is a unique solution \(\pi\) to (K), and it is induced by a map of the form \(\varphi=\nabla f\) with \(f:\mathbb R^n\to\mathbb R\) convex.
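In dimension one, the gradient of a convex function is simply a nondecreasing map, so the theorem predicts a monotone optimal map. The sketch below illustrates this on a discrete analogue (two small point clouds with uniform weights, chosen arbitrarily). This is an assumption-laden toy: the theorem itself requires \(\mu_0\) to have a density, and the brute-force search only ranges over bijections between the clouds.

```python
from itertools import permutations

# Two 1D point clouds with uniform weights (illustrative values).
xs = [0.3, -1.2, 2.0, 0.9]
ys = [1.5, 4.0, -0.5, 2.2]

def cost(assignment):
    """Quadratic transport cost of sending xs[i] to ys[assignment[i]]."""
    return sum((xs[i] - ys[j]) ** 2 for i, j in enumerate(assignment)) / len(xs)

# Brute force over all bijections (Monge maps between the two clouds).
best = min(permutations(range(len(ys))), key=cost)

# Monotone rearrangement: the k-th smallest x goes to the k-th smallest y.
order_x = sorted(range(len(xs)), key=lambda i: xs[i])
order_y = sorted(range(len(ys)), key=lambda j: ys[j])
monotone = [None] * len(xs)
for i, j in zip(order_x, order_y):
    monotone[i] = j

print(best == tuple(monotone))  # the brute-force optimum is the monotone map
print(cost(monotone))
```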

Monge (maps)

Kantorovitch (plans)

[Brenier, 1987]

Optimal Transport

Smooth Optimal Transport

Smooth densities:

\operatorname{Dens}(\mathbb R^n)=\{\mu(x)\,\mathrm dx\mid\mu\in C^{\infty}(\mathbb R^n),\ \mu>0,\ \int\mu=1\}
\mu_0,\mu_1\in\operatorname{Dens}(\mathbb R^n)

can we recover classical results of OT theory with a geometric picture?

?

Smooth OT problem

\displaystyle \text{OT}(\mu_0,\mu_1)= \min_{\varphi} \int_{\mathbb R^n} \|x-\varphi(x)\|^2\,\mathrm d\mu_0\qquad\text{over }\varphi\in\mathcal C(\mu_0,\mu_1)\coloneqq\{\varphi\in\operatorname{Diff}(\mathbb R^n)\mid \varphi_*\mu_0=\mu_1\}

Diffeomorphism group

The metric is not right-invariant! It is only invariant under the right action of \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\).

\operatorname{Diff}(\mathbb R^n)
\varphi
\operatorname{id}
  • Geodesic equation:
\ddot\varphi_t=0,\qquad\text{i.e.}\qquad\varphi_t=(1-t)\varphi_0+t\varphi_1
or, in terms of the velocity field \(v\) defined by \(\dot\varphi=v\circ\varphi\):
\dot v+\nabla_v v=0\qquad\text{(inviscid Burgers)}

\displaystyle d^2(\varphi_0,\varphi_1)=\int_0^1G_{\varphi_t}(\dot\varphi_t,\dot\varphi_t)\,\mathrm dt
\displaystyle =\int_0^1\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0\,\mathrm dt
\displaystyle =\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0
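The step behind this chain of equalities can be written out as follows. This is a sketch that writes \(G\) for the \(L^2(\mu_0)\) metric on \(\operatorname{Diff}(\mathbb R^n)\) and assumes the straight segment between \(\varphi_0\) and \(\varphi_1\) stays inside \(\operatorname{Diff}(\mathbb R^n)\):

```latex
\begin{aligned}
\varphi_t &= (1-t)\varphi_0+t\varphi_1
  \quad\Longrightarrow\quad \dot\varphi_t=\varphi_1-\varphi_0,\\
G_{\varphi_t}(\dot\varphi_t,\dot\varphi_t)
  &=\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0
  \quad\text{(independent of } t\text{)},\\
d^2(\varphi_0,\varphi_1)
  &=\int_0^1 G_{\varphi_t}(\dot\varphi_t,\dot\varphi_t)\,\mathrm dt
   =\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0 .
\end{aligned}
```

Since the speed is constant in \(t\), the energy of the segment equals its squared length, which is why no square root appears.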

finding the shortest geodesic from \(\operatorname{id}\) to \(\mathcal C(\mu_0,\mu_1)\)

\(\displaystyle \text{OT}(\mu_0,\mu_1)=\min_{\varphi\,\in \,\mathcal C(\mu_0,\mu_1)} d^2(\operatorname{id},\varphi)\)

A first link with OT

horizontal?

  • Tangent space at \(\varphi\):
T_\varphi\operatorname{Diff}(\mathbb R^n)=\{v\circ\varphi\mid v\in \mathfrak X(\mathbb R^n)\}
  • Metric at \(\varphi\), for \(\dot\varphi=v\circ\varphi\) and \(\mu=\varphi_*\mu_0\):
\displaystyle G_\varphi(\dot\varphi,\dot\varphi)\coloneqq\int_{\mathbb R^n}\|\dot\varphi\|^2\,\mathrm d\mu_0 =\int_{\mathbb R^n}\|v\|^2\,\mathrm d\mu

[Otto, 2001], [Kriegl & Michor, 1997], [Modin, 2015]

\operatorname{id}
\varphi
\operatorname{Diff}_{\mu_0}(\mathbb R^n)
\mathcal C(\mu_0,\mu_1)

A submersion

\operatorname{Diff}(\mathbb R^n)

\(\pi: \operatorname{Diff}(\mathbb R^n)\longrightarrow\operatorname{Dens}(\mathbb R^n)\)
                  \(\varphi\longmapsto\varphi_*\mu_0\)

\mu_0
\mu_1
\pi:\varphi\mapsto\varphi_*\mu_0
\operatorname{Dens}(\mathbb R^n)

Fiber over \(\mu_1\):

    \(\{\varphi\mid \varphi_*\mu_0=\mu_1\}=\mathcal C(\mu_0,\mu_1)\)

Fiber over \(\mu_0\):

    \(\{\varphi\mid \varphi_*\mu_0=\mu_0\}=\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)

\mu_0\in\operatorname{Dens}(\mathbb R^n)

\(d\pi(\varphi): T_\varphi\operatorname{Diff}(\mathbb R^n)\longrightarrow T_{\mu}\operatorname{Dens}(\mathbb R^n)\)
                         \(v\circ\varphi\longmapsto -\operatorname{div}(\mu v)\)
where \(\mu=\varphi_*\mu_0\).
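The formula \(v\circ\varphi\mapsto-\operatorname{div}(\mu v)\) can be recovered by testing against a function. The sketch below assumes a curve \(\varphi_t\) with \(\dot\varphi_t=v\circ\varphi_t\), sets \(\mu_t=(\varphi_t)_*\mu_0\), and takes a compactly supported smooth test function \(h\) (the name \(h\) is introduced here for illustration):

```latex
\begin{aligned}
\frac{\mathrm d}{\mathrm dt}\int_{\mathbb R^n} h\,\mathrm d\mu_t
 &=\frac{\mathrm d}{\mathrm dt}\int_{\mathbb R^n} h\circ\varphi_t\,\mathrm d\mu_0
  =\int_{\mathbb R^n}\langle\nabla h\circ\varphi_t,\,v\circ\varphi_t\rangle\,\mathrm d\mu_0\\
 &=\int_{\mathbb R^n}\langle\nabla h,v\rangle\,\mathrm d\mu_t
  =-\int_{\mathbb R^n} h\,\operatorname{div}(\mu_t v)\,\mathrm dx .
\end{aligned}
```

Since \(h\) is arbitrary, \(\dot\mu_t=-\operatorname{div}(\mu_t v)\), which is the continuity equation.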
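[Figure: \(\operatorname{Diff}(\mathbb R^n)\) with \(\operatorname{id}\in\operatorname{Diff}_{\mu_0}(\mathbb R^n)\) over \(\mu_0\) and \(\varphi\) over \(\mu_1\), projected by \(\pi:\varphi\mapsto\varphi_*\mu_0\)]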

\(\pi:\varphi\mapsto\varphi_*\mu_0\) is a smooth submersion.

\mathcal C(\mu_0,\mu_1)
\nabla p

A submersion

\operatorname{Diff}(\mathbb R^n)
\operatorname{Dens}(\mathbb R^n)

vertical distribution:

\operatorname{Ver}_\varphi\coloneqq\ker d\pi(\varphi)=\{v\circ\varphi\mid \operatorname{div}(\mu v)=0\},\qquad \mu=\varphi_*\mu_0

horizontal distribution:

\operatorname{Hor}_\varphi\coloneqq (\operatorname{Ver}_\varphi)^{\perp_G}=\{\nabla p\circ\varphi\mid p\in C^\infty(\mathbb R^n)\}

Helmholtz/Hodge decomposition

Any \(u\in\mathfrak X(\mathbb R^n)\) can be written as
                    \(u=v+\nabla p\)
with \(\operatorname{div}(\mu_0 v)=0\) and \(p\in C^{\infty}(\mathbb R^n)\).

T_{\varphi}\operatorname{Diff}(\mathbb R^n)=\operatorname{Ver}_{\varphi}\overset{\perp}\oplus\operatorname{Hor}_{\varphi}
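The orthogonality of the two summands follows from one integration by parts. The sketch below writes \(\mu=\varphi_*\mu_0\) and assumes enough decay at infinity for the boundary term to vanish:

```latex
G_\varphi(v\circ\varphi,\nabla p\circ\varphi)
 =\int_{\mathbb R^n}\langle v,\nabla p\rangle\,\mathrm d\mu
 =-\int_{\mathbb R^n} p\,\operatorname{div}(\mu v)\,\mathrm dx
 =0
 \qquad\text{whenever }\operatorname{div}(\mu v)=0 .
```

So every gradient field is \(G\)-orthogonal to the kernel of \(d\pi(\varphi)\).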

right-invariance under action of fiber \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)

\implies

\(\pi\) induces a metric on \(\operatorname{Diff}(\mathbb R^n)/\operatorname{Diff}_{\mu_0}(\mathbb R^n)\simeq\operatorname{Dens}(\mathbb R^n)\)

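\(\operatorname{Ver}_\varphi=\ker d\pi(\varphi)\) is independent of \(G\); \(\operatorname{Hor}_\varphi=(\operatorname{Ver}_\varphi)^{\perp_G}\) depends on \(G\)!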

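[Figure: \(\operatorname{Diff}(\mathbb R^n)\) above \(\operatorname{Dens}(\mathbb R^n)\), with \(\operatorname{id}\) over \(\mu_0\), \(\varphi\in\mathcal C(\mu_0,\mu_1)\) over \(\mu_1\), the projection \(\pi:\varphi\mapsto\varphi_*\mu_0\), and the tangent vector \(\dot\mu=-\operatorname{div}(\mu\nabla p)\)]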

A Riemannian submersion

\nabla p
\dot\mu

\(\pi\) is a Riemannian submersion

\displaystyle g_\mu(\dot\mu,\dot\mu)\coloneqq \inf_{d\pi(\varphi).\dot\varphi=\dot\mu} G_\varphi(\dot\varphi,\dot\varphi)

Pythagoras \(\implies\dot\varphi\) has to be horizontal!

\displaystyle g_\mu(\dot\mu,\dot\mu)=\int_{\mathbb R^n}\|\nabla p\|^2\,\mathrm d\mu

where \(\dot\mu\) and \(\nabla p\) are linked by \(\dot\mu=-\operatorname{div}(\mu\nabla p)\).
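To see the formula at work, here is a minimal one-dimensional sketch in plain Python. It assumes \(\mu\) is the standard Gaussian and \(\dot\mu=-\mu'\) (the Gaussian translated at unit speed); both choices, the grid, and the variable names are illustrative. In 1D the relation \(\dot\mu=-(\mu p')'\) can be integrated once, giving \(\mu p'=-M\) with \(M\) the cumulative integral of \(\dot\mu\).

```python
import math

# 1D sketch: mu is the standard Gaussian density and the tangent vector is
# mu_dot = -d/dx mu = x * mu(x), i.e. the Gaussian translated at unit speed.
# (Both choices are illustrative assumptions.)
L, n = 6.0, 2401
h = 2 * L / (n - 1)
xs = [-L + i * h for i in range(n)]
mu = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]
mu_dot = [x * m for x, m in zip(xs, mu)]

# Integrating mu_dot = -(mu p')' from the left end gives mu p' = -M,
# where M is the cumulative integral of mu_dot (trapezoidal rule here).
M = [0.0]
for i in range(1, n):
    M.append(M[-1] + 0.5 * h * (mu_dot[i - 1] + mu_dot[i]))
grad_p = [-Mi / mi for Mi, mi in zip(M, mu)]

# g_mu(mu_dot, mu_dot) = int |p'|^2 dmu, again by the trapezoidal rule.
integrand = [gp * gp * m for gp, m in zip(grad_p, mu)]
g = h * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))

print(g)  # expected to be close to 1, since p' = 1 for a unit-speed translation
```

A unit-speed translation has \(\nabla p\equiv 1\), so the squared norm of \(\dot\mu\) should be the total mass, namely 1.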

Metric on \(\operatorname{Dens}(\mathbb R^n)\):

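[Figure: a curve from \(\operatorname{id}\) to \(\varphi_1\in\mathcal C(\mu_0,\mu_1)\) in \(\operatorname{Diff}(\mathbb R^n)\) with velocity \(\nabla p\), projected by \(\pi\) to a curve from \(\mu_0\) to \(\mu_1\) in \(\operatorname{Dens}(\mathbb R^n)\) with \(\dot\mu=-\operatorname{div}(\mu\nabla p)\)]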

what do the geodesics look like in \(\operatorname{Dens}(\mathbb R^n)\)?

?

\dot\mu

geodesics in \(\operatorname{Dens}(\mathbb R^n)\)

horizontal geodesics in \(\operatorname{Diff}(\mathbb R^n)\)

\(\iff\)

\varphi_t=\operatorname{id}+t\nabla p
\begin{cases} \dot\mu+\operatorname{div}(\mu\nabla p)=0 & \text{(continuity equation)}\\ \dot p+\frac12 \|\nabla p\|^2=0 & \text{(Hamilton-Jacobi)} \end{cases}

The induced geodesic distance on \(\operatorname{Dens}(\mathbb R^n)\) is the OT distance!

(see previous computation \(d^2(\operatorname{id},\varphi)=\text{OT}(\mu_0,\mu_1)\))

what's the final map?

at time \(t=1\):    \(\varphi_1=\operatorname{id}+\nabla p=\nabla (\frac12 \|x\|^2+p)=\nabla f\)
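A one-dimensional Gaussian example makes the final map explicit. The sketch below assumes \(\mu_0=N(0,1)\) and \(\mu_1=N(m,s^2)\) with the illustrative values \(m=2\), \(s=3\), and uses the monotone rearrangement \(F_1^{-1}\circ F_0\), which is the optimal map in 1D. It relies only on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Illustrative 1D example: mu_0 = N(0, 1) and mu_1 = N(m, s^2).
m, s = 2.0, 3.0
mu0, mu1 = NormalDist(0.0, 1.0), NormalDist(m, s)

def final_map(x):
    """Monotone rearrangement F_1^{-1}(F_0(x)); in 1D this is the optimal map."""
    return mu1.inv_cdf(mu0.cdf(x))

# The map is affine, phi_1(x) = m + s x, i.e. f(x) = m x + s x^2 / 2 (convex since s > 0).
samples = [-2.0, -0.5, 0.0, 1.0, 2.5]
max_err = max(abs(final_map(x) - (m + s * x)) for x in samples)

# Transport cost int |x - phi_1(x)|^2 dmu_0, by a midpoint rule in quantile space.
N = 20000
cost = sum(
    (mu0.inv_cdf((k + 0.5) / N) - mu1.inv_cdf((k + 0.5) / N)) ** 2 for k in range(N)
) / N

print(max_err < 1e-6)
print(cost)  # close to m^2 + (s - 1)^2 = 8
```

Here \(p(x)=mx+\tfrac12(s-1)x^2\), so that \(\tfrac12 x^2+p\) is indeed the convex potential \(f\).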

Brenier's theorem

A Riemannian submersion

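[Figure: a path from \(\operatorname{id}\) to \(\varphi\in\mathcal C(\mu_0,\mu_1)\) in \(\operatorname{Diff}(\mathbb R^n)\) with velocity \(v_t\), and its projection by \(\pi\) from \(\mu_0\) to \(\mu_1\) in \(\operatorname{Dens}(\mathbb R^n)\)]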

Benamou-Brenier

\dot\mu_t
Benamou-Brenier/dynamic formulation of OT

\text{OT}(\mu_0,\mu_1)=\inf_{v_t,\,\mu_t}\int_0^1 \|v_t\|^2_{L^2(\mu_t)}\,\mathrm dt

where \(\dot\mu_t+\operatorname{div}(\mu_t v_t)=0\), and where \(\mu_t\) has the right endpoints \(\mu_0\) at \(t=0\) and \(\mu_1\) at \(t=1\).

\(\implies\) this is just finding the curve of minimal energy between \(\mu_0\) and \(\mu_1\) in \(\operatorname{Dens}(\mathbb R^n)\),

i.e. finding a (horizontal) geodesic!
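The dynamic formulation can be illustrated with particles. In the sketch below, a handful of points play the role of \(\mu_0\), the map \(T(x)=2x+1\) is an arbitrary monotone (hence optimal in 1D) choice, and the energy \(\int_0^1\|v_t\|^2_{L^2(\mu_t)}\,\mathrm dt\) is approximated by finite differences along particle paths. All names and values are assumptions made for illustration.

```python
import math

# Particles x_i (samples of mu_0) and their images under a transport map T.
# The map T(x) = 2 x + 1 and the sample points are illustrative assumptions.
xs = [-1.0, -0.2, 0.4, 1.3]

def T(x):
    return 2.0 * x + 1.0

def energy(path, steps=1000):
    """Approximate int_0^1 ||v_t||^2_{L^2(mu_t)} dt for particle paths t -> path(x, t)."""
    dt = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t0, t1 = k * dt, (k + 1) * dt
        # Finite-difference velocity of each particle on [t0, t1].
        speeds_sq = [((path(x, t1) - path(x, t0)) / dt) ** 2 for x in xs]
        total += dt * sum(speeds_sq) / len(xs)
    return total

def straight(x, t):
    """Displacement interpolation: each particle moves on a straight line."""
    return (1 - t) * x + t * T(x)

def wiggly(x, t):
    """Same endpoints as `straight`, with an extra common oscillation."""
    return straight(x, t) + 0.3 * math.sin(math.pi * t)

static_cost = sum((x - T(x)) ** 2 for x in xs) / len(xs)
print(abs(energy(straight) - static_cost) < 1e-9)  # straight lines achieve the static cost
print(energy(wiggly) > energy(straight))            # any detour costs more energy
```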

[Benamou & Brenier, 2000]

Usefulness:

  • computational: scales to a larger number of points
  • theoretical: propose extensions of the OT framework (unbalanced OT, ...)
\operatorname{id}
\nabla f
\operatorname{Diff}(\mathbb R^n)

Polar factorization

Polar factorization

Any \(\varphi\in\operatorname{Diff}(\mathbb R^n)\) can be written as
                    \(\varphi=\nabla f\circ\phi\)
with \(f\in C^{\infty}(\mathbb R^n)\) convex and \(\phi\in\operatorname{Diff}_{\mu_0}(\mathbb R^n)\).
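A discrete one-dimensional analogue shows the two factors at work, in the spirit of Brenier's monotone rearrangement. The sketch below assumes \(\mu_0\) is the uniform measure on a small grid and that the values of the map are distinct; the map here is just a table of values, not a diffeomorphism, so this is only an illustration of the factorization and not of the smooth statement.

```python
# Discrete 1D analogue (an illustrative assumption): mu_0 is the uniform measure
# on the grid points xs, and phi is given by its values on that grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
phi = {0.0: 2.5, 1.0: -1.0, 2.0: 4.0, 3.0: 0.5, 4.0: 3.0}

# Monotone part: the k-th grid point is sent to the k-th smallest value of phi.
# A nondecreasing map on the line is the derivative of a convex function f.
grad_f = dict(zip(xs, sorted(phi.values())))

# Measure-preserving part: the permutation sigma of the grid with grad_f(sigma(x)) = phi(x).
inverse_grad_f = {value: x for x, value in grad_f.items()}
sigma = {x: inverse_grad_f[phi[x]] for x in xs}

recomposed = {x: grad_f[sigma[x]] for x in xs}
print(recomposed == phi)             # True: phi = grad_f o sigma
print(sorted(sigma.values()) == xs)  # True: sigma permutes the grid, so it preserves mu_0
```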

\phi
\varphi
\operatorname{Diff}_{\mu_0}(\mathbb R^n)

[Brenier, 1987]

\mu
\pi
-\nabla \frac{\delta F}{\delta\mu}
\operatorname{Diff}(\mathbb R^n)
\operatorname{Dens}(\mathbb R^n)
\dot\mu

Gradient flows

Gradient flow

The gradient flow of \(\mu\) w.r.t. a functional \(F\), formally \(\dot x=-\operatorname{grad}^g F(x)\), is

\dot\mu=\operatorname{div}\Big(\mu\nabla\frac{\delta F}{\delta\mu}(\mu)\Big),

where \(\frac{\delta F}{\delta\mu}\) is the Fréchet derivative of \(F\).

Example:

F(\mu)=\underbrace{\int_{\mathbb R^n}\mu(\log\mu-1)}_{\text{entropy}}+\underbrace{\int_{\mathbb R^n}V\mu}_{\text{potential}}
\frac{\delta F}{\delta \mu}(\mu)=\log\mu+V
\dot\mu=\Delta\mu+\operatorname{div}(\mu\nabla V)

Fokker-Planck equation (heat flow when \(V=0\))

\dot\mu=-\operatorname{div}(\mu\nabla p)
  • converges to stationary distribution \(\mu_\infty=\frac1{\int e^{-V}}e^{-V}\)
  • in addition, \(\lambda\)-convexity of \(F\) along geodesics (with \(\lambda>0\)) implies exponential convergence in terms of KL divergence
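The Fokker-Planck equation and its stationary distribution follow from substituting \(\frac{\delta F}{\delta\mu}=\log\mu+V\) into the gradient flow equation. A short worked computation, using \(\mu\nabla\log\mu=\nabla\mu\):

```latex
\begin{aligned}
\dot\mu
 &=\operatorname{div}\Big(\mu\nabla\frac{\delta F}{\delta\mu}(\mu)\Big)
  =\operatorname{div}\big(\mu\nabla(\log\mu+V)\big)
  =\operatorname{div}(\nabla\mu+\mu\nabla V)
  =\Delta\mu+\operatorname{div}(\mu\nabla V),\\
\dot\mu=0
 &\ \Longleftarrow\ \nabla\log\mu_\infty+\nabla V=0
 \iff \mu_\infty=\frac{e^{-V}}{\int e^{-V}} .
\end{aligned}
```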

OT = finding the shortest geodesic from \(\operatorname{id}\) to constraint set \(\mathcal C(\mu_0,\mu_1)\)

horizontal

We recover:

  • polar factorization
  • Hodge decomposition
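[Figure: a curve from \(\operatorname{id}\) to \(\varphi_1\in\mathcal C(\mu_0,\mu_1)\) in \(\operatorname{Diff}(\mathbb R^n)\) with velocity \(\nabla p\), projected by \(\pi:\varphi\mapsto\varphi_*\mu_0\) to a curve from \(\mu_0\) to \(\mu_1\) in \(\operatorname{Dens}(\mathbb R^n)\) with \(\dot\mu=-\operatorname{div}(\mu\nabla p)\)]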

Recap

cf. Benamou-Brenier formulation

Riemannian submersion \((\operatorname{Diff}(\mathbb R^n),L^2(\mu_0))\overset{\pi}{\longrightarrow}(\operatorname{Dens}(\mathbb R^n), \text{OT})\)

Geodesic equation on... is...
  • \(\operatorname{Diff}(\mathbb R^n)\): inviscid Burgers
  • \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\): incompressible Euler
  • \(\operatorname{Dens}(\mathbb R^n)\): Hamilton-Jacobi + continuity equation

Gradient flow of... is...
  • entropy: heat flow
  • entropy + potential: Fokker-Planck
  • loss functional \(L\): training of an infinitely wide neural network [Chizat & Bach, 2018]

(local)

Some other submersions

  • OT: top space \(\operatorname{Diff}\) with metric \(L^2(\mu_0)\); action \(\varphi\mapsto \varphi_*\mu\); bottom space \(\operatorname{Dens}\) with the Wasserstein metric.
  • Information theory: top space \(\operatorname{Diff}\) with metric \(H^1(\mu_0)\); action \(\varphi\mapsto \varphi^*\mu\); bottom space \(\operatorname{Dens}\) with the Fisher-Rao metric.
  • Unbalanced OT: top space \(\operatorname{Diff}\ltimes C^{\infty}\) with metric \(L^2(\mu_0)\); action \((\varphi,\lambda)\mapsto \varphi^*(\lambda^2\mu)\); bottom space \(\operatorname{Dens}^+\) with the Wasserstein-Fisher-Rao metric.
  • LDDMM: top space \(\operatorname{Diff}\) with metric \(L^2(\mathrm dx,K)\); action ...; bottom space anything with an action, with the induced metric.
  • Metamorphoses: top space \(\operatorname{Diff}\ltimes C^{\infty}\) with metric \(L^2(\mathrm dx,K)\); action ...; bottom space anything with an action, with the induced metric.

[Bauer, Bruveris & Michor, 2016], [Gallouët & Vialard, 2018], [Younes, 2010], [Trouvé & Younes, 2005], [Modin, 2015]

finite-dimensional equivalents when restricting to Gaussian measures:
submersion \(\operatorname{GL}(n)\to \operatorname{PSD}(n)\),
induces Bures-Wasserstein and Fisher-Rao
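For the Gaussian restriction, a small sketch of the Bures-Wasserstein case in dimension 2. It assumes the projection \(\operatorname{GL}(n)\to\operatorname{PSD}(n)\) is \(A\mapsto AA^\top\), the covariance of the pushforward of \(N(0,I)\) by \(x\mapsto Ax\), and uses a closed-form trace identity valid for \(2\times2\) PSD matrices. The function names and the test matrices are hypothetical choices for illustration.

```python
import math

def push_forward_cov(A):
    """Covariance A A^T of the push-forward of N(0, I) by the linear map x -> A x."""
    (a, b), (c, d) = A
    return ((a * a + b * b, a * c + b * d), (a * c + b * d, c * c + d * d))

def bures_wasserstein_sq(S, T):
    """Squared Bures-Wasserstein distance between 2x2 covariance matrices S and T.

    Uses tr(S) + tr(T) - 2 tr((S^{1/2} T S^{1/2})^{1/2}) together with the 2x2
    identity tr(M^{1/2}) = sqrt(tr(M) + 2 sqrt(det(M))) for PSD M.
    """
    tr_S, tr_T = S[0][0] + S[1][1], T[0][0] + T[1][1]
    det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    # S^{1/2} T S^{1/2} has the same trace and determinant as the product S T.
    tr_ST = sum(S[i][j] * T[j][i] for i in range(2) for j in range(2))
    cross = math.sqrt(tr_ST + 2.0 * math.sqrt(det_S * det_T))
    return tr_S + tr_T - 2.0 * cross

# Illustrative matrices in GL(2); diagonal, so the answer can be checked by hand.
A = ((1.0, 0.0), (0.0, 2.0))
B = ((3.0, 0.0), (0.0, 4.0))
S, T = push_forward_cov(A), push_forward_cov(B)
print(bures_wasserstein_sq(S, T))  # (1 - 3)^2 + (2 - 4)^2 = 8.0
```

For commuting covariances the distance reduces to the Euclidean distance between the square roots, which is what the diagonal example checks.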

References

L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows: in metric spaces and in the space of probability measures, 2005.

J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, 2000.

M. Bauer, M. Bruveris, and P.W. Michor. Uniqueness of the Fisher–Rao metric on the space of smooth densities, 2016.

Y. Brenier. Décomposition polaire et réarrangement monotone des champs de vecteurs, 1987.

L. Chizat and F. Bach. On the global convergence of gradient descent for over-parameterized models using optimal transport, 2018.

W. Gangbo, H.K. Kim, and T. Pacini. Differential forms on Wasserstein space and infinite-dimensional Hamiltonian systems, 2010.

T. Gallouët and F.-X. Vialard. The Camassa–Holm equation as an incompressible Euler equation: A geometric point of view, 2018.

A. Kriegl and P.W. Michor. The convenient setting of global analysis, 1997.

K. Modin. Geometry of matrix decompositions seen through optimal transport and information geometry, 2016.

F. Otto. The geometry of dissipative evolution equations: the porous medium equation, 2001.

A. Trouvé and L. Younes. Metamorphoses through Lie group action, 2005.

L. Younes. Shapes and diffeomorphisms, 2010.

A geometric view of optimal transport

By Théo Dumont


Talk about the infinite-dimensional Riemannian geometry of Optimal Transport for the shape analysis seminar (https://shape-analysis.github.io/) at the MAP5 lab.
