The geometry of optimal transport


Théo Dumont


Optimal Transport

[Portraits: Gaspard Monge, Leonid Kantorovitch, Yann Brenier]

OT problem (Monge)

Given \(\mu_0,\mu_1\in\mathcal P(\mathbb R^n)\),
\displaystyle \text{OT}(\mu_0,\mu_1)= \inf_{\varphi} \int_{\mathbb R^n} \|x-\varphi(x)\|^2\,\mathrm d\mu_0\qquad\text{over }\varphi:\mathbb R^n\to\mathbb R^n\text{ such that } \varphi_*\mu_0=\mu_1.
When \(\mu_0\) has a density and \(\varphi\) is a diffeomorphism, \(\varphi_*\mu_0\) has density \(|\!\det d\varphi^{-1}|\,\mu_0 \circ \varphi^{-1}\).

OT problem (Kantorovitch)

\displaystyle \text{OT}_K(\mu_0,\mu_1)= \inf_{\pi} \int_{\mathbb R^n\times\mathbb R^n} \|x-y\|^2\,\mathrm d\pi\quad\text{over }\pi\in\mathcal P(\mathbb R^n\times \mathbb R^n)\text{ such that } P^i_*\pi=\mu_i\ (i=0,1),
where \(P^0(x,y)=x\) and \(P^1(x,y)=y\) are the two projections.

Monge: \(\pi\) is induced by a transport map \(\varphi\). Kantorovitch: \(\pi\) is a general transport plan, and some plans are not feasible by a map! For instance, no map sends \(\delta_0\) to \(\frac12(\delta_{-1}+\delta_1)\).

[Monge, 1781], [Kantorovitch, 1942]
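A small sanity check of the Kantorovitch problem, assuming uniform discrete measures with the same number \(N\) of atoms: (K) is then a linear program over doubly stochastic matrices, and by the Birkhoff-von Neumann theorem an optimal plan can be taken to be a permutation, i.e. a map. The hypothetical helper `discrete_ot` below simply brute-forces all permutations (exponential in \(N\), for illustration only). With different numbers of atoms (one atom sent to two, say) no such map exists in general.

```python
from itertools import permutations


def discrete_ot(xs, ys):
    """Brute-force OT between uniform measures on the points xs and ys (same length N).

    Points are tuples of coordinates. The cost of a permutation sigma is
    (1/N) * sum_i ||x_i - y_sigma(i)||^2; its minimum equals the Kantorovitch
    optimum for uniform measures (Birkhoff-von Neumann). Illustration only.
    """
    n = len(xs)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(
            sum((a - b) ** 2 for a, b in zip(xs[i], ys[perm[i]])) for i in range(n)
        ) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


# 1D example: the optimal matching is the monotone one (0 -> 0.5, 1 -> 2, 3 -> 4).
xs = [(0.0,), (1.0,), (3.0,)]
ys = [(4.0,), (0.5,), (2.0,)]
cost, perm = discrete_ot(xs, ys)
```

Here `perm == (1, 2, 0)` and `cost == 0.75`, the sorted (monotone) matching, which is what Brenier's theorem predicts in 1D.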

Can we say that the solution of (K) is a map?


Brenier's theorem

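If \(\mu_0\) has a density (and \(\mu_0,\mu_1\) have finite second moments), then (K) has a unique solution \(\pi\). It is induced by a transport map of the form \(\varphi=\nabla f\) with \(f:\mathbb R^n\to\mathbb R\) convex, i.e. \(\pi=(\operatorname{id},\nabla f)_*\mu_0\).

[Figure: Monge (maps), \(\pi\) is induced by a transport map \(\varphi\); Kantorovitch (plans), \(\pi\) is a transport plan]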


[Brenier, 1987]
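In one dimension the Brenier map is the monotone rearrangement \(F_1^{-1}\circ F_0\), where \(F_i\) is the cumulative distribution function of \(\mu_i\); a nondecreasing map is the derivative of a convex function. For Gaussians \(\mathcal N(m_0,s_0^2)\to\mathcal N(m_1,s_1^2)\) it is \(x\mapsto m_1+\frac{s_1}{s_0}(x-m_0)\), with cost \((m_0-m_1)^2+(s_0-s_1)^2\). A NumPy sketch on a truncated grid (helper names are hypothetical):

```python
import numpy as np


def gaussian_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def cdf(x, mu):
    """Cumulative trapezoidal integral of the density mu sampled on the uniform grid x."""
    dx = x[1] - x[0]
    return dx * (np.cumsum(mu) - 0.5 * (mu + mu[0]))


def brenier_map_1d(x, mu0, mu1):
    """Monotone map F1^{-1} o F0; F1 is inverted by linear interpolation (F1 must increase)."""
    return np.interp(cdf(x, mu0), cdf(x, mu1), x)


x = np.linspace(-10.0, 10.0, 4001)
mu0 = gaussian_pdf(x, 0.0, 1.0)   # N(0, 1)
mu1 = gaussian_pdf(x, 1.0, 2.0)   # N(1, 4)
phi = brenier_map_1d(x, mu0, mu1)
cost = np.sum((x - phi) ** 2 * mu0) * (x[1] - x[0])   # int |x - phi(x)|^2 d mu0
```

The computed map should be close to \(1+2x\) in the bulk and the cost close to \((0-1)^2+(1-2)^2=2\).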


Smooth Optimal Transport

Smooth densities:
\operatorname{Dens}(\mathbb R^n)=\{\mu(x)\,\mathrm dx\mid\mu\in C^{\infty}(\mathbb R^n),\ \mu>0,\ \int\mu=1\},\qquad \mu_0,\mu_1\in\operatorname{Dens}(\mathbb R^n).

Can we recover classical results of OT theory with a geometric picture?

Smooth OT problem

\displaystyle \text{OT}(\mu_0,\mu_1)= \min_{\varphi} \int_{\mathbb R^n} \|x-\varphi(x)\|^2\,\mathrm d\mu_0\qquad\text{over }\varphi\in\mathcal C(\mu_0,\mu_1)\coloneqq\{\varphi\in\operatorname{Diff}(\mathbb R^n)\mid \varphi_*\mu_0=\mu_1\}.
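The constraint \(\varphi_*\mu_0=\mu_1\) can be checked numerically with the change-of-variables formula \(\mu_1=|\det d\varphi^{-1}|\,\mu_0\circ\varphi^{-1}\) and the defining property \(\int g\,\mathrm d(\varphi_*\mu_0)=\int g\circ\varphi\,\mathrm d\mu_0\). A 1D NumPy sketch, assuming the hypothetical diffeomorphism \(\varphi(x)=x+\frac12\sin x\) (so \(\varphi'\ge\frac12\)), \(\mu_0=\mathcal N(0,1)\) and the test function \(g(y)=y^2\):

```python
import numpy as np


def phi(x):          # hypothetical 1D diffeomorphism: phi'(x) = 1 + 0.5 cos x >= 0.5
    return x + 0.5 * np.sin(x)


def dphi(x):
    return 1.0 + 0.5 * np.cos(x)


def mu0(x):          # standard Gaussian density
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


x = np.linspace(-10.0, 10.0, 20001)   # source grid
y = x.copy()                          # target grid
dx = x[1] - x[0]

x_of_y = np.interp(y, phi(x), x)      # phi^{-1}(y), valid because phi is increasing
mu1 = mu0(x_of_y) / dphi(x_of_y)      # |det d(phi^{-1})| * (mu0 o phi^{-1})

mass = np.sum(mu1) * dx                       # should be 1
lhs = np.sum(y**2 * mu1) * dx                 # int g d(phi_* mu0)
rhs = np.sum(phi(x) ** 2 * mu0(x)) * dx       # int (g o phi) d mu0
```

For this choice, \(\int(x+\frac12\sin x)^2\,\mathrm d\mu_0=1+e^{-1/2}+\frac18(1-e^{-2})\), which both sums should reproduce.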

A first link with OT

Diffeomorphism group \(\operatorname{Diff}(\mathbb R^n)\):
  • Tangent space at \(\varphi\):
T_\varphi\operatorname{Diff}(\mathbb R^n)=\{v\circ\varphi\mid v\in \mathfrak X(\mathbb R^n)\}
  • Metric at \(\varphi\):
\displaystyle G_\varphi(\dot\varphi,\dot\varphi)\coloneqq\int_{\mathbb R^n}\|\dot\varphi\|^2\,\mathrm d\mu_0 =\int_{\mathbb R^n}\|v\|^2\,\mathrm d\mu,\qquad\text{where }\dot\varphi=v\circ\varphi\text{ and }\mu=\varphi_*\mu_0.
    \(G\) is not right-invariant! It is only invariant under right composition by elements of \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)=\{\psi\mid\psi_*\mu_0=\mu_0\}\).
  • Geodesic equation:
\begin{cases} \ddot\varphi_t=0\\ \varphi_t=(1-t)\varphi_0+t\varphi_1 \end{cases}\qquad\text{or, in Eulerian form (inviscid Burgers),}\qquad\begin{cases} \dot\varphi=v\circ\varphi\\ \dot v+\nabla_v v=0 \end{cases}
  • Geodesic distance:
\displaystyle d^2(\varphi_0,\varphi_1)=\int_0^1G_{\varphi_t}(\dot\varphi_t,\dot\varphi_t)\,\mathrm dt =\int_0^1\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0\,\mathrm dt =\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0

Hence OT amounts to finding the shortest geodesic from \(\operatorname{id}\) to \(\mathcal C(\mu_0,\mu_1)\):
\(\displaystyle \text{OT}(\mu_0,\mu_1)=\min_{\varphi\,\in \,\mathcal C(\mu_0,\mu_1)} d^2(\operatorname{id},\varphi)\)

[Otto, 2001], [Kriegl & Michor, 1997], [Modin, 2015]
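The Eulerian form of the geodesic equation can be checked on a straight line \(\varphi_t=\operatorname{id}+tu\) in 1D: the Eulerian velocity \(v_t=u\circ\varphi_t^{-1}\) should solve inviscid Burgers, \(\partial_tv+v\,\partial_yv=0\). A NumPy sketch, assuming the hypothetical displacement \(u(x)=0.3\sin x\) (so every \(\varphi_t\), \(t\in[0,1]\), is a diffeomorphism) and inverting \(\varphi_t\) with Newton's method:

```python
import numpy as np


def u(x):            # hypothetical displacement u = phi_1 - id
    return 0.3 * np.sin(x)


def du(x):
    return 0.3 * np.cos(x)


def phi_inv(t, y, iters=30):
    """Invert phi_t = id + t*u by Newton's method (phi_t' = 1 + t*du >= 0.7 for t in [0, 1])."""
    z = np.array(y, dtype=float)
    for _ in range(iters):
        z = z - (z + t * u(z) - y) / (1.0 + t * du(z))
    return z


def v(t, y):
    """Eulerian velocity of the straight line phi_t: v_t = u o phi_t^{-1}."""
    return u(phi_inv(t, y))


t, h = 0.5, 1e-5
y = np.linspace(-3.0, 3.0, 121)
dv_dt = (v(t + h, y) - v(t - h, y)) / (2.0 * h)
dv_dy = (v(t, y + h) - v(t, y - h)) / (2.0 * h)
residual = dv_dt + v(t, y) * dv_dy     # inviscid Burgers residual, should be ~0
```

The residual is at the level of the finite-difference error, reflecting that the Lagrangian statement \(\ddot\varphi_t=0\) and the Eulerian Burgers equation describe the same geodesic.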

A submersion

Fix \(\mu_0\in\operatorname{Dens}(\mathbb R^n)\). Then
\(\pi:\operatorname{Diff}(\mathbb R^n)\longrightarrow\operatorname{Dens}(\mathbb R^n),\ \varphi\mapsto\varphi_*\mu_0\) is a smooth submersion.

Fiber over \(\mu_1\):

    \(\{\varphi\mid \varphi_*\mu_0=\mu_1\}=\mathcal C(\mu_0,\mu_1)\)

Fiber over \(\mu_0\):

    \(\{\varphi\mid \varphi_*\mu_0=\mu_0\}=\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)

Differential at \(\varphi\), with \(\mu=\varphi_*\mu_0\) (continuity equation):

\(d\pi(\varphi): T_\varphi\operatorname{Diff}(\mathbb R^n)\longrightarrow T_{\mu}\operatorname{Dens}(\mathbb R^n),\quad v\circ\varphi\longmapsto -\operatorname{div}(\mu v)\)
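A finite-difference check of \(d\pi(\operatorname{id}).v=-\operatorname{div}(\mu_0v)\) in 1D, assuming \(\mu_0=\mathcal N(0,1)\) and the hypothetical vector field \(v(x)=\frac12\sin x+0.2\): push \(\mu_0\) forward by \(\operatorname{id}\pm\varepsilon v\) and compare the centered difference with \(-(\mu_0v)'\).

```python
import numpy as np


def mu0(x):          # standard Gaussian density, mu0' = -x mu0
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def v(x):            # hypothetical vector field
    return 0.5 * np.sin(x) + 0.2


def dv(x):
    return 0.5 * np.cos(x)


def pushforward(eps, y, iters=20):
    """Density of (id + eps*v)_* mu0 at y, inverting id + eps*v by Newton's method."""
    z = np.array(y, dtype=float)
    for _ in range(iters):
        z = z - (z + eps * v(z) - y) / (1.0 + eps * dv(z))
    return mu0(z) / (1.0 + eps * dv(z))      # mu0 o phi^{-1} * |(phi^{-1})'|


y = np.linspace(-4.0, 4.0, 81)
eps = 1e-5
dmu = (pushforward(eps, y) - pushforward(-eps, y)) / (2.0 * eps)   # d pi(id).v, numerically
expected = -(-y * mu0(y) * v(y) + mu0(y) * dv(y))                # -(mu0 v)' = -div(mu0 v)
```

Both arrays agree up to finite-difference error, which is the 1D version of the formula for \(d\pi\).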

Vertical distribution:

\operatorname{Ver}_\varphi\coloneqq\ker d\pi(\varphi)=\{v\circ\varphi\mid \operatorname{div}(\mu v)=0\}

Horizontal distribution:

\operatorname{Hor}_\varphi\coloneqq (\operatorname{Ver}_\varphi)^{\perp_G}=\{\nabla p\circ\varphi\mid p\in C^\infty(\mathbb R^n)\}

Indeed, \(\int\langle v,\nabla p\rangle\,\mathrm d\mu=-\int p\,\operatorname{div}(\mu v)=0\) for every vertical \(v\).

Helmholtz/Hodge decomposition

Any \(u\in\mathfrak X(\mathbb R^n)\) can be written as
                    \(u=v+\nabla p\)
with \(\operatorname{div}(\mu_0 v)=0\) and \(p\in C^{\infty}(\mathbb R^n)\). At the level of tangent spaces:

T_ {\varphi}\operatorname{Diff}(\mathbb R^n)=\operatorname{Ver}_{\varphi}\overset{\perp}\oplus\operatorname{Hor}_{\varphi}

Right-invariance of \(G\) under the action of the fiber \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)
\(\implies\) \(\pi\) induces a metric on \(\operatorname{Diff}(\mathbb R^n)/\operatorname{Diff}_{\mu_0}(\mathbb R^n)\simeq\operatorname{Dens}(\mathbb R^n)\).
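A sketch of the Helmholtz/Hodge decomposition in the simplest setting, which is an assumption and not the setting of the slides: the 2D periodic torus \([0,2\pi)^2\) with \(\mu_0\) uniform, so that \(\operatorname{div}(\mu_0v)=0\) reduces to \(\operatorname{div}v=0\). In Fourier variables the gradient part is \(\widehat{\nabla p}(k)=k\,(k\cdot\hat u)/|k|^2\). The function name `helmholtz_torus` is hypothetical.

```python
import numpy as np


def helmholtz_torus(u1, u2):
    """Split u = v + grad p on the 2D torus with uniform density, div v = 0.

    Spectral (FFT) projection; assumes u is smooth, periodic and sampled on an
    n x n grid of [0, 2*pi)^2, with axis 0 = x and axis 1 = y.
    """
    n = u1.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)               # integer wavenumbers
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    k_sq = k1**2 + k2**2
    k_sq[0, 0] = 1.0                               # avoid 0/0: the mean mode stays in v
    U1, U2 = np.fft.fft2(u1), np.fft.fft2(u2)
    proj = (k1 * U1 + k2 * U2) / k_sq              # (k . u_hat) / |k|^2
    g1 = np.real(np.fft.ifft2(k1 * proj))          # grad p, first component
    g2 = np.real(np.fft.ifft2(k2 * proj))          # grad p, second component
    return (u1 - g1, u2 - g2), (g1, g2)


n = 64
s = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(s, s, indexing="ij")
grad_p_true = (np.cos(X) * np.cos(2 * Y), -2.0 * np.sin(X) * np.sin(2 * Y))  # p = sin(x) cos(2y)
v_true = (-np.sin(X + Y), np.sin(X + Y))                                    # div v = 0
(v1, v2), (g1, g2) = helmholtz_torus(v_true[0] + grad_p_true[0], v_true[1] + grad_p_true[1])
```

For this band-limited example the two parts are recovered to machine precision, and they are \(L^2\)-orthogonal.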

A Riemannian submersion

Metric on \(\operatorname{Dens}(\mathbb R^n)\):

\displaystyle g_\mu(\dot\mu,\dot\mu)\coloneqq \inf_{d\pi(\varphi).\dot\varphi=\dot\mu} G_\varphi(\dot\varphi,\dot\varphi)

Pythagoras \(\implies\dot\varphi\) has to be horizontal! Writing \(\dot\varphi=\dot\varphi^{\mathrm{ver}}+\dot\varphi^{\mathrm{hor}}\), the vertical part does not change \(d\pi(\varphi).\dot\varphi\) but adds \(G_\varphi(\dot\varphi^{\mathrm{ver}},\dot\varphi^{\mathrm{ver}})\ge0\) to the cost. Hence

\displaystyle g_\mu(\dot\mu,\dot\mu)=\int_{\mathbb R^n}\|\nabla p\|^2\,\mathrm d\mu

where \(\dot\mu\) and \(\nabla p\) are linked by \(\dot\mu=-\operatorname{div}(\mu\nabla p)\).

With this metric, \(\pi\) is a Riemannian submersion.
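In 1D the metric can be computed explicitly: integrating \(\dot\mu=-(\mu p')'\) gives \(\mu p'=-M\) with \(M(x)=\int_{-\infty}^x\dot\mu\), hence \(g_\mu(\dot\mu,\dot\mu)=\int M^2/\mu\). A NumPy sketch (the helper `otto_metric_1d` is hypothetical), checked on \(\mu=\mathcal N(0,1)\) moved by a translation at speed \(c\) (expected \(g=c^2\)) and by a unit-rate dilation (expected \(g=\int x^2\,\mathrm d\mu=1\)):

```python
import numpy as np


def otto_metric_1d(x, mu, mu_dot):
    """g_mu(mu_dot, mu_dot) on a uniform 1D grid, assuming mu > 0 and int mu_dot = 0."""
    dx = x[1] - x[0]
    # trapezoidal M(x) = int_{x[0]}^{x} mu_dot, approximating int_{-inf}^{x} mu_dot
    M = dx * (np.cumsum(mu_dot) - 0.5 * (mu_dot + mu_dot[0]))
    velocity = -M / mu                           # p' solving mu_dot = -(mu p')'
    return np.sum(velocity**2 * mu) * dx         # int |p'|^2 d mu


x = np.linspace(-8.0, 8.0, 8001)
mu = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # N(0, 1)
c = 2.0
g_translation = otto_metric_1d(x, mu, c * x * mu)       # mu_dot = -c mu'
g_dilation = otto_metric_1d(x, mu, (x**2 - 1.0) * mu)   # mu_dot = -(x mu)'
```

The translation gives the constant velocity \(p'=c\) and the dilation gives \(p'=x\), matching the horizontal lifts predicted by the formula above.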

What do the geodesics look like in \(\operatorname{Dens}(\mathbb R^n)\)?

Geodesics in \(\operatorname{Dens}(\mathbb R^n)\) \(\longleftrightarrow\) horizontal geodesics in \(\operatorname{Diff}(\mathbb R^n)\):

\varphi_t=\operatorname{id}+t\nabla p_0,\qquad \begin{cases} \dot\mu+\operatorname{div}(\mu\nabla p)=0 & \text{(continuity equation)}\\ \dot p+\frac12 \|\nabla p\|^2=0 & \text{(Hamilton-Jacobi)} \end{cases}

where \(p_0\) is the value of \(p\) at \(t=0\).

The induced geodesic distance on \(\operatorname{Dens}(\mathbb R^n)\) is the OT distance! (see the previous computation \(d^2(\operatorname{id},\varphi)=\text{OT}(\mu_0,\mu_1)\))

What's the final map? At time \(t=1\): \(\varphi_1=\nabla (\frac12 \|x\|^2+p_0)=\nabla f\).

Since every \(\varphi_t\) is a diffeomorphism, the eigenvalues \(1+t\lambda\) of \(d\varphi_t=I+tD^2p_0\) never vanish on \([0,1]\), so \(D^2f=I+D^2p_0\) is positive definite: \(f\) is convex, and we recover Brenier's theorem.
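A 1D example where everything is explicit, assuming the hypothetical initial potential \(p_0(x)=\alpha x^2/2\) and \(\mu_0=\mathcal N(0,1)\): \(p_t(x)=\alpha x^2/(2(1+\alpha t))\) solves Hamilton-Jacobi, \(\varphi_t=(1+\alpha t)\operatorname{id}\), \(\mu_t=\mathcal N(0,(1+\alpha t)^2)\), and \(\varphi_1=\nabla f\) with \(f=\frac12x^2+p_0\), convex for \(\alpha>-1\). The NumPy sketch checks both equations by centered finite differences:

```python
import numpy as np

alpha = 0.8          # hypothetical initial potential p_0(x) = alpha * x**2 / 2


def p(t, y):         # closed-form solution of the Hamilton-Jacobi equation
    return alpha * y**2 / (2.0 * (1.0 + alpha * t))


def grad_p(t, y):    # Eulerian velocity v_t = p_t'
    return alpha * y / (1.0 + alpha * t)


def mu(t, y):        # (phi_t)_* N(0, 1) with phi_t = id + t p_0' = (1 + alpha t) id
    s = 1.0 + alpha * t
    return np.exp(-0.5 * (y / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def d_dt(f, t, y, h=1e-5):
    return (f(t + h, y) - f(t - h, y)) / (2.0 * h)


def d_dy(f, t, y, h=1e-5):
    return (f(t, y + h) - f(t, y - h)) / (2.0 * h)


t = 0.6
y = np.linspace(-3.0, 3.0, 61)
hj_residual = d_dt(p, t, y) + 0.5 * grad_p(t, y) ** 2                    # p_dot + |p'|^2 / 2
ce_residual = d_dt(mu, t, y) + d_dy(lambda s, z: mu(s, z) * grad_p(s, z), t, y)  # mu_dot + (mu p')'

x = np.linspace(-2.0, 2.0, 41)
along_trajectories = grad_p(t, (1.0 + alpha * t) * x)   # equals p_0'(x) = alpha x for every t
phi_1 = (1.0 + alpha) * x                               # = f'(x), f = x**2 / 2 + p_0
```

The velocity read along each trajectory stays equal to \(p_0'(x)\), which is why the horizontal lift is the straight line \(\operatorname{id}+t\nabla p_0\).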

Benamou-Brenier/dynamic formulation of OT

\text{OT}(\mu_0,\mu_1)=\inf_{v_t,\,\mu_t}\int_0^1 \|v_t\|^2_{L^2(\mu_t)}\,\mathrm dt

where \(\dot\mu_t+\operatorname{div}(\mu_t v_t)=0\), and where \(\mu_t\) has the right endpoints (\(\mu_t=\mu_0\) at \(t=0\), \(\mu_t=\mu_1\) at \(t=1\)).

\(\implies\) this is just finding the curve of minimal energy between \(\mu_0\) and \(\mu_1\) in \(\operatorname{Dens}(\mathbb R^n)\),

i.e. finding a geodesic (lifted to \(\operatorname{Diff}(\mathbb R^n)\): a horizontal geodesic)!

[Benamou & Brenier, 2000]


  • computational: scale to bigger number of points
  • theoretical: propose extensions of the OT framework (unbalanced OT, ...)
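The Benamou-Brenier action can be compared on two paths between \(\mu_0=\mathcal N(0,1)\) and \(\mu_1=\mathcal N(m,1)\) in 1D, where the velocity is fixed by the continuity equation (\(\mu_tv_t=-\int_{-\infty}^x\dot\mu_t\)). The translation \(\mu_t=\mathcal N(tm,1)\) is the OT geodesic, with action \(m^2=\text{OT}(\mu_0,\mu_1)\); the mixture \((1-t)\mu_0+t\mu_1\) is not, so its action should be larger. A NumPy sketch with hypothetical helper names and a midpoint rule in time:

```python
import numpy as np

x = np.linspace(-8.0, 10.0, 9001)
dx = x[1] - x[0]
m = 2.0                                   # mu_0 = N(0, 1), mu_1 = N(m, 1)


def gauss(x, mean):
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)


def kinetic_energy(mu, mu_dot):
    """||v||^2_{L^2(mu)} for the unique v with mu_dot + (mu v)' = 0 (1D, decaying tails)."""
    M = dx * (np.cumsum(mu_dot) - 0.5 * (mu_dot + mu_dot[0]))   # trapezoid int_{-inf}^x mu_dot
    return np.sum(M**2 / mu) * dx


def action(path, n_steps=200):
    """Benamou-Brenier action int_0^1 ||v_t||^2 dt, midpoint rule in t."""
    ts = (np.arange(n_steps) + 0.5) / n_steps
    return sum(kinetic_energy(*path(t)) for t in ts) / n_steps


def mixture(t):        # mu_t = (1-t) mu_0 + t mu_1: not a geodesic
    mu0, mu1 = gauss(x, 0.0), gauss(x, m)
    return (1.0 - t) * mu0 + t * mu1, mu1 - mu0


def displacement(t):   # mu_t = N(t m, 1): the OT geodesic (a translation)
    mu = gauss(x, t * m)
    return mu, m * (x - t * m) * mu


a_geodesic = action(displacement)   # close to m**2 = 4
a_mixture = action(mixture)         # strictly larger
```

Only the geodesic attains the infimum in the Benamou-Brenier formula; any other admissible path, such as the mixture, pays extra kinetic energy.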
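Polar factorization

Any \(\varphi\in\operatorname{Diff}(\mathbb R^n)\) can be written as
                    \(\varphi=\nabla f\circ\phi\)
with \(f\in C^{\infty}(\mathbb R^n)\) convex and \(\phi\in\operatorname{Diff}_{\mu_0}(\mathbb R^n)\).

Geometrically, \(\nabla f\) is the point of the fiber \(\pi^{-1}(\varphi_*\mu_0)\) closest to \(\operatorname{id}\), i.e. the optimal map from \(\mu_0\) to \(\varphi_*\mu_0\), and \(\phi=(\nabla f)^{-1}\circ\varphi\) lies in the fiber \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\).

[Brenier, 1987]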
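A finite-dimensional instance, assuming \(\mu_0=\mathcal N(0,I)\) and a linear \(\varphi(x)=Ax\): the polar factorization is the matrix polar decomposition \(A=PQ\), with \(P\) symmetric positive-definite (\(\nabla f\) for the convex \(f(x)=\frac12x^\top Px\)) and \(Q\) orthogonal (it preserves \(\mathcal N(0,I)\)). A NumPy sketch via the SVD \(A=U\Sigma V^\top\), \(P=U\Sigma U^\top\), \(Q=UV^\top\); the name `polar_factorization` is hypothetical.

```python
import numpy as np


def polar_factorization(A):
    """A = P @ Q with P symmetric positive-definite and Q orthogonal (A invertible)."""
    U, s, Vt = np.linalg.svd(A)
    P = U @ np.diag(s) @ U.T      # grad f for f(x) = x^T P x / 2, a convex function
    Q = U @ Vt                    # Q Q^T = I, so Q pushes N(0, I) to itself
    return P, Q


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))       # almost surely invertible
P, Q = polar_factorization(A)
```

Here \(P=(AA^\top)^{1/2}\) is the optimal map from \(\mathcal N(0,I)\) to \(\varphi_*\mathcal N(0,I)=\mathcal N(0,AA^\top)\), as the geometric picture predicts.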

Gradient flows

Gradient flow of a functional \(F\): ``\(\dot x=-\operatorname{grad}^g F(x)\)".

On \((\operatorname{Dens}(\mathbb R^n),g)\), write \(\dot\mu=-\operatorname{div}(\mu\nabla p)\) with \(\nabla p=-\nabla \frac{\delta F}{\delta\mu}\), where \(\frac{\delta F}{\delta\mu}\) is the Fréchet derivative of \(F\). The gradient flow of \(\mu\) w.r.t. \(F\) is therefore

\dot\mu=\operatorname{div}\Big(\mu\nabla\frac{\delta F}{\delta\mu}(\mu)\Big).

Example (entropy + potential):

F(\mu)=\int_{\mathbb R^n}\mu(\log\mu-1)+\int_{\mathbb R^n}V\mu,\qquad \frac{\delta F}{\delta \mu}(\mu)=\log\mu+V,

which gives the Fokker-Planck equation

\dot\mu=\Delta\mu+\operatorname{div}(\mu\nabla V)

(for \(V=0\), the entropy alone gives the heat flow \(\dot\mu=\Delta\mu\)).
  • converges to the stationary distribution \(\mu_\infty=\frac1{\int e^{-V}}e^{-V}\)
  • if \(V\) is \(\lambda\)-convex (\(\lambda>0\)), \(F\) is \(\lambda\)-convex along geodesics, which implies exponential convergence in terms of KL divergence: \(\operatorname{KL}(\mu_t\|\mu_\infty)\le e^{-2\lambda t}\operatorname{KL}(\mu_0\|\mu_\infty)\)
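A minimal finite-volume sketch of the Fokker-Planck flow in 1D, assuming the hypothetical potential \(V(x)=x^2/2\) (1-convex), a truncated domain \([-6,6]\) with no-flux boundaries, and explicit Euler time stepping (the time step is chosen well below the diffusive stability limit \(\Delta x^2/2\)); the density should approach \(\mu_\infty\propto e^{-V}\) while conserving mass:

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 241)
dx = x[1] - x[0]
V = 0.5 * x**2                                   # hypothetical potential, 1-convex
dV = np.diff(V) / dx                             # V' at cell interfaces

mu = np.exp(-0.5 * ((x - 2.0) / 0.5) ** 2)       # initial density, off-center
mu /= np.sum(mu) * dx

dt, n_steps = 5e-4, 16000                        # final time T = 8
for _ in range(n_steps):
    mu_mid = 0.5 * (mu[1:] + mu[:-1])
    flux = -(np.diff(mu) / dx + mu_mid * dV)     # J = -(mu' + mu V') at interior interfaces
    flux = np.concatenate(([0.0], flux, [0.0]))  # no-flux boundaries: mass is conserved
    mu = mu - dt * np.diff(flux) / dx            # mu_dot = -J' = mu'' + (mu V')'

mu_inf = np.exp(-V) / (np.sum(np.exp(-V)) * dx)  # stationary distribution e^{-V} / Z
```

The flux form makes mass conservation exact up to rounding, mirroring the fact that the gradient flow stays in \(\operatorname{Dens}(\mathbb R^n)\).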

OT = finding the shortest geodesic from \(\operatorname{id}\) to the constraint set \(\mathcal C(\mu_0,\mu_1)\)

We recover:
  • polar factorization
  • Hodge decomposition

cf. Benamou-Brenier formulation

Riemannian submersion \((\operatorname{Diff}(\mathbb R^n),L^2(\mu_0))\overset{\pi}{\longrightarrow}(\operatorname{Dens}(\mathbb R^n), \text{OT})\)

Geodesic equation on... is...
  • \(\operatorname{Diff}(\mathbb R^n)\): inviscid Burgers
  • \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\): incompressible Euler
  • \(\operatorname{Dens}(\mathbb R^n)\): Hamilton-Jacobi + continuity equation

Gradient flow of... is...
  • entropy: heat flow
  • entropy + potential: Fokker-Planck
  • loss functional \(L\): training of infinitely wide neural networks [Chizat & Bach, 2018]


OT Inform. theory Unbalanced OT LDDMM Metamorphoses
top space
bottom space anything with an action
anything with an action
action ... ...
metric Wasserstein Fisher-Rao Wasserstein-Fisher-Rao induced metric induced metric
\operatorname{Diff}\ltimes C^{\infty}
\operatorname{Diff}\ltimes C^{\infty}
L^2(\mathrm dx,K)
L^2(\mathrm dx,K)

Some other submersions

[Bauer, Bruveris & Michor, 2016], [Gallouët & Vialard, 2018], [Younes, 2010], [Trouvé & Younes, 2005], [Modin, 2015]

Finite-dimensional equivalents when restricting to Gaussian measures: the submersion \(\operatorname{GL}(n)\to \operatorname{PSD}(n)\) (for OT, \(A\mapsto AA^\top\), the covariance of \(A_*\mathcal N(0,I)\)) induces the Bures-Wasserstein and Fisher-Rao metrics.

\varphi\mapsto \varphi_*\mu
\varphi\mapsto \varphi^*\mu
(\varphi,\lambda)\mapsto \varphi^*(\lambda^2\mu)
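For centered Gaussians the OT submersion is explicit, assuming \(\mu_0=\mathcal N(0,I)\) and linear maps \(\varphi(x)=Ax\): \(A_*\mathcal N(0,I)=\mathcal N(0,AA^\top)\) and \(G(A,A)=\mathbb E\|Az\|^2=\|A\|_F^2\). The squared Frobenius distance between the fibers over \(\Sigma_0\) and \(\Sigma_1\) (an orthogonal Procrustes problem) should then equal the closed-form Bures-Wasserstein distance \(\operatorname{tr}\big(\Sigma_0+\Sigma_1-2(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\big)\). A NumPy sketch with hypothetical helper names:

```python
import numpy as np


def sqrtm_psd(S):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, U = np.linalg.eigh(S)
    return U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T


def bures_wasserstein_sq(S0, S1):
    """Closed-form W_2^2 between N(0, S0) and N(0, S1)."""
    R0 = sqrtm_psd(S0)
    return np.trace(S0 + S1 - 2.0 * sqrtm_psd(R0 @ S1 @ R0))


def fiber_distance_sq(S0, S1):
    """Squared Frobenius distance between the fibers {A : A A^T = S0} and {A : A A^T = S1}.

    Fibers are orbits A Q (Q orthogonal) and the metric is right-invariant, so it is
    enough to project A0 = chol(S0) onto the orbit of A1 = chol(S1) (Procrustes).
    """
    A0, A1 = np.linalg.cholesky(S0), np.linalg.cholesky(S1)
    U, _, Vt = np.linalg.svd(A1.T @ A0)
    Q = U @ Vt                                   # argmin over orthogonal Q of ||A0 - A1 Q||_F
    return np.sum((A0 - A1 @ Q) ** 2)


rng = np.random.default_rng(1)
B0, B1 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
S0, S1 = B0 @ B0.T + np.eye(3), B1 @ B1.T + np.eye(3)   # random SPD covariances
bw, fd = bures_wasserstein_sq(S0, S1), fiber_distance_sq(S0, S1)
```

For diagonal covariances this reduces to \(\sum_i(\sqrt{a_i}-\sqrt{b_i})^2\), e.g. \(5\) for \(\operatorname{diag}(1,4)\) and \(\operatorname{diag}(9,1)\).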

Talk about the infinite-dimensional Riemannian geometry of Optimal Transport for the shape analysis seminar at the MAP5 lab.

