A geometric view of optimal transport

\mu_0

\mu_1

Théo Dumont

\operatorname{Dens}(\mathbb R^n)

slides available at https://slides.com/theodumont/geometric-ot

Optimal Transport

Gaspard Monge
1746-1818

Leonid Kantorovitch
1912-1986

Yann Brenier
1957-

\displaystyle \text{OT}(\mu_0,\mu_1)= \inf_{\varphi} \int_{\mathbb R^n} \|x-\varphi(x)\|^2\,\mathrm d\mu_0\qquad\text{over }\varphi:\mathbb R^n\to\mathbb R^n\text{ such that } \varphi_*\mu_0=\mu_1.

\mu_0,\mu_1\in\mathcal P(\mathbb R^n)

\mu_0

\mu_1

\displaystyle \text{OT}_K(\mu_0,\mu_1)= \inf_{\pi} \int_{\mathbb R^n\times\mathbb R^n} \|x-y\|^2\,\mathrm d\pi\quad\text{over }\pi\in\mathcal P(\mathbb R^n\times \mathbb R^n)\text{ such that } P^i_*\pi=\mu_i.

Definition. (OT Kantorovitch problem)

\delta_x

\delta_{y_1}

\delta_{y_2}

not feasible by a map!
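
As a worked version of this example (a sketch, assuming the target puts mass \(\tfrac12\) on each of \(y_1\) and \(y_2\), which the slide only shows graphically), one can check that no map is admissible while exactly one plan is:

```latex
\mu_0=\delta_x,\qquad \mu_1=\tfrac12\delta_{y_1}+\tfrac12\delta_{y_2},\qquad y_1\neq y_2.
% Any map sends a Dirac mass to a Dirac mass, so the Monge constraint fails:
\varphi_*\delta_x=\delta_{\varphi(x)}\neq\mu_1 .
% A plan with first marginal \delta_x is of the form \delta_x\otimes\nu, so the only admissible plan is
\pi=\tfrac12\delta_{(x,y_1)}+\tfrac12\delta_{(x,y_2)},\qquad
\int\|x-y\|^2\,\mathrm d\pi=\tfrac12\|x-y_1\|^2+\tfrac12\|x-y_2\|^2 .
```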

\(\pi\) is induced by a transport map \(\varphi\)

\(\pi\) is a transport plan

relaxation

[Figure: \(\mu_0\) transported onto \(\mu_1\), once by a map and once by a plan \(\pi\)]

\(|\!\det d\varphi^{-1}|\mu_0 \circ \varphi^{-1}\)

[Monge, 1781], [Kantorovitch, 1942]

Optimal Transport

Definition. (OT Monge problem)

\varphi

Can we say that the solution of (K) is a map?

If \(\mu_0\) has a density, then (K) has a unique solution, and this solution is induced by a map of the form \(\varphi=\nabla f\) with \(f:\mathbb R^n\to\mathbb R\) convex.

Theorem. (Brenier)
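
A small discrete illustration, not a proof: in dimension one, \(\varphi=f'\) with \(f\) convex means that \(\varphi\) is nondecreasing. The sketch below assumes two sets of equally weighted points on the line (so the theorem, which needs a density, does not literally apply) and checks by brute force that the monotone matching has the smallest quadratic cost among all permutations. The function names and the sample points are hypothetical.

```python
from itertools import permutations


def quadratic_cost(xs, ys):
    """Average squared displacement when xs[i] is sent to ys[i]."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def monotone_assignment(xs, ys):
    """Send the i-th smallest source point to the i-th smallest target point."""
    order_x = sorted(range(len(xs)), key=lambda i: xs[i])
    sorted_y = sorted(ys)
    matched = [0.0] * len(xs)
    for rank, i in enumerate(order_x):
        matched[i] = sorted_y[rank]
    return matched


def brute_force_cost(xs, ys):
    """Smallest cost over all one-to-one assignments (only viable for tiny inputs)."""
    return min(quadratic_cost(xs, perm) for perm in permutations(ys))


# Hypothetical sample points, chosen only for illustration.
xs = [0.3, -1.2, 2.0, 0.9]
ys = [1.5, 0.1, -0.7, 3.2]
matched = monotone_assignment(xs, ys)
monotone_cost = quadratic_cost(xs, matched)
best_cost = brute_force_cost(xs, ys)
```

Here `monotone_cost` and `best_cost` should agree up to rounding, which is the one-dimensional shadow of \(\varphi=\nabla f\) with \(f\) convex.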

relaxation

\(\pi\) is induced by a transport map \(\varphi\)

\(\pi\) is a transport plan

Monge (maps)

Kantorovitch (plans)

[Figure: a transport map (Monge) and a transport plan \(\pi\) (Kantorovitch) between \(\mu_0\) and \(\mu_1\)]

[Brenier, 1987]

Optimal Transport

\operatorname{Dens}(\mathbb R^n)\coloneqq\{\mu(x)\mathrm dx\mid\mu\in C^{\infty}(\mathbb R^n),\int\mu=1\}

\mu_0,\mu_1\in\operatorname{Dens}(\mathbb R^n)

\mu_0

\mu_1

Smooth densities:

can we recover classical results of OT theory with a geometric picture?

\coloneqq\mathcal C(\mu_0,\mu_1)

\displaystyle \text{OT}(\mu_0,\mu_1)= \min_{\varphi} \int_{\mathbb R^n} \|x-\varphi(x)\|^2\,\mathrm d\mu_0\qquad\text{over }\varphi\in\operatorname{Diff}(\mathbb R^n)\text{ such that } \varphi_*\mu_0=\mu_1

Definition. (Smooth OT problem)

Smooth OT

\operatorname{Diff}(\mathbb R^n)\coloneqq\{\varphi:\mathbb R^n\to\mathbb R^n\mid \varphi,\varphi^{-1}\in C^\infty(\mathbb R^n)\}

Smooth maps:

\varphi

\mathcal C(\mu_0,\mu_1)

not right-invariant! The metric is invariant only under the right action of \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)

\operatorname{Diff}(\mathbb R^n)

\varphi

\operatorname{id}

Theorem. (Geodesic equation on \(\operatorname{Diff}(\mathbb R^n)\))

\begin{cases} \ddot\varphi_t=0\\ \varphi_t=(1-t)\varphi_0+t\varphi_1 \end{cases}

or if we write \(\dot\varphi=v\circ\varphi\),
inviscid Burgers \(\dot v+\nabla_v v=0\)
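
The passage from \(\ddot\varphi_t=0\) to Burgers' equation is a one-line chain rule; the following is a sketch of that step, writing \(\nabla_v v\) for \(dv\,.v\) (a notational assumption):

```latex
% Differentiate \dot\varphi_t=v_t\circ\varphi_t in time:
\ddot\varphi_t=(\partial_t v_t)\circ\varphi_t+(dv_t\circ\varphi_t)\,.\,\dot\varphi_t
=\big(\partial_t v_t+\nabla_{v_t}v_t\big)\circ\varphi_t .
% Since \varphi_t is a diffeomorphism, \ddot\varphi_t=0 is equivalent to
\partial_t v_t+\nabla_{v_t}v_t=0 .
```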

\displaystyle d^2(\varphi_0,\varphi_1)=\int_0^1G_{\varphi_t}(\dot\varphi_t,\dot\varphi_t)\,\mathrm dt

\displaystyle =\int_0^1\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0\,\mathrm dt

\displaystyle =\int_{\mathbb R^n}\|\varphi_1-\varphi_0\|^2\,\mathrm d\mu_0

horizontal?

finding the shortest geodesic from \(\operatorname{id}\) to \(\mathcal C(\mu_0,\mu_1)\)

\(\displaystyle \text{OT}(\mu_0,\mu_1)=\min_{\varphi\,\in \,\mathcal C(\mu_0,\mu_1)} d^2(\operatorname{id},\varphi)\)

A first link with OT

T_\varphi\operatorname{Diff}(\mathbb R^n)=\{v\circ\varphi\mid v\in \mathfrak X(\mathbb R^n)\}

\dot\varphi

Tangent space at \(\varphi\):

\displaystyle G_\varphi(\dot\varphi,\dot\varphi)\coloneqq\int_{\mathbb R^n}\|\dot\varphi\|^2\,\mathrm d\mu_0 =\int_{\mathbb R^n}\|v\|^2\,\mathrm d\mu\qquad\text{with }\dot\varphi=v\circ\varphi,\ \mu=\varphi_*\mu_0

Metric at \(\varphi\):

\varphi_*\mu_0

[Otto, 2001], [Kriegl & Michor, 1997], [Modin, 2015]

Diffeomorphism group

\operatorname{id}

\varphi

\operatorname{Diff}_{\mu_0}(\mathbb R^n)

\mathcal C(\mu_0,\mu_1)

\operatorname{Diff}(\mathbb R^n)

\(\pi: \operatorname{Diff}(\mathbb R^n)\longrightarrow\operatorname{Dens}(\mathbb R^n)\)
\(\varphi\longmapsto\varphi_*\mu_0\)

\mu_0

\mu_1

\pi:\varphi\mapsto\varphi_*\mu_0

\operatorname{Dens}(\mathbb R^n)

Fiber over \(\mu_1\):

\(\{\varphi\mid \varphi_*\mu_0=\mu_1\}=\mathcal C(\mu_0,\mu_1)\)

Fiber over \(\mu_0\):

\(\{\varphi\mid \varphi_*\mu_0=\mu_0\}=\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)

\mu_0\in\operatorname{Dens}(\mathbb R^n)

\(d\pi(\varphi): T_\varphi\operatorname{Diff}(\mathbb R^n)\longrightarrow T_{\mu}\operatorname{Dens}(\mathbb R^n)\)
\(v\circ\varphi\longmapsto -\operatorname{div}(\mu v)\)
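
A sketch of where the formula \(-\operatorname{div}(\mu v)\) comes from, assuming enough decay at infinity to integrate by parts and using a test function \(h\) (a symbol introduced here for illustration):

```latex
% Curve \varphi_t with \dot\varphi_t=v\circ\varphi_t and \mu_t=(\varphi_t)_*\mu_0:
\frac{\mathrm d}{\mathrm dt}\int h\,\mathrm d\mu_t
=\frac{\mathrm d}{\mathrm dt}\int h\circ\varphi_t\,\mathrm d\mu_0
=\int \langle\nabla h,v\rangle\circ\varphi_t\,\mathrm d\mu_0
=\int\langle\nabla h,v\rangle\,\mathrm d\mu_t
=-\int h\,\operatorname{div}(\mu_t v)\,\mathrm dx ,
% and since h is arbitrary,
\dot\mu_t=-\operatorname{div}(\mu_t v).
```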

\varphi_*\mu_0

The submersion

\operatorname{id}

\varphi

\mu_0

\mu_1

\operatorname{Diff}_{\mu_0}(\mathbb R^n)

\pi:\varphi\mapsto\varphi_*\mu_0

\(\pi:\varphi\mapsto\varphi_*\mu_0\) is a smooth submersion.

\mathcal C(\mu_0,\mu_1)

\nabla p

\operatorname{Diff}(\mathbb R^n)

\operatorname{Dens}(\mathbb R^n)

vertical distribution:

\operatorname{Ver}_\varphi\coloneqq\ker d\pi(\varphi)=\{v\circ\varphi\mid \operatorname{div}(\mu v)=0\}

\varphi_*\mu_0

horizontal distribution:

\operatorname{Hor}_\varphi\coloneqq (\operatorname{Ver}_\varphi)^{\perp_G}=\{\nabla p\circ\varphi\mid p\in C^\infty(\mathbb R^n)\}

Any \(u\in\mathfrak X(\mathbb R^n)\) can be written as
\(u=v+\nabla p\)
with \(\operatorname{div}(\mu_0 v)=0\) and \(p\in C^{\infty}(\mathbb R^n)\).

T_ {\varphi}\operatorname{Diff}(\mathbb R^n)=\operatorname{Ver}_{\varphi}\overset{\perp}\oplus\operatorname{Hor}_{\varphi}

Theorem. (Helmholtz/Hodge decomposition)
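
The orthogonality of the two summands can be checked directly; a sketch, assuming enough decay at infinity for the integration by parts and writing \(\mu=\varphi_*\mu_0\):

```latex
% For v\circ\varphi vertical, i.e. \operatorname{div}(\mu v)=0, and any p\in C^\infty(\mathbb R^n):
G_\varphi(v\circ\varphi,\nabla p\circ\varphi)
=\int_{\mathbb R^n}\langle v,\nabla p\rangle\,\mathrm d\mu
=-\int_{\mathbb R^n}p\,\operatorname{div}(\mu v)\,\mathrm dx
=0 .
```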

right-invariance under action of fiber \(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)

\implies

\(\pi\) induces a metric on \(\operatorname{Diff}(\mathbb R^n)/\operatorname{Diff}_{\mu_0}(\mathbb R^n)\simeq\operatorname{Dens}(\mathbb R^n)\)

independent of \(G\)

depends on \(G\)!

The submersion

(smooth submersion)

\operatorname{id}

\varphi

\mu_0

\mu_1

\pi:\varphi\mapsto\varphi_*\mu_0

\mathcal C(\mu_0,\mu_1)

\dot\mu=-\operatorname{div}(\mu\nabla p)

\operatorname{Diff}(\mathbb R^n)

\operatorname{Dens}(\mathbb R^n)

\nabla p

\dot\mu

\(\pi\) is a Riemannian submersion

\displaystyle g_\mu(\dot\mu,\dot\mu)\coloneqq \inf_{d\pi(\varphi).\dot\varphi=\dot\mu} G_\varphi(\dot\varphi,\dot\varphi)

Pythagoras \(\implies\dot\varphi\) has to be horizontal!

\displaystyle g_\mu(\dot\mu,\dot\mu)=\int_{\mathbb R^n}\|\nabla p\|^2\,\mathrm d\mu

where \(\dot\mu\) and \(\nabla p\) are linked by \(\dot\mu=-\operatorname{div}(\mu\nabla p)\)

Theorem. (Induced metric on \(\operatorname{Dens}(\mathbb R^n)\))

\displaystyle=\int_{\mathbb R^n}\|\nabla \Delta_{\mu}^{-1}\dot\mu\|^2\,\mathrm d\mu

where \(\Delta_\mu p=\operatorname{div}(\mu\nabla p)\).
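
To make the induced metric concrete, here is a minimal numerical sketch in dimension one, under assumptions that are not in the slides: \(\mu\) is the standard Gaussian, and \(\dot\mu\) is the velocity of a unit-speed translation of \(\mu\), so \(\dot\mu=-\mu'=x\mu\). In 1D the relation \(\dot\mu=-(\mu p')'\) gives the flux \(\mu p'=-\int_{-\infty}^x\dot\mu\), hence \(g_\mu(\dot\mu,\dot\mu)=\int(\mu p')^2/\mu\,\mathrm dx\). The grid size and truncation to \([-8,8]\) are arbitrary choices.

```python
import math


def gaussian(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def otto_norm_squared(xs, mu, mu_dot):
    """Approximate g_mu(mu_dot, mu_dot) on a uniform 1D grid.

    Uses flux = mu * p' = -(integral of mu_dot from the left end),
    then integrates flux^2 / mu with the trapezoid rule.
    """
    h = xs[1] - xs[0]
    flux = [0.0]
    for i in range(1, len(xs)):
        # cumulative trapezoid rule for the integral of -mu_dot
        flux.append(flux[-1] - 0.5 * h * (mu_dot[i - 1] + mu_dot[i]))
    integrand = [f * f / m for f, m in zip(flux, mu)]
    return h * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))


n = 2001
xs = [-8.0 + 16.0 * i / (n - 1) for i in range(n)]
mu = [gaussian(x) for x in xs]
# Unit-speed translation: mu_t(x) = mu(x - t), so mu_dot = -mu'(x) = x * mu(x) at t = 0.
mu_dot = [x * m for x, m in zip(xs, mu)]
norm_sq = otto_norm_squared(xs, mu, mu_dot)
```

For this translation the potential is \(p(x)=x\), so \(\|\nabla p\|^2=1\) and `norm_sq` should come out close to 1.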

The submersion

(Riemannian submersion)

\operatorname{id}

\varphi_1

\mu_0

\mu_1

\pi

\mathcal C(\mu_0,\mu_1)

\nabla p

\dot\mu=-\operatorname{div}(\mu\nabla p)

\operatorname{Diff}(\mathbb R^n)

\operatorname{Dens}(\mathbb R^n)

what do the geodesics look like in \(\operatorname{Dens}(\mathbb R^n)\)?

\dot\mu

geodesics in \(\operatorname{Dens}(\mathbb R^n)\)

horizontal geodesics in \(\operatorname{Diff}(\mathbb R^n)\)

\(\iff\)

\varphi_t=\operatorname{id}+t\nabla p

\begin{cases} \dot\mu+\operatorname{div}(\mu\nabla p)=0\\ \dot p+\frac12 \|\nabla p\|^2=0 \end{cases}

(Hamilton-Jacobi)

(continuity equation)

Theorem. (Geodesic equation on \(\operatorname{Dens}(\mathbb R^n)\))

The induced geodesic distance on \(\operatorname{Dens}(\mathbb R^n)\) is the OT distance!

(see previous computation \(d^2(\operatorname{id},\varphi)=\text{OT}(\mu_0,\mu_1)\))

at time \(t=1\): \(\varphi_1=\operatorname{id}+\nabla p=\nabla f\)

what's the final map?

Brenier's theorem
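
As a hedged illustration of the straight-line geodesics \(\varphi_t=\operatorname{id}+t\nabla p\), consider one-dimensional Gaussians, for which the optimal map between \(N(m_0,s_0^2)\) and \(N(m_1,s_1^2)\) is affine and nondecreasing and \(\text{OT}=(m_1-m_0)^2+(s_1-s_0)^2\) (a standard closed form, not derived in these slides). The sketch below checks that the interpolated measures move at constant speed for the OT distance; the parameter values are arbitrary.

```python
import math


def ot_gaussian_1d(m0, s0, m1, s1):
    """Closed-form squared OT distance between N(m0, s0^2) and N(m1, s1^2) in 1D."""
    return (m1 - m0) ** 2 + (s1 - s0) ** 2


def displacement_interpolation(m0, s0, m1, s1, t):
    """Mean and standard deviation of the push-forward of N(m0, s0^2)
    by (1 - t) * id + t * T, where T(x) = m1 + (s1 / s0) * (x - m0)."""
    return (1 - t) * m0 + t * m1, (1 - t) * s0 + t * s1


# Hypothetical endpoints, chosen only for illustration.
m0, s0, m1, s1 = 0.0, 1.0, 3.0, 0.5
total_length = math.sqrt(ot_gaussian_1d(m0, s0, m1, s1))

ma, sa = displacement_interpolation(m0, s0, m1, s1, 0.25)
mb, sb = displacement_interpolation(m0, s0, m1, s1, 0.75)
partial_length = math.sqrt(ot_gaussian_1d(ma, sa, mb, sb))
```

The distance between the measures at times \(0.25\) and \(0.75\) should be half of the total length, as expected for a constant-speed geodesic.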

The submersion

(Riemannian submersion)

\operatorname{id}

\varphi

\mu_0

\mu_1

\pi

\mathcal C(\mu_0,\mu_1)

v_t

\operatorname{Diff}(\mathbb R^n)

\operatorname{Dens}(\mathbb R^n)

\dot\mu_t

\text{OT}(\mu_0,\mu_1)=\inf_{v_t,\,\mu_t}\int_0^1 \|v_t\|^2_{L^2(\mu_t)}\,\mathrm dt

where \(\dot\mu_t+\operatorname{div}(\mu_t v_t)=0\), and where \(\mu_t\) has the right endpoints \(\mu_0\) and \(\mu_1\).

\(\implies\) this is just finding the curve of minimal energy between \(\mu_0\) and \(\mu_1\) in \(\operatorname{Dens}(\mathbb R^n)\),

i.e. finding a (horizontal) geodesic!

[Benamou & Brenier, 2000]
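
A worked instance of the dynamic formulation, as a sketch: take \(\mu_1\) to be \(\mu_0\) translated by a fixed vector \(a\in\mathbb R^n\) (an example chosen here, not taken from the slides).

```latex
% Candidate path and velocity field:
\mu_t(x)=\mu_0(x-ta),\qquad v_t\equiv a,\qquad
\dot\mu_t+\operatorname{div}(\mu_t v_t)
=-\langle a,\nabla\mu_0(x-ta)\rangle+\langle a,\nabla\mu_0(x-ta)\rangle=0 .
% Energy of this path:
\int_0^1\|v_t\|^2_{L^2(\mu_t)}\,\mathrm dt=\int_0^1\|a\|^2\,\mathrm dt=\|a\|^2 .
% The endpoint map is the gradient of a convex function, hence optimal by Brenier's theorem:
\varphi(x)=x+a=\nabla\big(\tfrac12\|x\|^2+\langle a,x\rangle\big),\qquad
\text{OT}(\mu_0,\mu_1)=\int_{\mathbb R^n}\|x-\varphi(x)\|^2\,\mathrm d\mu_0=\|a\|^2 .
```

Both formulations give \(\|a\|^2\), so the translation path attains the infimum.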

Usefulness:

computational: scales to a larger number of points
theoretical: suggests extensions of the OT framework (unbalanced OT, ...)

Applications

Benamou-Brenier

Theorem. (Benamou-Brenier, dynamic formulation of OT)

\operatorname{id}

\nabla f

\operatorname{Diff}(\mathbb R^n)

Any \(\varphi\in\operatorname{Diff}(\mathbb R^n)\) can be written as
\(\varphi=\nabla f\circ\phi\)
with \(f\in C^{\infty}(\mathbb R^n)\) and \(\phi\in\operatorname{Diff}_{\mu_0}(\mathbb R^n)\).

\phi

\varphi

\operatorname{Diff}_{\mu_0}(\mathbb R^n)

[Brenier, 1987]

Applications

Polar factorization

Theorem. (Polar factorization)
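
A finite-dimensional analogue may help fix ideas. The sketch assumes a linear map \(\varphi(x)=Ax\) on \(\mathbb R^2\) with \(\det A>0\) and a rotation-invariant \(\mu_0\) (for instance a standard Gaussian), so that rotations \(\phi(x)=Qx\) preserve \(\mu_0\) and \(\nabla f(x)=Sx\) with \(f(x)=\tfrac12 x^\top Sx\). The factorization \(\varphi=\nabla f\circ\phi\) then reduces to the matrix polar decomposition \(A=SQ\). The helper names and the sample matrix are hypothetical.

```python
import math


def polar_factors_2x2(a, b, c, d):
    """Return (S, Q) with [[a, b], [c, d]] = S Q, S symmetric, Q a rotation.

    Assumes a positive determinant. The rotation angle is chosen so that
    S = A Q^T is symmetric, i.e. tan(theta) = (c - b) / (a + d).
    """
    theta = math.atan2(c - b, a + d)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    q = [[cos_t, -sin_t], [sin_t, cos_t]]
    s = [[a * cos_t - b * sin_t, a * sin_t + b * cos_t],
         [c * cos_t - d * sin_t, c * sin_t + d * cos_t]]
    return s, q


def matmul_2x2(m, n):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical matrix with positive determinant.
a, b, c, d = 2.0, 1.0, -0.5, 1.5
s, q = polar_factors_2x2(a, b, c, d)
product = matmul_2x2(s, q)
s_trace = s[0][0] + s[1][1]
s_det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
```

A positive trace and determinant of `s` mean that \(f\) is convex, matching the role of \(\nabla f\) in Brenier's theorem.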

\mu

\pi

-\nabla \frac{\delta F}{\delta\mu}

\operatorname{Diff}(\mathbb R^n)

\operatorname{Dens}(\mathbb R^n)

The gradient flow of a functional \(F\) on \(\operatorname{Dens}(\mathbb R^n)\) is the curve \(\mu\) solving

\dot\mu=\operatorname{div}\Big(\mu\nabla\frac{\delta F}{\delta\mu}(\mu)\Big).

``\dot x=-\operatorname{grad}^g F(x)''

Fréchet derivative

F(\mu)=\int_{\mathbb R^n}\mu(\log\mu-1)+\int_{\mathbb R^n}V\mu

Example:

entropy

potential

\frac{\delta F}{\delta \mu}(\mu)=\log\mu+V

\dot\mu=\Delta\mu+\operatorname{div}(\mu\nabla V)

+ heat flow

Fokker-Planck equation.

\dot\mu=-\operatorname{div}(\mu\nabla p)

converges to stationary distribution \(\mu_\infty=\frac1{\int e^{-V}}e^{-V}\)
+ \(\lambda\)-convexity of \(F\) along geodesics implies exponential convergence in terms of KL divergence
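
The convergence to \(\mu_\infty\propto e^{-V}\) can be observed numerically. The following is a minimal sketch under assumptions that are not in the slides: dimension one, \(V(x)=x^2/2\), a truncated domain \([-5,5]\) with no-flux walls, and a basic explicit finite-volume scheme for \(\dot\mu=(\mu'+\mu V')'\). It is meant as an illustration, not as a recommended solver.

```python
import math


def fokker_planck_step(mu, xs, h, dt, grad_v):
    """One explicit finite-volume step of mu_t = (mu' + mu V')' with no-flux walls."""
    n = len(mu)
    flux = [0.0] * (n + 1)  # flux[i] lives on the face between cells i-1 and i
    for i in range(1, n):
        x_face = 0.5 * (xs[i - 1] + xs[i])
        diffusion = (mu[i] - mu[i - 1]) / h
        drift = 0.5 * (mu[i] + mu[i - 1]) * grad_v(x_face)
        flux[i] = diffusion + drift
    # flux[0] and flux[n] stay zero, so the total mass is conserved.
    return [mu[i] + dt * (flux[i + 1] - flux[i]) / h for i in range(n)]


def normalize(values, h):
    """Rescale so that the Riemann sum h * sum(values) equals 1."""
    mass = h * sum(values)
    return [v / mass for v in values]


n, half_width = 101, 5.0
h = 2.0 * half_width / (n - 1)
xs = [-half_width + i * h for i in range(n)]
dt = 0.4 * h * h  # below the explicit stability limit h^2 / 2

# Initial density: a narrow bump away from the minimum of V.
mu = normalize([math.exp(-0.5 * ((x - 1.5) / 0.5) ** 2) for x in xs], h)
for _ in range(int(5.0 / dt)):
    mu = fokker_planck_step(mu, xs, h, dt, lambda x: x)  # V'(x) = x

target = normalize([math.exp(-0.5 * x * x) for x in xs], h)
mass = h * sum(mu)
l1_gap = h * sum(abs(m - t) for m, t in zip(mu, target))
```

After the simulated time the discrete mass `mass` is unchanged and `l1_gap` should be small, consistent with convergence to the Gaussian stationary distribution for this choice of \(V\).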

\dot\mu

Applications

Gradient flows

Definition. (Gradient flow)

OT = finding the shortest geodesic from \(\operatorname{id}\) to constraint set \(\mathcal C(\mu_0,\mu_1)\)

horizontal

We recover:

polar factorization
Hodge decomposition

\operatorname{id}

\varphi_1

\mu_0

\mu_1

\pi:\varphi\mapsto\varphi_*\mu_0

\mathcal C(\mu_0,\mu_1)

\nabla p

\dot\mu=-\operatorname{div}(\mu\nabla p)

\operatorname{Diff}(\mathbb R^n)

\operatorname{Dens}(\mathbb R^n)

\dot\mu

cf. Benamou-Brenier formulation

Riemannian submersion \((\operatorname{Diff}(\mathbb R^n),L^2(\mu_0))\overset{\pi}{\longrightarrow}(\operatorname{Dens}(\mathbb R^n), \text{OT})\)

Geodesic equation on...	is...
\(\operatorname{Diff}(\mathbb R^n)\)	inviscid Burgers
\(\operatorname{Diff}_{\mu_0}(\mathbb R^n)\)	incompressible Euler
\(\operatorname{Dens}(\mathbb R^n)\)	Hamilton-Jacobi + continuity equation

Gradient flow of...	is...
entropy	heat flow
entropy + potential	Fokker-Planck
loss functional L	training of infinitely wide neural networks

[Chizat & Bach, 2018]

(local)

Recap

	OT	Inform. theory	Unbalanced OT	LDDMM	Metamorphoses
top space
metric
right-invariant?
bottom space				anything with an action	anything with an action
action				...	...
metric	Wasserstein	Fisher-Rao	Wasserstein-Fisher-Rao	induced metric	induced metric

\operatorname{Diff}

\operatorname{Diff}\ltimes C^{\infty}

\operatorname{Diff}

\operatorname{Dens}

\operatorname{Dens}^+

L^2(\mu_0)

H^1(\mu_0)

L^2(\mu_0)

\mathcal H_K(\mathrm dx)

[Bauer, Bruveris & Michor, 2016], [Gallouët & Vialard, 2018], [Younes, 2010], [Trouvé & Younes, 2005], [Modin, 2015]

finite-dimensional equivalents when restricting to Gaussian measures:
submersion \(\operatorname{GL}(n)\to \operatorname{PSD}(n)\),
induces Bures-Wasserstein and Fisher-Rao

\varphi\mapsto \varphi_*\mu

\varphi\mapsto \varphi^*\mu

(\varphi,\lambda)\mapsto \varphi^*(\lambda^2\mu)

Some other submersions

L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows: in metric spaces and in the space of probability measures, 2005.

J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, 2000.

M. Bauer, M. Bruveris, and P.W. Michor. Uniqueness of the Fisher–Rao metric on the space of smooth densities, 2016.

Y. Brenier. Décomposition polaire et réarrangement monotone des champs de vecteurs, 1987.

L. Chizat and F. Bach. On the global convergence of gradient descent for over-parameterized models using optimal transport, 2018.

W. Gangbo, H.K. Kim, and T. Pacini. Differential forms on Wasserstein space and infinite-dimensional Hamiltonian systems, 2010.

T. Gallouët and F.-X. Vialard. The Camassa–Holm equation as an incompressible Euler equation: A geometric point of view, 2018.

A. Kriegl and P.W. Michor. The convenient setting of global analysis, 1997.

K. Modin. Geometry of matrix decompositions seen through optimal transport and information geometry, 2016.

F. Otto. The geometry of dissipative evolution equations: the porous medium equation, 2001.

A. Trouvé and L. Younes. Metamorphoses through Lie group action, 2005.

L. Younes. Shapes and diffeomorphisms, 2010.


References


Talk about the infinite-dimensional Riemannian geometry of Optimal Transport for the shape analysis seminar (https://shape-analysis.github.io/) at the MAP5 lab.

Théo Dumont

PhD student in optimal transport & geometry @ Université Gustave Eiffel

theodumont.github.io