Optimal transport

transport maps and constrained gradient flows

Théo Dumont

D., Lacombe, Vialard. Learning Monge maps with constrained drifting models, preprint, 2026.


1.    Introduction to optimal transport: transport maps and transport plans

Gaspard Monge

Leonid Kantorovitch

[Monge, 1781], [Kantorovitch, 1942]

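  • a measure over a set \(\mathcal X\) (equipped with a \(\sigma\)-algebra \(\Sigma_{\mathcal X}\)): a function \(\mu:\Sigma_{\mathcal X}\to\mathbb R\) that satisfies
    1. \(\mu(B)\geq0\) for all \(B\in\Sigma_{\mathcal X}\)
    2. \(\mu(\varnothing)=0\)
    3. countable additivity: \(\mu\big(\bigcup_i B_i\big)=\sum_i\mu(B_i)\) for pairwise disjoint \(B_i\in\Sigma_{\mathcal X}\)

  • a probability measure: \(\mu(\mathcal X)=1\)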

A "continuous" measure \(\mathrm d\mu(x)=f(x)\mathrm dx\) (has a density w.r.t. the Lebesgue measure \(\mathrm dx\)).

A discrete measure \(\mu=\sum_{i=1}^n a_i\delta_{x_i}\).

Introduction

  • measures can represent anything:
    point clouds, histograms, 2D images, 3D images, densities of a fluid... 

  • 3D point cloud [Hui, Liu, Zeng, Fu, Vahdat, 2025]
  • 2D image [Ibáñez, Darras, 1964]
  • 3D image [Kilian, Mitra, Pottmann, 2007]
  • histogram [Dumont, 2026]
  • density of a 2D fluid [Yanovsky]

  • transforming measures: for \(T:\mathcal X\to\mathcal Y\), pushforward measure \(T_*\mu\), defined as \(T_*\mu(B)\coloneqq \mu(T^{-1}(B))\)

for a continuous measure \(\mathrm d\mu(x)=f(x)\mathrm dx\) and \(T\) a diffeomorphism: \[\mathrm d(T_*\mu)(y)=|\!\det \mathrm d(T^{-1})(y)|\,f(T^{-1}(y))\,\mathrm dy\]

for a discrete measure: \(T_*\delta_x=\delta_{T(x)}\), so \(T_*\big(\sum_{i=1}^n a_i\delta_{x_i}\big)=\sum_{i=1}^n a_i\delta_{T(x_i)}\)
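
The discrete case is easy to try out numerically. The sketch below is only an illustration: the support points, the weights, and the map \(T(x)=x^2\) are made-up values, not taken from the talk. It pushes a discrete measure forward by moving its support points and keeping the weights, and merges the mass of points that land at the same location.

```python
import numpy as np

# Discrete measure mu = sum_i a_i * delta_{x_i} on the real line (illustrative values).
x = np.array([-1.0, 1.0, 2.0])   # support points x_i
a = np.array([0.2, 0.5, 0.3])    # weights a_i, summing to 1

def T(points):
    """Illustrative map T(x) = x^2 (an assumption made for this example)."""
    return points ** 2

# Pushforward: T_* mu = sum_i a_i * delta_{T(x_i)}.
# Points with the same image are merged and their weights are added.
support, inverse = np.unique(T(x), return_inverse=True)
weights = np.bincount(inverse, weights=a)

# Here T(-1) = T(1) = 1, so T_* mu = 0.7 * delta_1 + 0.3 * delta_4.
print(support, weights)
```

Total mass is preserved, as expected from \(T_*\mu(\mathcal Y)=\mu(T^{-1}(\mathcal Y))=\mu(\mathcal X)\).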

  • probability measures \(\mu_0,\gamma\in \mathcal P(\mathbb R^d)\)
  • piles of sand: find a strategy \(T\) to move the pile \(\mu_0\) onto the pile \(\gamma\)
  • find the best strategy: cost function \(c:\mathbb R^d\times\mathbb R^d\to\mathbb R\) (e.g. \(\|x-y\|^2\))

OT problem (Monge)

\[\text{OT}(\mu_0,\gamma)= \inf_{T} \int_{\mathbb R^d} c\big(x,T(x)\big)\,\mathrm d\mu_0(x)\qquad\text{over }T:\mathbb R^d\to\mathbb R^d\text{ such that } T_*\mu_0=\gamma.\]

Optimal transport

More generally:

  • \(\mathcal X\) and \(\mathcal Y\) Polish spaces
  • \(\mu_0\in\mathcal P(\mathcal X),\, \gamma\in\mathcal P(\mathcal Y)\)
  • cost function \(c:\mathcal X\times\mathcal Y\to\mathbb R\)

OT problem (Monge)

\[\text{OT}(\mu_0,\gamma)= \inf_{T} \int_{\mathcal X} c\big(x,T(x)\big)\,\mathrm d\mu_0(x)\qquad\text{over }T:\mathcal X\to\mathcal Y\text{ such that } T_*\mu_0=\gamma.\]

graph of \(T\): \[\big\{(x,T(x))\mid x\in\mathcal X\big\}\subset \mathcal X\times\mathcal Y\]

Not every transport is feasible by a map! Sending \(\delta_x\) onto \(\frac12\delta_{y_1}+\frac12\delta_{y_2}\) requires splitting the mass at \(x\), whereas \(T_*\delta_x=\delta_{T(x)}\) is always a single Dirac mass.

relaxation: from "\(\pi\) is induced by a transport map \(T\)", that is \(\pi=(\operatorname{id},T)_*\mu_0\), to "\(\pi\) is a transport plan"

OT problem (Kantorovitch)

\[\text{OT}_K(\mu_0,\gamma)= \inf_{\pi} \int_{\mathcal X\times\mathcal Y} c(x,y)\,\mathrm d\pi(x,y)\quad\text{over }\pi\in\mathcal P(\mathcal X\times\mathcal Y)\text{ of marginals }\mu_0 \text{ and } \gamma.\]

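  • the set of transport plans is non-empty (it always contains \(\mu_0\otimes\gamma\)) and weakly compact, so minimizers exist as soon as \(c\) is lower semicontinuous and bounded from below!

  • linear program in \(\pi\): for discrete measures, this is a finite-dimensional linear program that standard solvers handle

  • if \(c(x,y)=\|x-y\|^p_2\) in \(\mathbb R^d\) with \(p\geq1\): \(\text{OT}_K(\mu_0,\gamma)^{1/p}\) is the \(p\)-Wasserstein distance (for \(p=1\), sometimes called Earth Mover's Distance)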
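
To make the linear-program remark concrete, here is a minimal sketch for two small discrete measures, assuming SciPy is available. The data are made up for illustration, and a generic LP solver is used here only because it is the most direct translation of the Kantorovitch problem; dedicated OT solvers are usually preferred in practice.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: mu_0 on {0, 1}, gamma on {1, 2}, uniform weights.
x, a = np.array([0.0, 1.0]), np.array([0.5, 0.5])
y, b = np.array([1.0, 2.0]), np.array([0.5, 0.5])

# Quadratic cost matrix C[i, j] = |x_i - y_j|^2.
C = (x[:, None] - y[None, :]) ** 2
n, m = C.shape

# The plan pi is flattened row by row: pi[i, j] sits at index i * m + j.
row_sums = np.kron(np.eye(n), np.ones((1, m)))   # sum_j pi[i, j] = a[i]
col_sums = np.kron(np.ones((1, n)), np.eye(m))   # sum_i pi[i, j] = b[j]

res = linprog(
    C.ravel(),
    A_eq=np.vstack([row_sums, col_sums]),
    b_eq=np.concatenate([a, b]),
    bounds=(0, None),                            # pi >= 0
)
plan = res.x.reshape(n, m)
print(res.fun)   # optimal transport cost
print(plan)      # optimal transport plan
```

In this example the optimal plan is induced by the translation \(T(x)=x+1\): each point keeps all of its mass and moves by one unit.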

Monge (maps) versus Kantorovitch (plans): the Kantorovitch problem relaxes the Monge problem, since every transport map \(T\) induces a transport plan \(\pi\).

[Brenier, 1987]

Can we say that the solution of the Kantorovitch problem (KP) is a map?

Brenier's theorem

When \(\mathcal X=\mathcal Y=\mathbb R^d\) and \(c(x,y)=\|x-y\|^2\), if \(\mu_0\ll\mathrm dx\) and \(\mu_0,\gamma\) have finite second-order moments, then there is a unique solution to (KP). It is induced by a map, which is the unique map pushing \(\mu_0\) onto \(\gamma\) that can be written as \(T^\star=\nabla \phi\) with \(\phi:\mathbb R^d\to\mathbb R\) convex.

[Brenier, 1987]

In the rest of the talk, \(\mathcal X=\mathcal Y=\mathbb R^d\), \(c(x,y)=\|x-y\|^2\), and \(\mu_0\ll\mathrm dx\).

Examples

  • \(T(x)=-x\)? It can be written as \(T(x)=\nabla\phi(x)\) with \(\phi(x)=-\frac12\|x\|^2\), which is not convex: even when it pushes \(\mu_0\) onto \(\gamma\), it is not the OT map.

  • \(T^\star(x)=x+L\) (translation by \(L\)): it can be written as \(T^\star(x)=\nabla\phi(x)\) with \(\phi(x)=\frac12\|x\|^2+\langle L,x\rangle\) convex, so it is the OT map between \(\mu_0\) and its translate \(\gamma\).

How to find this OT map?
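
The two examples can be checked numerically in one dimension. The sketch below is an illustration under assumptions of my own choosing: \(\mu_0\) is approximated by a uniform grid on \([0,1]\), \(\gamma\) is its translate by \(L=2\), and the competitor `flip` is a hypothetical decreasing map (the analogue of \(T(x)=-x\)) that also pushes \(\mu_0\) onto \(\gamma\).

```python
import numpy as np

n = 1000
x = (np.arange(n) + 0.5) / n      # uniform grid on [0, 1], each point has mass 1/n
L = 2.0

def translate(points):
    # Gradient of the convex function phi(x) = x^2 / 2 + L x.
    return points + L

def flip(points):
    # Gradient of the concave function phi(x) = (1 + L) x - x^2 / 2.
    return (1.0 + L) - points

# Both maps send the grid on [0, 1] onto the same grid on [L, 1 + L].
same_target = np.allclose(np.sort(translate(x)), np.sort(flip(x)))

# Quadratic transport cost of each map, i.e. the Monge objective.
cost_translate = np.mean((x - translate(x)) ** 2)
cost_flip = np.mean((x - flip(x)) ** 2)
print(same_target, cost_translate, cost_flip)
```

Both maps are admissible, but the increasing one (the gradient of a convex function) is strictly cheaper, in line with Brenier's theorem.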

2.    Finding the OT map with constrained gradient flows

[D, Lacombe, Vialard, 2026]

OT problem (Monge)

\[T^\star\in \argmin_{T} \int_{\mathbb R^d} \|x-T(x)\|^2\,\mathrm d\mu_0(x)\qquad\text{over }T:\mathbb R^d\to \mathbb R^d\text{ such that } T_*\mu_0=\gamma.\]

Let \(\mu_0,\gamma\in\mathcal P(\mathbb R^d)\) with finite 2nd-order moment. Then any map \(T\) such that \(T_*\mu_0=\gamma\) belongs to \(L^2_{\mu_0}(\mathbb R^d,\mathbb R^d)\):

\[\int \|T(x)\|^2\,\mathrm d\mu_0(x)=\int \|y\|^2\,\mathrm d(T_*\mu_0)(y)=\int \|y\|^2\,\mathrm d\gamma(y)<\infty.\]

The problem can therefore be posed in the Hilbert space \(L^2_{\mu_0}(\mathbb R^d,\mathbb R^d)\), which contains \(\operatorname{id}\), \(T^\star\) and the set \(\{T\mid T_*\mu_0=\gamma\}\):

\[T^\star\in \argmin_{T} \int_{\mathbb R^d} \|x-T(x)\|^2\,\mathrm d\mu_0(x)\qquad\text{over }T\in L^2_{\mu_0}(\mathbb R^d,\mathbb R^d)\text{ such that } T_*\mu_0=\gamma.\]

Brenier's theorem

\(T^\star\) is the gradient of a convex function.

Finding the OT map

By Brenier's theorem, \(T^\star\in K\), where

\[K\coloneqq \{\nabla \phi\mid\phi\in \dot H^1_{\mu_0}(\mathbb R^d,\mathbb R)\text{ is convex}\}\]

(it is a convex cone), and \(T^\star\) is the only element of \(K\) that belongs to \(\{\text{transport maps}\}=\{T\mid T_*\mu_0=\gamma\}\).

This is not very practical: can we see it differently?

Let \(D:\mathcal P(\mathbb R^d)\times \mathcal P(\mathbb R^d)\to\mathbb R_+\) such that
\(D(\mu,\nu)=0\iff\mu=\nu\). Then

\[T^\star_*\mu_0=\gamma\ \text{ and }\ T^\star\in K\quad\iff\quad D(T^\star_*\mu_0,\gamma)=0\ \text{ and }\ T^\star\in K.\]

Our new problem

\[T^\star\in \argmin_{T\in K} D(T_*\mu_0,\gamma).\]

How to solve this?

Gradient flows in Hilbert spaces

Let \(H\) be some Hilbert space and let \(F:H\to \mathbb R\).
Say I want to find

\[x^\star\in \argmin_{x\in H} F(x).\]

Gradient flow

The gradient flow of some function \(F:H\to\mathbb R\) is a solution \(x_t\) of

\[\partial_t x_t=-\nabla F(x_t)\]

starting at some \(x_0\in H\), for all \(t\geq0\).

Does it converge to \(x^\star\)?

Theorem

Assume that \(F\) is \(\lambda\)-convex around its unique minimizer \(x^\star\), with \(\lambda>0\), that is,

\[\text{for all }x\text{ and }t\in[0,1],\ F((1-t)x+tx^\star)\leq (1-t)F(x)+tF(x^\star)-\frac\lambda2t(1-t)\|x-x^\star\|^2.\]

Then \(x_t\) converges to \(x^\star\) at an exponential rate: \[\|x_t-x^\star\|^2\leq e^{-\lambda t}\|x_0-x^\star\|^2.\]
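
A quick numerical sanity check of this rate, as a sketch under assumptions of my own: \(H=\mathbb R^2\), \(F(x)=\frac12 x^\top Ax-b^\top x\) with an illustrative symmetric positive definite matrix \(A\) (so that \(F\) is \(\lambda\)-convex with \(\lambda\) the smallest eigenvalue of \(A\)), and the flow is discretized with explicit Euler steps.

```python
import numpy as np

# Illustrative quadratic F(x) = 0.5 * x^T A x - b^T x on R^2.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
lam = np.linalg.eigvalsh(A).min()     # F is lam-convex
x_star = np.linalg.solve(A, b)        # unique minimizer of F

def grad_F(point):
    return A @ point - b

h, n_steps = 1e-2, 2000               # time step and number of explicit Euler steps
x = np.array([3.0, -2.0])             # starting point x_0
err0 = np.sum((x - x_star) ** 2)

bound_holds = True
for k in range(1, n_steps + 1):
    x = x - h * grad_F(x)             # discretization of d/dt x_t = -grad F(x_t)
    err = np.sum((x - x_star) ** 2)
    # Compare with the bound exp(-lam * t) * ||x_0 - x*||^2 at time t = k * h.
    bound_holds = bound_holds and bool(err <= np.exp(-lam * k * h) * err0 + 1e-12)

print(bound_holds, x, x_star)
```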

Proof.

\[\frac{\mathrm d }{\mathrm d t} \|x_t-x^\star\|^{2}=2\langle x_t-x^\star,\dot{x_t}\rangle =-2\langle x_t-x^\star,\nabla F(x_t)\rangle \leq-\lambda\|x_t-x^\star\|^{2},\]

and Grönwall's lemma concludes.
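
The last inequality of the proof is where \(\lambda\)-convexity enters. A short derivation sketch, assuming \(F\) is differentiable: subtract \(F(x)\) from both sides of the \(\lambda\)-convexity inequality, divide by \(t\), and let \(t\to0^+\).

```latex
\begin{align*}
\frac{F\big(x+t(x^\star-x)\big)-F(x)}{t}
  &\leq F(x^\star)-F(x)-\frac\lambda2(1-t)\|x-x^\star\|^2 \\
\xrightarrow[t\to0^+]{}\quad
\langle\nabla F(x),x^\star-x\rangle
  &\leq F(x^\star)-F(x)-\frac\lambda2\|x-x^\star\|^2
  \leq -\frac\lambda2\|x-x^\star\|^2 ,
\end{align*}
% the last step uses F(x^\star) \leq F(x). Taking x = x_t and multiplying by 2:
\[
-2\langle x_t-x^\star,\nabla F(x_t)\rangle\leq-\lambda\|x_t-x^\star\|^2 .
\]
```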

Takeaway

If \(F\) is \(\lambda\)-convex around \(x^\star\) with \(\lambda>0\), then the gradient flow of \(F\) converges to \(x^\star\).

Constrained gradient flows in the set of transport maps

[D, Lacombe, Vialard, 2026]

\[T^\star\in \argmin_{T\in K} D(T_*\mu_0,\gamma),\qquad K\coloneqq \{\nabla \phi\mid\phi\in \dot H^1_{\mu_0}(\mathbb R^d,\mathbb R)\text{ is convex}\},\qquad T^\star\in K.\]

The gradient flow of \(F_\gamma:T\mapsto D(T_*\mu_0,\gamma)\) in \(L^2_{\mu_0}(\mathbb R^d,\mathbb R^d)\) is

\[\partial_t T_t=-\nabla F_\gamma(T_t)\]

starting at \(T_0=\operatorname{id}\), for all \(t\geq0\).

We need to stay in \(K\)! So we project the velocity \(-\nabla F_\gamma(T_t)\) onto \(\operatorname{Tan}K\), the tangent cone of \(K\) at \(T_t\), and follow the constrained gradient flow

\[\partial_t T_t=\operatorname{proj}_{\operatorname{Tan}K}\big[-\nabla F_\gamma(T_t)\big]\]

starting at \(T_0=\operatorname{id}\), for all \(t\geq0\).

Takeaway

If \(F_\gamma:T\mapsto D(T_*\mu_0,\gamma)\) is \(\lambda\)-convex around \(T^\star\) with \(\lambda>0\), then the constrained gradient flow of \(F_\gamma\) converges to \(T^\star\).

What remains to be done:

  1. Find a functional \(F_\gamma:T\mapsto D(T_*\mu_0,\gamma)\) that is \(\lambda\)-convex around \(T^\star\).

  2. Show (rigorously this time) that the constrained gradient flow of \(F_\gamma\) converges to \(T^\star\).

  1. Find a functional \(F_\gamma:T\mapsto D(T_*\mu_0,\gamma)\) that is \(\lambda\)-convex around \(T^\star\).

Write \(\gamma=e^{-V}\,\mathrm dx\). Define the relative entropy (or KL divergence):

\[D(\mu,\gamma)\coloneqq\int_{\mathbb R^d}\mu\log\frac\mu\gamma=\underbrace{\int_{\mathbb R^d} V\mu}_{\text{potential}}+\underbrace{\int_{\mathbb R^d}\mu\log\mu}_{\text{entropy}}\]

Proposition. [D, Lacombe, Vialard, 2026]

Write \(\gamma=e^{-V}\,\mathrm dx\) and let \(D\) be the relative entropy. If \(V\) is \(\lambda\)-convex on \(\mathbb R^d\), then \(F_\gamma\) is \(\lambda\)-convex around \(T^\star\) on \(L^2_{\mu_0}(\mathbb R^d,\mathbb R^d)\).

(this comes quite easily from the very nice convexity properties of the relative entropy on \(\mathcal P(\mathbb R^d)\))

We found our functional \(F_\gamma\)!
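
The potential/entropy splitting of the relative entropy can be checked numerically. The sketch below uses illustrative choices of my own: \(\gamma=\mathcal N(0,1)\), so that \(V(x)=\frac12x^2+\frac12\log(2\pi)\) is \(1\)-convex, and \(\mu=\mathcal N(1,0.5^2)\). The integrals are approximated by Riemann sums on a fine grid and compared with the closed-form KL divergence between two Gaussians.

```python
import numpy as np

grid = np.linspace(-12.0, 12.0, 200001)
dx = grid[1] - grid[0]

m_mu, s_mu = 1.0, 0.5                          # mu = N(1, 0.5^2), illustrative
log_mu = -0.5 * ((grid - m_mu) / s_mu) ** 2 - np.log(s_mu * np.sqrt(2 * np.pi))
mu = np.exp(log_mu)                            # density of mu on the grid

V = 0.5 * grid ** 2 + 0.5 * np.log(2 * np.pi)  # gamma = exp(-V) dx = N(0, 1)

potential = np.sum(V * mu) * dx                # integral of V mu
entropy = np.sum(mu * log_mu) * dx             # integral of mu log mu
kl_numeric = potential + entropy

# Closed form of KL(N(m, s^2) || N(0, 1)) = -log(s) + (s^2 + m^2) / 2 - 1/2.
kl_closed = -np.log(s_mu) + (s_mu ** 2 + m_mu ** 2) / 2.0 - 0.5
print(kl_numeric, kl_closed)
```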


2. Show (rigorously this time) that the constrained gradient flow of \(F_\gamma\) converges to \(T^\star\).

Theorem. [D, Lacombe, Vialard, 2026]

Let \(\mu_0\in\mathcal P(\mathbb R^d)\) be an absolutely continuous probability measure and let \(\gamma=e^{-V}\,\mathrm dx\in\mathcal P(\mathbb R^d)\) be a probability measure with \(V\) \(\lambda\)-convex, \(\lambda>0\).

Then the constrained gradient flow:
   \(\circ\) admits a solution of time-regularity \(H^1\)
   \(\circ\) and converges exponentially fast to the optimal transport map \(T^\star\) between \(\mu_0\) and \(\gamma\): \[\|T_t-T^\star\|^2_{\mu_0}\leq Ce^{-2\lambda t}\|\operatorname{id}-T^\star\|^2_{\mu_0}.\]

Proof (idea). [D, Lacombe, Vialard, 2026], [Ambrosio, Gigli, Savaré, 2005]
   \(\circ\) approximate the time-continuous flow by a discrete implicit scheme [AGS, 2005]
   \(\circ\) proceed similarly as in the non-constrained case
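
The exponential rate can be observed on a toy case where everything is explicit. This is only a sketch under strong assumptions that are mine, not the paper's: \(d=1\), \(\mu_0=\mathcal N(0,1)\), \(\gamma=\mathcal N(m,s^2)\) (so \(\lambda=1/s^2\) and \(T^\star(x)=sx+m\)), and maps restricted to the affine family \(T(x)=ax+b\) with \(a>0\). Using \(\nabla F_\gamma(T)=(\nabla\log(T_*\mu_0)+\nabla V)\circ T\), one finds \(\nabla F_\gamma(T)(x)=(a/s^2-1/a)\,x+(b-m)/s^2\), which is itself affine, so in this toy case the projection is assumed to be inactive. The flow is discretized with explicit Euler steps rather than the implicit scheme of the proof.

```python
import numpy as np

m, s = 1.5, 2.0                 # illustrative target gamma = N(m, s^2)
lam = 1.0 / s ** 2              # V(x) = (x - m)^2 / (2 s^2) + const is lam-convex
a, b = 1.0, 0.0                 # T_0 = id, written as T(x) = a x + b
a_star, b_star = s, m           # OT map from N(0, 1) to N(m, s^2)

def sq_dist(slope, shift):
    # ||T - T*||^2 in L^2(mu_0) for affine maps, since mu_0 = N(0, 1)
    # has mean 0 and variance 1.
    return (slope - a_star) ** 2 + (shift - b_star) ** 2

h, n_steps = 1e-3, 20000        # explicit Euler discretization (an assumption)
err0 = sq_dist(a, b)
bound_holds = True
for k in range(1, n_steps + 1):
    # One step along -grad F_gamma(T)(x) = (1/a - a/s^2) x - (b - m)/s^2.
    a, b = a + h * (1.0 / a - a / s ** 2), b - h * (b - m) / s ** 2
    bound = np.exp(-2.0 * lam * k * h) * err0      # theorem's bound with C = 1
    bound_holds = bound_holds and bool(sq_dist(a, b) <= bound * (1 + 1e-6) + 1e-12)

print(bound_holds, a, b)
```

In this example the bound holds with \(C=1\); the general constant \(C\) of the theorem is not computed here.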

Recap

OT problem (Monge)

\[T^\star\in \argmin_{T} \int_{\mathbb R^d} \|x-T(x)\|^2\,\mathrm d\mu_0(x)\qquad\text{over }T:\mathbb R^d\to \mathbb R^d\text{ such that } T_*\mu_0=\gamma.\]

Our new problem

\[T^\star\in \argmin_{T\in K} D(T_*\mu_0,\gamma).\]

constrained gradient flow

\[\partial_t T_t=\operatorname{proj}_{\operatorname{Tan}K}\big[-\nabla F_\gamma(T_t)\big]\]

starting at \(T_0=\operatorname{id}\), for all \(t\geq0\).

What's more?

  • computationally:
    • how to compute this constrained gradient flow that converges to \(T^\star\)?
    • parameterize the set \(K\) as \(\theta\mapsto \nabla\phi_\theta\), where \(\phi_\theta\) is an ICNN (input convex neural network)
    • compute this: \[\partial_t T_t=\operatorname{proj}_{\operatorname{Tan}K}\big[-\nabla F_\gamma(T_t)\big]\]
      for that, we need to
      1. know/compute \(\nabla F_\gamma\): \(\nabla F_\gamma(T)=(\nabla\log(T_*\mu_0)+\nabla V)\circ T\)
      2. compute the projection \(\operatorname{proj}_C[v]=\argmin_{w\in C}\|w-v\|\) with \(C=\operatorname{Tan}K\) (we solve this using gradient descent)

  • connection with natural gradient flow/descent
    • the induced evolution in parameter space is a natural gradient flow
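
To illustrate the parameterization \(\theta\mapsto\nabla\phi_\theta\), here is a minimal NumPy sketch of a one-hidden-layer input convex network. The architecture is hypothetical and not necessarily the one used in the paper: the names `W1`, `b1`, `w2`, the softplus activation, and the added \(\frac12\|x\|^2\) term are all assumptions made for this example. Convexity in \(x\) comes from combining a convex nondecreasing activation with nonnegative output weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width = 2, 16

# Hypothetical parameters theta = (W1, b1, w2) of a one-hidden-layer ICNN.
W1 = rng.normal(size=(width, d))
b1 = rng.normal(size=width)
w2 = np.abs(rng.normal(size=width))   # nonnegative weights keep phi convex

def softplus(z):
    return np.logaddexp(0.0, z)       # convex and nondecreasing activation

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))   # derivative of softplus

def phi(x):
    # phi_theta(x) = sum_k w2_k * softplus(<W1_k, x> + b1_k) + 0.5 * ||x||^2
    return w2 @ softplus(W1 @ x + b1) + 0.5 * x @ x

def T(x):
    # T_theta = grad phi_theta, written by hand for this small architecture.
    return W1.T @ (w2 * sigmoid(W1 @ x + b1)) + x

# Gradients of convex functions are monotone: <T(x) - T(y), x - y> >= 0.
pairs = rng.normal(size=(100, 2, d))
monotone = all((T(x) - T(y)) @ (x - y) >= 0.0 for x, y in pairs)
print(monotone)
```

In practice one would rely on automatic differentiation to obtain \(\nabla\phi_\theta\); the hand-written gradient above only keeps the example dependency-free.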

[D, Lacombe, Vialard, 2026]

References

Ambrosio, L., Gigli, N., and Savaré, G. (2005). Gradient flows: in metric spaces and in the space of probability measures.

Brenier, Y. (1987). Décomposition polaire et réarrangement monotone des champs de vecteurs.

Dumont, T., Lacombe, T., and Vialard, F.-X. (2026). Learning Monge maps with constrained drifting models.

Kantorovich, L. (1942). On the translocation of masses.

Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais.

Villani, C. (2008). Optimal transport: old and new.

Thank you!