Flavien Léger

NNCC spaces in optimization

Joint works with Pierre-Cyril Aubin-Frankowski, and with Gabriele Todeschi and François-Xavier Vialard

Overview

Minimize \[\mathcal{E}\colon X\to\mathbb{R}\cup\{+\infty\}\]

using a function \(c(x,y)\) as a “movement limiter”

⏵ \(X\) : possibly infinite-dimensional

⏵ \(\mathcal{E},c\) : possibly nonsmooth 

(Jacobs–Lee–L ’21)

1. Implicit and explicit methods with a cost \(c(x,y)\)

2. Evolution variational inequalities (EVIs)

3. NNCC spaces

Implicit method with cost \(c(x,y)\)

\mathcal{E}\colon X\to\mathbb{R}\cup\{+\infty\}
\inf_{x\in X} \mathcal{E}(x)

Proximal point method,    \((X,d)\) metric space,    \(\tau>0\)

x_{n+1}\in\operatorname*{argmin}_{x\in X} \frac{d^2}{2\tau}(x,x_n)+\mathcal{E}(x)

\(X\) an arbitrary set,    cost function \(c\colon X\times X\to\mathbb{R}\),    \(c(x,y)\geq 0\),    \(c(x,x)=0\)

x_{n+1}\in\operatorname*{argmin}_{x\in X} c(x,x_n)+\mathcal{E}(x)

Implicit method with cost \(c(x,y)\)

\mathcal{E}(x)=\inf_{y\in X} \mathcal{E}(x)+c(x,y)\quad\longrightarrow\quad\inf_{x\in X} \mathcal{E}(x)=\inf_{x\in X, \,y\in X} \underbrace{\mathcal{E}(x)+c(x,y)}_{\eqqcolon\phi(x,y)}

x_{n+1}\in\operatorname*{argmin}_{x\in X} \mathcal{E}(x)+c(x,x_n)
\iff
\left\{\begin{aligned} y_{n+1} &\in \operatorname*{argmin}_{y\in X} \phi(x_n,y)\\ x_{n+1} &\in \operatorname*{argmin}_{x\in X} \phi(x,y_{n+1}) \end{aligned}\right.\quad\text{(AM)}

Remarks/Motivations

  • Tailored \(c(x,y)\)
  • Regularizing operator
  • Gradient flows   “\(\dot x(t)=-\nabla \mathcal{E}(x(t))\)”        (\(\tau\to0\))

x_{n+1}\in\operatorname*{argmin}_{x\in X} \frac{d^2}{2\tau}(x,x_n)+\mathcal{E}(x)

⏵ Define gradient flows in nonsmooth settings   (Ambrosio–Gigli–Savaré ’05)

x_{n+1}\in\operatorname*{argmin}_{x\in X} c(x,x_n)+\mathcal{E}(x)

⏵ \(c(x,y)\) as a proxy for \(d^2(x,y)\)   (Rankin–Wong ’24)
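The equivalence between the implicit step and alternating minimization (AM) of \(\phi(x,y)=\mathcal{E}(x)+c(x,y)\) can be checked numerically. The sketch below is a minimal illustration under assumptions that are not from the slides: \(X\) is replaced by a uniform 1D grid, \(\mathcal{E}(x)=\lvert x\rvert\), and \(c(x,y)=(x-y)^2/(2\tau)\) with \(\tau=0.25\). The names `implicit_step` and `am_step` are hypothetical.

```python
import numpy as np

# Assumed toy setup (not from the slides): 1D grid, E(x) = |x|,
# c(x, y) = (x - y)^2 / (2 * tau) as the movement limiter.
grid = np.linspace(-2.0, 2.0, 401)
tau = 0.25
E = np.abs(grid)
C = (grid[:, None] - grid[None, :]) ** 2 / (2.0 * tau)  # C[i, j] = c(x_i, y_j)
phi = E[:, None] + C                                    # phi[i, j] = E(x_i) + c(x_i, y_j)

def implicit_step(i):
    """Index of argmin_x E(x) + c(x, x_i) over the grid."""
    return int(np.argmin(E + C[:, i]))

def am_step(i):
    """One alternating-minimization sweep on phi, started at x_i."""
    j = int(np.argmin(phi[i, :]))     # y_{n+1}: minimizes c(x_i, .), hence y_{n+1} = x_i
    return int(np.argmin(phi[:, j]))  # x_{n+1}: minimizes phi(., y_{n+1})

i0 = int(np.argmin(np.abs(grid - 1.5)))  # start near x_0 = 1.5
path_implicit, path_am = [i0], [i0]
for _ in range(10):
    path_implicit.append(implicit_step(path_implicit[-1]))
    path_am.append(am_step(path_am[-1]))
```

Because \(c\geq 0\) and \(c(x,x)=0\), the \(y\)-step simply returns \(y_{n+1}=x_n\), so both index paths coincide and the energy along them is nonincreasing.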

Explicit method with cost \(c(x,y)\)

How to do an explicit method with cost \(c(x,y)\)?

\(X\) an arbitrary set,    \(Y\) another set

c\colon X\times Y\to\mathbb{R}

\mathcal{E}\colon X\to\mathbb{R}\cup\{+\infty\}
\inf_{x\in X} \mathcal{E}(x)

c-concavity

Definition. \(\mathcal{E}\) is c-concave if there exists \(h\colon Y\to \mathbb{R}\cup\{+\infty\}\) s.t. \[\mathcal{E}(x)=\inf_{y\in Y}c(x,y)+h(y).\]

The smallest such \(h\) is the c-transform \(\mathcal{E}^c(y)=\sup_{x\in X} \mathcal{E}(x)-c(x,y).\)

Example. \(X=Y=\mathbb{R}^d\),    \(c(x,y)=\frac{L}{2}\lVert x-y\rVert^2\)

\(\mathcal{E}\) is \(c\)-concave \(\iff \nabla^2 \mathcal{E}\preccurlyeq L\, I_{d\times d}\)

[Figure, two panels: “\(\mathcal{E}\) is \(c\)-concave” and “\(\mathcal{E}\) is not \(c\)-concave”; curves labeled \(\mathcal{E}\) and \(c(x,y)+\mathcal{E}^c(y)\)]
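The c-concavity criterion for the quadratic cost can be probed on a grid. The following sketch is an illustration under assumptions not taken from the slides: one dimension, \(\mathcal{E}(x)=\sin x\) (so \(\mathcal{E}''\leq 1\)), and finite grids standing in for \(X\) and \(Y\). The function name `c_concavity_gap` is hypothetical. It computes the c-transform \(\mathcal{E}^c\) and measures how far \(\inf_y c(x,y)+\mathcal{E}^c(y)\) sits above \(\mathcal{E}(x)\).

```python
import numpy as np

X = np.linspace(-3.0, 3.0, 301)   # grid standing in for X (assumption)
Y = np.linspace(-4.0, 4.0, 801)   # larger grid standing in for Y (assumption)
E = np.sin(X)                     # second derivative -sin(x) is at most 1

def c_concavity_gap(L):
    """Max over the X grid of inf_y [c(x, y) + E^c(y)] - E(x), for c = L/2 (x - y)^2."""
    C = 0.5 * L * (X[:, None] - Y[None, :]) ** 2   # C[i, j] = c(x_i, y_j)
    E_c = np.max(E[:, None] - C, axis=0)           # c-transform: sup_x E(x) - c(x, y)
    envelope = np.min(C + E_c[None, :], axis=1)    # inf_y c(x, y) + E^c(y), always >= E
    return float(np.max(envelope - E))

gap_concave = c_concavity_gap(L=2.0)  # L above sup E'': gap is zero up to grid error
gap_not = c_concavity_gap(L=0.5)      # L below sup E'': envelope sits strictly above E
```

A vanishing gap corresponds to the c-concave case, while a clearly positive gap signals that \(\mathcal{E}\) is not c-concave for that \(L\).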

Explicit algorithm

Suppose \(\mathcal{E}\) is c-concave:

\mathcal{E}(x)=\inf_{y\in Y}c(x,y)+\mathcal{E}^c(y)
\longrightarrow\,\inf_{x\in X} \mathcal{E}(x)=\inf_{x\in X, \,y\in Y} \underbrace{c(x,y)+\mathcal{E}^c(y)}_{\eqqcolon\phi(x,y)}

Nonsmooth settings

\begin{aligned} y_{n+1} &\in \operatorname*{argmin}_{y\in Y} \phi(x_n,y)\\ x_{n+1} &\in \operatorname*{argmin}_{x\in X} \phi(x,y_{n+1}) \end{aligned}

Smooth settings:   \(X,Y\) finite-dimensional manifolds,   twisted \(c\in C^1(X\times Y)\),   \(\mathcal{E}\in C^1(X)\)

\begin{aligned} -\nabla_xc(x_n,y_{n+1})&=-\nabla \mathcal{E}(x_n)\\ \nabla_xc(x_{n+1},y_{n+1})&=0 \end{aligned}

c(x,y)=\left\{\begin{aligned} &\frac{L}{2}\lVert x-y\rVert^2 &&\longrightarrow\,\text{Gradient descent}\\ &\text{Bregman I} &&\longrightarrow\,\text{Mirror descent}\\ &\text{Bregman II} &&\longrightarrow\,\text{Natural gradient descent} \\ &\text{Riemannian} &&\longrightarrow\,\text{Riemannian gradient descent} \end{aligned} \right.

(“Gradient descent with a general cost” L–Aubin-Frankowski ’23)
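As a worked instance of the smooth-setting equations, take the quadratic cost \(c(x,y)=\frac{L}{2}\lVert x-y\rVert^2\) on \(X=Y=\mathbb{R}^d\). This is only a sketch, assuming \(\mathcal{E}\in C^1\) with \(\nabla^2\mathcal{E}\preccurlyeq L\,I_{d\times d}\) so that \(\mathcal{E}\) is c-concave. The two optimality conditions then collapse to the gradient descent step with step size \(1/L\):

```latex
\begin{aligned}
\nabla_x c(x,y) &= L\,(x-y),\\
-\nabla_x c(x_n,y_{n+1}) = -\nabla\mathcal{E}(x_n)
  &\iff y_{n+1} = x_n - \tfrac{1}{L}\nabla\mathcal{E}(x_n),\\
\nabla_x c(x_{n+1},y_{n+1}) = 0
  &\iff x_{n+1} = y_{n+1},
\end{aligned}
\qquad\text{hence}\qquad
x_{n+1} = x_n - \tfrac{1}{L}\nabla\mathcal{E}(x_n).
```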

Recap

\mathcal{E}(x)=\inf_{y\in Y} \phi(x,y) \quad \longrightarrow\quad \inf_{x\in X} \mathcal{E}(x)=\inf_{x\in X, \,y\in Y} \phi(x,y)

Alternating Minimization (AM) of \(\phi\)

Implicit

\phi(x,y)=\mathcal{E}(x)+c(x,y)

Explicit: assume \(\mathcal{E}\) is c-concave

\phi(x,y)=c(x,y)+\mathcal{E}^c(y)

Implicit+Explicit (forward–backward):   \(\mathcal{E}(x)=\mathcal{E}_1(x)+\mathcal{E}_2(x)\),   assume \(\mathcal{E}_2\) is c-concave

\phi(x,y)=\mathcal{E}_1(x)+c(x,y)+(\mathcal{E}_2)^c(y)
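For the quadratic cost \(c(x,y)=\frac{L}{2}\lVert x-y\rVert^2\), the forward–backward splitting reduces to the familiar proximal gradient iteration: the \(y\)-step is a gradient step on \(\mathcal{E}_2\), and the \(x\)-step is a proximal step on \(\mathcal{E}_1\). The sketch below illustrates this on an assumed example that is not from the slides: \(\mathcal{E}_1(x)=\texttt{lam}\,\lVert x\rVert_1\) and \(\mathcal{E}_2(x)=\frac12\lVert Ax-b\rVert^2\), with random data `A`, `b` and a hypothetical helper `forward_backward`.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (closed form)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, b, lam, n_iter=200):
    """AM on phi(x, y) = E1(x) + L/2 |x - y|^2 + (E2)^c(y) for the assumed E1, E2."""
    L = np.linalg.norm(A, 2) ** 2            # largest eigenvalue of A^T A, so E2 is c-concave
    x = np.zeros(A.shape[1])
    energy = lambda z: 0.5 * np.sum((A @ z - b) ** 2) + lam * np.sum(np.abs(z))
    objective = [energy(x)]
    for _ in range(n_iter):
        y = x - A.T @ (A @ x - b) / L        # explicit step: y_{n+1} = x_n - grad E2(x_n) / L
        x = soft_threshold(y, lam / L)       # implicit step: argmin_x E1(x) + L/2 |x - y|^2
        objective.append(energy(x))
    return x, np.array(objective)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x_hat, objective = forward_backward(A, b, lam=0.1)
```

The recorded energies are nonincreasing along the iterations, as expected for an alternating minimization of \(\phi\).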


Evolution Variational Inequalities (EVIs)

\(X,Y\) two arbitrary sets,    \(\phi\colon X\times Y\to\mathbb{R}\cup\{+\infty\}\) proper

x_{n} \in\displaystyle\operatorname*{argmin}_{x\in X} \phi(x,y_{n})
y_{n+1} \in \displaystyle\operatorname*{argmin}_{y\in Y} \phi(x_n,y)

Definition. Let \(\lambda\in[0,1)\). We say that \((x_n,y_n)_n\) satisfy the EVI if   \(\forall n\geq 0\),

\[\forall x\in X,y\in Y,\quad(1-\lambda)\phi(x_n,y_n)+\phi(x,y_{n+1})\leq \phi(x,y)+(1-\lambda)\phi(x,y_n).\]

⏵ Nonsmooth, intrinsic

⏵ Condition on \(\phi\) and on the choice of iterates

T H E O R E M   (L–Aubin-Frankowski '23)

\text{EVI}(\lambda=0)\implies\phi(x_n,y_n)\leq \phi(x,y)+\frac{\phi(x,y_0)-\phi(x_0,y_0)}{n}
\text{EVI}(\lambda>0)\implies\phi(x_n,y_n)\leq \phi(x,y)+\frac{\lambda[\phi(x,y_0)-\phi(x_0,y_0)]}{\Lambda^n-1}
\Lambda\coloneqq(1-\lambda)^{-1}

Background on EVIs

\(\phi(x,y)\): extends the five-point property of Csiszár–Tusnády ’84

Consider implicit method

x_{n} \in\displaystyle\operatorname*{argmin}_{x\in X} \mathcal{E}(x)+\frac{d^2}{2\tau}(x,x_{n-1})

EVI \((\lambda=0)\)

\forall x\in X,\quad \mathcal{E}(x_n)+\frac{1}{2\tau}d^2(x_n,x_{n-1})\leq \mathcal{E}(x)+\frac{1}{2\tau}d^2(x,x_{n-1})-\frac{1}{2\tau}d^2(x,x_n)

  • Euclidean: \(\mathcal{E}\) convex
  • \((X,d)\) non-positively curved, Mayer/Jost: \(\mathcal{E}\) convex on geodesics
  • Ambrosio–Gigli–Savaré: \(\mathcal{E}\) convex on curves \(x(t)\) such that \(d^2(x,x_{n-1})\) is \(1\)-convex, i.e. \(t\mapsto d^2(x(t),x_{n-1})-t^2 \,d^2(x(1),x(0))\) is convex

Convergence rates from EVIs

T H E O R E M   (L–Aubin-Frankowski '23)

x_{n} \in\displaystyle\operatorname*{argmin}_{x\in X} \phi(x,y_{n})
y_{n+1} \in \displaystyle\operatorname*{argmin}_{y\in Y} \phi(x_n,y)

Suppose the EVI holds. Then, for all \(x\in X\), \(y\in Y\) and \(n\geq 1\):

If \(\lambda=0\), then

\[\phi(x_n,y_n)\leq \phi(x,y)+\frac{\phi(x,y_0)-\phi(x_0,y_0)}{n}\]

If \(\lambda>0\), then

\[\phi(x_n,y_n)\leq \phi(x,y)+\frac{\lambda[\phi(x,y_0)-\phi(x_0,y_0)]}{\Lambda^n-1},\]

where \(\Lambda\coloneqq(1-\lambda)^{-1}>1\).
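The sublinear bound can be sanity-checked on a toy case. The sketch below makes assumptions that are not from the slides: the Euclidean line, \(\mathcal{E}(x)=\lvert x\rvert\) (convex, so the EVI with \(\lambda=0\) is expected to hold for the implicit method), \(c(x,y)=(x-y)^2/(2\tau)\) with \(\tau=0.5\), and the comparison point \(x=y=0\). The helper `prox` is hypothetical shorthand for the closed-form implicit step.

```python
import numpy as np

tau = 0.5  # assumed step parameter

def E(x):
    return abs(x)

def c(x, y):
    return (x - y) ** 2 / (2.0 * tau)

def phi(x, y):
    return E(x) + c(x, y)

def prox(y):
    """argmin_x |x| + (x - y)^2 / (2 tau): soft-thresholding at level tau."""
    return float(np.sign(y) * max(abs(y) - tau, 0.0))

y = 3.0                              # y_0
x = prox(y)                          # x_0 in argmin_x phi(x, y_0)
x_star = 0.0                         # minimizer of E, used as the comparison point
gap0 = phi(x_star, y) - phi(x, y)    # phi(x, y_0) - phi(x_0, y_0)

checks = []
for n in range(1, 21):
    y = x                            # y_n in argmin_y phi(x_{n-1}, y), since c(x, x) = 0
    x = prox(y)                      # x_n in argmin_x phi(x, y_n)
    bound = phi(x_star, x_star) + gap0 / n
    checks.append(phi(x, y) <= bound + 1e-12)
```

Every entry of `checks` being true means \(\phi(x_n,y_n)\) stays below the \(1/n\) bound along this particular run; it is a numerical check on one example, not a proof.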


NNCC spaces

\(X, Y\) two arbitrary sets,   \(c\colon X\times Y\to\mathbb{R}\cup\{+\infty\}\).

D E F I N I T I O N  (L–Todeschi–Vialard '24)

\((X\times Y,c)\) is an NNCC space if for each \((x_0,x_1,\bar y)\in X\times X\times Y\), there exists a path \(x\colon[0,1]\to X\) from \(x_0\) to \(x_1\) such that \(\forall y\in Y\), \(\forall t\in[0,1]\), \[c(x(t),\bar y)-c(x(t),y)\leq (1-t)[c(x_0,\bar y)-c(x_0,y)]+t[c(x_1,\bar y)-c(x_1,y)].\]

\((x(t),\bar y)\) is called a generalized c-segment.

(Think:  \(t\mapsto c(x(t),\bar y)-c(x(t),y)\)  is convex)
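A quick worked check of the definition for the simplest cost, \(c(x,y)=\lVert x-y\rVert^2\) on \(X=Y=\mathbb{R}^d\), taking the straight line as the path (the choice of path is an assumption made for this illustration): the difference of costs is affine in \(t\), so the NNCC inequality holds with equality.

```latex
\begin{aligned}
x(t) &= (1-t)\,x_0 + t\,x_1,\\
c(x(t),\bar y)-c(x(t),y)
  &= \lVert x(t)-\bar y\rVert^2-\lVert x(t)-y\rVert^2\\
  &= 2\,\langle x(t),\,y-\bar y\rangle+\lVert\bar y\rVert^2-\lVert y\rVert^2\\
  &= (1-t)\,[c(x_0,\bar y)-c(x_0,y)]+t\,[c(x_1,\bar y)-c(x_1,y)].
\end{aligned}
```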

NNCC spaces

History. Variant of the Ma–Trudinger–Wang (MTW) condition studied by Kim and McCann.

Original setting is smooth and finite-dimensional: \(c\in C^4(X\times Y)\).

Ma, Trudinger, Wang, Loeper, Kim, McCann, Villani, Figalli, Guillen, Kitagawa

Basic finite-dim examples:

  • \(c(x,y)=\lVert x-y\rVert^2\)
  • \(c(x,y)=\) Bregman divergence
  • Any smooth reparametrization \(c(x,y)=\lVert F(x)-G(y)\rVert^2\)...
  • Sphere

Theory. NNCC is preserved by products, projections, pullbacks.

Stable under Gromov–Hausdorff convergence.

EVIs in NNCC spaces

Focus on implicit method \(\phi(x,y)=\mathcal{E}(x)+c(x,y)\)

x_{n} \in\displaystyle\operatorname*{argmin}_{x\in X} \mathcal{E}(x)+c(x,x_{n-1})

T H E O R E M   (L–Todeschi–Vialard '24)

⏵ \((X\times X,c)\) NNCC space

⏵ \(\mathcal{E}(\cdot)-\mu\,c(\cdot,x_n)\) convex on generalized c-segments \((x(t),x_{n-1})\)

⏵ Unique argmins

⏵ \(c\) satisfies \(\displaystyle\liminf_{t\to 0}\frac{c(x(t),x(0))}{t}=0.\)

Then EVI.

(EVI)

\mathcal{E}(x_n)+c(x_n,x_{n-1})\leq \mathcal{E}(x)+ c(x,x_{n-1})-(1+\mu)c(x,x_{n})
1+\mu=(1-\lambda)^{-1}

Sublinear (\(\mu=0\)) and linear (\(\mu>0\)) convergence rates
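The EVI for the implicit method can be verified by hand in a simple Euclidean case. The sketch below is an illustration under assumptions not taken from the slides: \(X=\mathbb{R}\), \(\mathcal{E}(x)=\frac{m}{2}x^2\), \(c(x,y)=(x-y)^2/(2\tau)\), and \(\mu=m\tau\), for which \(\mathcal{E}(\cdot)-\mu\,c(\cdot,x_n)\) is affine (hence convex) along straight lines. The values of `m`, `tau` and `x_prev` are arbitrary.

```python
import numpy as np

m, tau = 2.0, 0.3            # assumed strong-convexity and step parameters
mu = m * tau                 # chosen so that E - mu * c(., x_n) is affine

def E(x):
    return 0.5 * m * x ** 2

def c(x, y):
    return (x - y) ** 2 / (2.0 * tau)

x_prev = 1.7                           # x_{n-1}
x_new = x_prev / (1.0 + m * tau)       # x_n = argmin_x E(x) + c(x, x_{n-1}), closed form

xs = np.linspace(-5.0, 5.0, 1001)      # test points x
lhs = E(x_new) + c(x_new, x_prev)
rhs = E(xs) + c(xs, x_prev) - (1.0 + mu) * c(xs, x_new)
slack = rhs - lhs                      # the EVI requires slack >= 0 for every x
```

In this quadratic example the inequality is in fact an equality, so `slack` vanishes up to rounding. The iterates contract by the factor \((1+\mu)^{-1}\) at each step, consistent with a linear rate when \(\mu>0\).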

 

 

Examples of NNCC spaces

T H E O R E M   (L–Todeschi–Vialard '24)

\(X\), \(Y\) Polish spaces, \(c\in C(X\times Y)\).

If \((X\times Y,c)\) is an NNCC space then so is \((\mathcal{P}(X)\times \mathcal{P}(Y), \mathcal{T}_c)\).

\[\mathcal{T}_c(\mu,\nu)=\inf_{\pi\in\Pi(\mu,\nu)}\int c(x,y)\,d\pi\]

Corollary: \((\mathcal{P}_2(X)\times \mathcal{P}_2(X), W_2^2)\) is an NNCC space when \(X=\)

  • \(\mathbb{R}^d\)
  • the sphere
  • Bures–Wasserstein...

Generalized c-segments \((\mu(t),\nu)\):

  ⏵ \((T_0,S)\) optimal coupling of \((\mu_0,\nu)\) 

  ⏵ \((T_1,S)\) optimal coupling of \((\mu_1,\nu)\)

  ⏵ \(\forall \omega\in\Omega\), \(t\mapsto (T_t(\omega),S(\omega))\) c-segment

  ⏵ \(\mu(t)=(T_t)_\#\mathbb{P}\)

Examples of NNCC spaces

Gromov–Wasserstein        \(\mathbf{X}=[X,f,\mu]\)   and   \(\mathbf{Y}=[Y,g,\nu]\)

\[\operatorname{GW}^2(\mathbf{X},\mathbf{Y})=\inf_{\pi\in\Pi(\mu,\nu)}\int\lvert f(x,x')-g(y,y')\rvert^2\,d\pi(x,y)\,d\pi(x',y')\,.\]

\mathbf{X}(t)=[X_0\times X_1,\,\, (1-t)f_0+t\,f_1, \,\, (T_0,T_1)_\#\mathbb{P}].

Unbalanced OT

Hellinger, Fisher–Rao

Bures–Wasserstein

\operatorname{BW}^2(\Sigma_1,\Sigma_2) = \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2) - 2 \operatorname{tr}\left(\sqrt{\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}}\right)
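The Bures–Wasserstein formula is straightforward to evaluate numerically. The sketch below is one possible implementation, assuming symmetric positive semidefinite inputs. The helper names `psd_sqrt` and `bures_wasserstein_sq` are hypothetical, and the matrix square root is computed by eigendecomposition rather than by a dedicated library routine.

```python
import numpy as np

def psd_sqrt(S):
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def bures_wasserstein_sq(S1, S2):
    """BW^2(S1, S2) = tr(S1) + tr(S2) - 2 tr( (S1^{1/2} S2 S1^{1/2})^{1/2} )."""
    root1 = psd_sqrt(S1)
    cross = psd_sqrt(root1 @ S2 @ root1)
    return float(np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross))

# Commuting (diagonal) case: BW^2 reduces to the sum of (sqrt(a_i) - sqrt(b_i))^2.
S1 = np.diag([1.0, 4.0])
S2 = np.diag([9.0, 16.0])
bw_sq = bures_wasserstein_sq(S1, S2)   # (1 - 3)^2 + (2 - 4)^2 = 8
```

For diagonal matrices the formula collapses to a squared Euclidean distance between the vectors of standard deviations, which gives an easy reference value for testing.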

Thank you!