Cross-curvature: new areas of application

Flavien Léger

joint works with François-Xavier Vialard, Pierre-Cyril Aubin-Frankowski

Outline

1. Gradient descent

2. The Laplace method

1. Gradient descent

Gradient descent as minimizing movement

Two steps: 

1) majorize: find the tangent parabola (“surrogate”)

2) minimize: minimize the surrogate

\[x_{n+1}=x_n-\frac{1}{L}\nabla f(x_n),\]

Objective function \(f\colon \mathbb{R}^d\to\mathbb{R}\)

Definition. \(f\) is \(L\)-smooth if \(\nabla^2f\leq L I_{d\times d}\).

If \(f\) is \(L\)-smooth, then for all \(x\)

\[f(x)\leq f(x_n)+\langle\nabla f(x_n),x-x_n\rangle+\frac{L}{2}\lVert x-x_n\rVert^2.\]
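As a concrete illustration of the two steps (not part of the slides), here is a minimal Python sketch, assuming a toy quadratic objective and taking the spectral norm as the smoothness constant \(L\); minimizing the tangent parabola in closed form reproduces the update \(x_{n+1}=x_n-\frac{1}{L}\nabla f(x_n)\).

```python
import numpy as np

# Toy L-smooth objective f(x) = 0.5*||A x||^2 (an illustrative choice, not from the talk).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
f = lambda x: 0.5 * np.dot(A @ x, A @ x)
grad_f = lambda x: A.T @ (A @ x)
L = np.linalg.norm(A.T @ A, 2)          # smoothness constant: largest eigenvalue of A^T A

def surrogate(z, x_n):
    # "majorize": tangent parabola at x_n, which lies above f because f is L-smooth
    return f(x_n) + grad_f(x_n) @ (z - x_n) + 0.5 * L * np.sum((z - x_n) ** 2)

x = np.array([1.0, -2.0])
for n in range(20):
    # "minimize": the surrogate's minimizer is exactly the gradient step
    x_new = x - grad_f(x) / L
    assert f(x_new) <= surrogate(x_new, x) + 1e-12
    x = x_new
print(f(x))                             # close to the minimum value 0
```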

The \(y\)-step and the \(x\)-step

Family of majorizing functions \(\phi(x,y)\)

The \(y\)-step (“majorize”): pick the surrogate \(\phi(\cdot,y_{n+1})\),

\[y_{n+1} = \argmin_{y}\phi(x_n,y)\]

The \(x\)-step (“minimize”): minimize it,

\[x_{n+1} = \argmin_{x}\phi(x,y_{n+1})\]

General cost

Given: \(X\) and \(f\colon X\to\mathbb{R}\). Choose: \(Y\) and a cost \(c(x,y)\).

\[f^c(y)=\inf\{\lambda\in\mathbb{R} : \forall x\in X, \,f(x)\le c(x,y)+\lambda\}\]

Each surrogate \(c(\cdot,y)+f^c(y)\) is the lowest of the functions \(c(\cdot,y)+\lambda\) lying above \(f\):

\[f(x)\leq \underbrace{c(x,y)+f^c(y)}_{\phi(x,y)}\]

Definition. \(f\) is \(c\)-concave if it is the envelope of its surrogates:

\[f(x)=\inf_{y\in Y}c(x,y)+f^c(y)\]

In that case

\[\inf_{x\in X} f(x)=\inf_{x\in X}\inf_{y\in Y} \underbrace{c(x,y)+f^c(y)}_{\phi(x,y)},\]

i.e.

\[\inf_{x}f(x)=\inf_{x,y}c(x,y)+f^c(y).\]
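The \(c\)-transform can be computed by brute force on a grid. The following sketch (illustrative 1-D grids, \(c(x,y)=\tfrac12(x-y)^2\) and a hand-picked \(f\), none of it from the talk) uses the equivalent formula \(f^c(y)=\sup_x f(x)-c(x,y)\) and checks that the envelope of the surrogates lies above \(f\).

```python
import numpy as np

# Grids standing in for X and Y, a cost c, and an objective f (all illustrative choices).
xs = np.linspace(-3.0, 3.0, 400)
ys = np.linspace(-3.0, 3.0, 400)
c = 0.5 * (xs[:, None] - ys[None, :]) ** 2     # c(x, y) sampled on the grid
f = 0.5 * np.log1p(xs ** 2)                    # an objective f(x)

# c-transform: f^c(y) = sup_x [f(x) - c(x, y)], i.e. the smallest lambda with f <= c(., y) + lambda.
fc = np.max(f[:, None] - c, axis=0)

# Every surrogate phi(., y) = c(., y) + f^c(y) majorizes f; their lower envelope equals f
# exactly when f is c-concave (up to grid and boundary error here).
envelope = np.min(c + fc[None, :], axis=1)
print(np.all(envelope >= f - 1e-9), np.max(envelope - f))
```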

Gradient descent with a general cost

Algorithm (FL–PCAF '23):

\[\begin{aligned} y_{n+1} &= \argmin_{y\in Y}\; c(x_n,y)+f^c(y) &&\text{(majorize)}\\ x_{n+1} &= \argmin_{x\in X}\; c(x,y_{n+1})+f^c(y_{n+1}) &&\text{(minimize)} \end{aligned}\]

First-order form of the two steps (tangency of the surrogate \(\phi(\cdot,y_{n+1})\) at \(x_n\), then stationarity at \(x_{n+1}\)):

\[\begin{aligned} -\nabla_xc(x_n,y_{n+1})&=-\nabla f(x_n)\\ \nabla_xc(x_{n+1},y_{n+1})&=0 \end{aligned}\]
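A brute-force sketch of the algorithm on 1-D grids (illustrative cost and objective; with the quadratic cost below the scheme reduces to plain gradient descent with unit step). The \(y\)-step selects the tightest surrogate at \(x_n\), the \(x\)-step minimizes it.

```python
import numpy as np

xs = np.linspace(-4.0, 4.0, 801)               # grid standing in for X
ys = np.linspace(-4.0, 4.0, 801)               # grid standing in for Y
c = 0.5 * (xs[:, None] - ys[None, :]) ** 2     # cost c(x, y); quadratic, so this is plain GD
f = 0.5 * np.log1p(xs ** 2)                    # objective f(x), minimized at x = 0
fc = np.max(f[:, None] - c, axis=0)            # c-transform f^c(y)

i = 0                                          # start from the leftmost grid point x_0
for n in range(30):
    j = np.argmin(c[i, :] + fc)                # y-step ("majorize"): argmin_y c(x_n, y) + f^c(y)
    i = np.argmin(c[:, j] + fc[j])             # x-step ("minimize"): argmin_x c(x, y_{n+1}) + f^c(y_{n+1})
print(xs[i], f[i], f.min())
```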

Examples

\(\bullet \,\,\,c(x,y)=\overbrace{u(x)-u(y)-\langle\nabla u(y),x-y\rangle}^{\eqqcolon u(x|y)}\): mirror descent

 

\[\nabla u(x_{n+1})-\nabla u(x_n)=-\nabla f(x_n)\]

\(\bullet\,\,\,c(x,y)=u(y|x)\): natural gradient descent

 

\[x_{n+1}-x_n=-\nabla^2 u(x_n)^{-1}\nabla f(x_n)\]

\(\bullet\,\,\,c(x,y)=\frac{L}{2}d_M^2(x,y)\): Riemannian gradient descent

Since \(-\nabla_xc(x,y)=\xi\Leftrightarrow y=\exp_x(\frac{1}{L}\xi)\),

\[x_{n+1}=\exp_{x_n}\Big(-\frac{1}{L}\nabla f(x_n)\Big)\]

\(\bullet\,\,\,\)Newton's method (see the application below)
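A sketch of the first two updates for one concrete reference function, \(u(x)=\sum_i x_i\log x_i\) on the positive orthant, with an illustrative quadratic \(f\) (neither choice is from the talk); it only demonstrates the update rules, not the step-size conditions behind them.

```python
import numpy as np

# Illustrative objective f(x) = 0.5*||x - a||^2 with minimizer a > 0.
a = np.array([0.5, 1.2])
grad_f = lambda x: x - a

# Mirror descent with u(x) = sum(x*log x):  grad u(x_{n+1}) - grad u(x_n) = -grad f(x_n),
# and grad u(x) = log(x) + 1, so x_{n+1} = x_n * exp(-grad f(x_n)).
x = np.array([1.0, 1.0])
for n in range(50):
    x = x * np.exp(-grad_f(x))
print("mirror descent:   ", x)

# Natural gradient descent:  x_{n+1} - x_n = -(hess u(x_n))^{-1} grad f(x_n),
# and hess u(x) = diag(1/x), so the preconditioner is simply diag(x).
x = np.array([1.0, 1.0])
for n in range(50):
    x = x - x * grad_f(x)
print("natural gradient: ", x)
```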

Alternating minimization

\(X,Y\),   \(\phi\colon X\times Y\to\mathbb{R}\)

\(\phi(x,y)=c(x,y)+g(x)+h(y)\)

\[\operatorname*{minimize}_{x\in X,y\in Y} \;\phi(x,y)\]

Algorithm:

\[\begin{aligned} y_{n+1} &= \argmin_{y\in Y} \phi(x_n,y)\\ x_{n+1} &= \argmin_{x\in X} \phi(x,y_{n+1}) \end{aligned}\]

The five-point property

\[\operatorname*{minimize}_{x\in X,y\in Y} \;\phi(x,y)\]

Definition (Csiszár–Tusnády ’84 for \(\lambda=0\)). \(\phi\) satisfies the \(\lambda\)-five-point property (\(\lambda\)-FPP) if for all \(x,y,y_0,x_{1},y_{1}\),

\[\phi(x,y_{1})+(1-\lambda)\phi(x_1,y_1)\leq \phi(x,y)+(1-\lambda)\phi(x,y_0).\]

They show: \(\phi(x_n,y_n)\to\inf \phi\).

Theorem (FL–PCAF '23). If \(\phi\) satisfies the FPP then

\[\phi(x_n,y_n)\leq \phi(x,y)+\frac{\phi(x,y_0)-\phi(x_0,y_0)}{n}.\]

If \(\phi\) satisfies the \(\lambda\)-FPP then

\[\phi(x_n,y_n)\leq \phi(x,y)+\frac{\lambda[\phi(x,y_0)-\phi(x_0,y_0)]}{\Lambda^n-1},\]

where \(\Lambda\coloneqq(1-\lambda)^{-1}>1\).
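A scalar sketch of alternating minimization (illustrative quadratic \(g,h\) and Euclidean cost, so both partial minimizations are closed-form). Since this cost has zero cross-curvature and \(F\) is convex, the FPP is expected to hold here; the printout compares the gap \(\phi(x_n,y_n)-\inf\phi\) with the \(O(1/n)\) bound above. The initialization \(x_0=\argmin_x\phi(x,y_0)\) is an assumption made so that the bound's numerator is nonnegative.

```python
import numpy as np

# phi(x, y) = 0.5*(x - y)^2 + g(x) + h(y) with quadratic g, h (an illustrative scalar example).
alpha, beta, p, q = 2.0, 0.5, 3.0, -2.0
phi = lambda x, y: 0.5 * (x - y) ** 2 + 0.5 * alpha * (x - p) ** 2 + 0.5 * beta * (y - q) ** 2

# Joint minimizer of the quadratic, from its first-order conditions.
xstar, ystar = np.linalg.solve([[1 + alpha, -1.0], [-1.0, 1 + beta]], [alpha * p, beta * q])
phistar = phi(xstar, ystar)

y0 = 0.0
x0 = (y0 + alpha * p) / (1 + alpha)            # x_0 = argmin_x phi(x, y_0)  (assumed initialization)
x, y = x0, y0
for n in range(1, 11):
    y = (x + beta * q) / (1 + beta)            # y-step: argmin_y phi(x_n, y)
    x = (y + alpha * p) / (1 + alpha)          # x-step: argmin_x phi(x, y_{n+1})
    bound = (phi(xstar, y0) - phi(x0, y0)) / n # right-hand side of the FPP rate at (x, y) = (x*, y*)
    print(n, phi(x, y) - phistar, bound)
```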

The Kim–McCann geometry

\[W_c(\mu,\nu)=\inf_{\pi\in\Pi(\mu,\nu)}\iint_{X\times Y}c(x,y)\,\pi(dx,dy)\]

\(\delta_c(x',y';x,y)=[c(x,y')+c(x',y)]-[c(x,y)+c(x',y')]\)

\(\delta_c(x+\xi,y+\eta;x,y)=\underbrace{-\nabla^2_{xy}c(x,y)(\xi,\eta)}_{\text{Kim--McCann metric ('10)}}+o(\lvert\xi\rvert^2+\lvert\eta\rvert^2)\)

Cross-curvature

Definition (Ma–Trudinger–Wang ’05). The cross-curvature, or Ma–Trudinger–Wang tensor, is

\[\mathfrak{S}_c(\xi,\eta)=(c_{ik\bar s} c^{\bar s t} c_{t \bar\jmath\bar\ell}-c_{i\bar \jmath k\bar\ell}) \xi^i\eta^{\bar\jmath}\xi^k\eta^{\bar\ell},
\qquad c_{i\bar \jmath}=\frac{\partial^2c}{\partial x^i\partial y^{\bar\jmath}},\dots\]

\(\mathfrak{S}_c\) uniquely determines the curvature of the Kim–McCann metric.

Theorem (Kim–McCann '11).

\[\mathfrak{S}_c\geq 0 \iff c(x(t),y)-c(x(t),y')\text{ is convex in } t\]

for any Kim–McCann geodesic \(t\mapsto (x(t),y)\).
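As a worked instance (not on the slides): for the quadratic cost \(c(x,y)=\tfrac12\lvert x-y\rvert^2\) all third- and fourth-order mixed derivatives of \(c\) vanish, so

\[\mathfrak{S}_c(\xi,\eta)=(c_{ik\bar s} c^{\bar s t} c_{t \bar\jmath\bar\ell}-c_{i\bar \jmath k\bar\ell}) \xi^i\eta^{\bar\jmath}\xi^k\eta^{\bar\ell}=0,\]

i.e. the Euclidean cost has (non-strictly) nonnegative cross-curvature.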

A local criterion for the five-point property

\[\phi(x,y)=c(x,y)+g(x)+h(y)\]

Theorem (FL–PCAF '23). Suppose that \(c\) has nonnegative cross-curvature. If \(F(x)\coloneqq\inf_{y\in Y}\phi(x,y)\) is convex on every Kim–McCann geodesic \(t\mapsto (x(t),y)\) satisfying \(\nabla_x\phi(x(0),y)=0\), then \(\phi\) satisfies the FPP.

“... \(F(x)-\lambda\phi(x,y)\) ...” \(\leadsto\) \(\lambda\)-FPP.

Application: Newton's method

\(c(x,y)=u(y|x)\longrightarrow\) NGD:

\[x_{n+1}-x_n=-\nabla^2u(x_n)^{-1}\nabla f(x_n)\]

Theorem. If

\[\nabla^3u(\nabla^2u^{-1}\nabla f,-,-)\leq \nabla^2f\leq \nabla^2u+\nabla^3u(\nabla^2u^{-1}\nabla f,-,-)\]

then

\[f(x_n)\leq f(x)+\frac{u(x_0|x)}{n}.\]

Taking \(u=f\) gives Newton's method: new global convergence rate.

New condition on \(f\), similar to but different from self-concordance.
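A sketch of the scheme on an illustrative strictly convex objective \(f(x)=\sum_i\cosh(x_i)\) (not from the talk). NGD with a general reference function \(u\) preconditions the gradient by \(\nabla^2u\); taking \(u=f\) below gives Newton's method. The rate condition above is not checked here, only the update is run.

```python
import numpy as np

# Illustrative objective: f(x) = sum(cosh(x_i)), minimized at x = 0 with value d.
f       = lambda x: np.sum(np.cosh(x))
grad_f  = lambda x: np.sinh(x)
hess_f  = lambda x: np.diag(np.cosh(x))

# Natural gradient descent x_{n+1} = x_n - (hess u(x_n))^{-1} grad f(x_n); with u = f this is Newton.
hess_u = hess_f
x = np.array([2.0, -1.5])
for n in range(10):
    x = x - np.linalg.solve(hess_u(x), grad_f(x))
    print(n, f(x) - 2.0)                       # suboptimality f(x_n) - f(x*)
```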

Riemannian/metric space setting

\[\operatorname*{minimize}_{x\in M} f(x)\]

\(c(x,y)=\frac{1}{2\tau} d^2(x,y)\)

1. Explicit:   \(x_{n+1}=\exp_{x_n}\big(-\tau\nabla f(x_n)\big)\)   (sketch below)

    \(R\geq 0\): (smoothness and) \(\nabla^2f\geq 0\) gives \(O(1/n)\) convergence rates

    \(R\leq 0\): ? (nonlocal condition)

2. Implicit:   \(x_{n+1}=\argmin_{x} f(x)+\frac{1}{2\tau}d^2(x,x_n)\)

    \(R\leq 0\): \(\nabla^2f\geq 0\) gives \(O(1/n)\) convergence rates

    \(R\geq 0\): if \(\mathfrak{S}_c\geq 0\) then convexity of \(f\) on Kim–McCann geodesics gives \(O(1/n)\) convergence rates

Wasserstein gradient flows, generalized geodesics (Ambrosio–Gigli–Savaré '05)

(da Cruz Neto–de Lima–Oliveira ’98; Bento–Ferreira–Melo ’17)
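A sketch of the explicit scheme on the unit sphere \(S^2\subset\mathbb{R}^3\) (an illustrative positively curved manifold), with \(f(x)=-\langle a,x\rangle\); the Riemannian gradient is the tangential projection of the Euclidean gradient, and the exponential map is the usual great-circle formula.

```python
import numpy as np

def exp_map(x, v):
    # Exponential map on the unit sphere: follow the great circle in direction v for length |v|.
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * v / nv

a = np.array([0.0, 0.0, 1.0])                  # f(x) = -<a, x>, minimized on S^2 at x = a
euclidean_grad = lambda x: -a
tau = 0.5

x = np.array([1.0, 0.0, 0.0])
for n in range(50):
    g = euclidean_grad(x)
    rg = g - np.dot(g, x) * x                  # Riemannian gradient: project onto T_x S^2
    x = exp_map(x, -tau * rg)
print(x)                                       # approaches a
```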

2. The Laplace method

Motivation

  • Heat kernel asymptotics on \((M,g)\):
\[\partial_tf=\frac12\Delta f,\qquad f_t(x)=\int_Mp_t(x,y)f_0(y)\,dy \quad\text{vs}\quad \int_M \frac{e^{-\frac{d(x,y)^2}{2t}}}{(2\pi t)^{d/2}}f_0(y)\,dy\]

  • Inverse problems: \(y = T(x) + \sqrt{\varepsilon}\xi\), \(\xi\sim\mathcal{N}(0,I)\); the law of \(y\) given \(x\) is \(\propto e^{-\lVert y-T(x)\rVert^2/2\varepsilon}\)

  • Entropic transport:
\[\pi_{\varepsilon}(dx,dy)=e^{-[c(x,y)-\varphi(x)-\psi(y)]/\varepsilon}\mu(dx)\nu(dy)\]

Setting

\(X,Y\),     \(u\colon X\times Y\to\mathbb{R}\),    \(u(x,y)\geq 0\),    \(u\) vanishes on \(\Sigma=\{(x,T(x)) : x\in X\}\)

Behavior as \(\varepsilon\to 0\) of \[I(\varepsilon)=\iint_{X\times Y}e^{-u(x,y)/\varepsilon}\,dr(x,y)\]

Standard Laplace's method:

\[I(\varepsilon)=\int_X \frac{1}{\sqrt{\det[u_{ij}]}}\Big[r + \varepsilon\Big(\tfrac 12 u^{ij}\partial_{ij}r-\tfrac 12 u_{jk\ell}u^{ij}u^{k\ell}\partial_ir + \tfrac 18 ru_{ijk}u_{\ell mn}u^{ij}u^{k\ell}u^{mn}+\tfrac{1}{12} r u_{ijk}u_{\ell mn} u^{i\ell} u^{jm} u^{kn} - \tfrac 18 r u_{ijk\ell}u^{ij}u^{k\ell}\Big)\Big]_{(x,T(x))}\,dx + O(\varepsilon^2)\]
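A one-dimensional numerical check of the standard expansion's leading order (illustrative \(u\) and \(r\), plain Riemann-sum quadrature, not from the talk): the ratio of \(\int e^{-u(y)/\varepsilon}r(y)\,dy\) to \(\sqrt{2\pi\varepsilon/u''(y_0)}\,r(y_0)\) tends to \(1+O(\varepsilon)\).

```python
import numpy as np

# u >= 0 with a single nondegenerate minimum at y0 = 0 (u''(0) = 1), and a smooth weight r.
u = lambda y: np.cosh(y) - 1.0
r = lambda y: 1.0 + 0.3 * y + y ** 2

ys = np.linspace(-10.0, 10.0, 200001)
dy = ys[1] - ys[0]
for eps in (0.1, 0.01, 0.001):
    integral = np.sum(np.exp(-u(ys) / eps) * r(ys)) * dy
    leading = np.sqrt(2 * np.pi * eps) * r(0.0)       # (2*pi*eps / u''(0))^{1/2} * r(y0)
    print(eps, integral / leading)                    # ratio -> 1 + O(eps)
```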

The Kim–McCann geometry of \(\Sigma\)

\((X,Y,u)\)

\(g_{\scriptscriptstyle\text{KM}}\) is Riemannian on \(\Sigma\):

$$-D_{xy}^2u(x,y)(\xi,\eta)\ge 0 \quad\text{along }\Sigma$$

In summary, we have:

On \(X\times Y\):   \(\hat g_{\scriptscriptstyle\text{KM}}\) semi-metric,   \(\hat m\) volume form,   \(\hat \nabla\) Levi-Civita connection,   \(\hat R\) scalar curvature

On \(\Sigma\):   \(g_{\scriptscriptstyle\text{KM}}\) metric,   \(m\) volume form,   \(\nabla\) Levi-Civita connection,   \(R\) scalar curvature

Extrinsic curvatures:   \(h\) second fundamental form,   \(H\) mean curvature

Main result

Theorem.

$$\iint_{X\times Y}\frac{e^{-u(x,y)/\varepsilon}}{(2\pi\varepsilon)^{d/2}}f(x,y)\,d\hat m(x,y) = \int_\Sigma f\,dm\,+$$

$$\varepsilon\int_\Sigma \bigg[-\frac 18\hat\Delta f+ \frac 14 \hat\nabla_{\!H} f+ f \Big( \frac{3}{32}\hat R-\frac18R+\frac{1}{24}\langle h,h\rangle-\frac18\langle H,H\rangle\Big)\bigg] \,dm + O(\varepsilon^2)$$

Thank you!

(Milan 2023-10-11) Gradient descent and the Laplace method

By Flavien Léger
