An intrinsic geometry for alternating minimization
Flavien Léger
joint work with Pierre-Cyril Aubin-Frankowski

Outline
1. Alternating minimization
2. The Kim–McCann geometry
3. Applications
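Alternating minimization
$X, Y$ two sets, $\phi : X \times Y \to \mathbb{R}$
$\operatorname{minimize}_{x \in X,\, y \in Y} \; \phi(x, y)$
Algorithm (AM): $y_{n+1} = \operatorname{argmin}_{y \in Y} \phi(x_n, y)$, then $x_{n+1} = \operatorname{argmin}_{x \in X} \phi(x, y_{n+1})$
Assumptions:
For each $x \in X$: $y \mapsto \phi(x, y)$ has a unique minimizer.
For each $y \in Y$: $x \mapsto \phi(x, y)$ has a unique minimizer.
Convergence rates: typically proved in Euclidean space, for $\phi$ convex and $L$-smooth
(Beck–Tetruashvili ’13, Beck ’15)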
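A minimal sketch of the alternating updates, assuming the convention $y_{n+1} = \operatorname{argmin}_y \phi(x_n, y)$, $x_{n+1} = \operatorname{argmin}_x \phi(x, y_{n+1})$. The quadratic `phi` below is a hypothetical toy objective on $\mathbb{R} \times \mathbb{R}$, chosen only because both partial minimizers have closed forms; it is not taken from the talk.

```python
def phi(x, y):
    """Toy objective (an assumption for illustration): a jointly convex quadratic."""
    return x * x + y * y + x * y - 3.0 * x - 3.0 * y


def argmin_y(x):
    # Solves d/dy phi(x, y) = 2y + x - 3 = 0.
    return (3.0 - x) / 2.0


def argmin_x(y):
    # Solves d/dx phi(x, y) = 2x + y - 3 = 0.
    return (3.0 - y) / 2.0


def alternating_minimization(y0, n_steps):
    """Return the iterates (x_n, y_n), where x_n minimizes phi(., y_n)."""
    y = y0
    x = argmin_x(y)
    iterates = [(x, y)]
    for _ in range(n_steps):
        y = argmin_y(x)  # y_{n+1} = argmin_y phi(x_n, y)
        x = argmin_x(y)  # x_{n+1} = argmin_x phi(x, y_{n+1})
        iterates.append((x, y))
    return iterates


iterates = alternating_minimization(y0=5.0, n_steps=30)
x_last, y_last = iterates[-1]
print(x_last, y_last, phi(x_last, y_last))
```

For this toy objective the iterates approach the joint minimizer $(1, 1)$, and the values $\phi(x_n, y_n)$ are nonincreasing because each half-step is an exact minimization.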
Motivations
Expectation–Maximization in statistics (model $p_\theta$)
Sinkhorn (aka RAS, IPFP) for matrix scaling and optimal transport
Projection Onto Convex Sets ($X, Y$: two convex subsets of $\mathbb{R}^d$)
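As an illustration of the projection example, alternating projections can be read as alternating minimization of $\phi(x, y) = \tfrac{1}{2}|x - y|^2$ over $x \in X$, $y \in Y$, since each partial minimization is a Euclidean projection. The sketch below assumes two hypothetical sets in $\mathbb{R}^2$ (the closed unit disk and a half-plane), picked only because their projections are explicit.

```python
import math


def project_disk(p):
    """Euclidean projection onto X = closed unit disk (illustrative set)."""
    norm = math.hypot(p[0], p[1])
    if norm <= 1.0:
        return p
    return (p[0] / norm, p[1] / norm)


def project_halfplane(p):
    """Euclidean projection onto Y = {(a, b) : a >= 2} (illustrative set)."""
    return (max(p[0], 2.0), p[1])


def alternating_projections(x0, n_steps):
    """Alternate the two projections, starting from a point x0 of the disk."""
    x = x0
    y = project_halfplane(x)
    for _ in range(n_steps):
        y = project_halfplane(x)  # y-step: closest point of Y to x
        x = project_disk(y)       # x-step: closest point of X to y
    return x, y


x, y = alternating_projections((0.0, 1.0), 60)
gap = math.hypot(x[0] - y[0], x[1] - y[1])
print(x, y, gap)
```

The two sets here do not intersect, so the iterates settle on a pair of closest points and `gap` approximates the distance between the sets.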
Motivations 2: “Gradient descent” family
$F(x) = \inf_{y \in Y} \phi(x, y)$
X,Y two smooth manifolds
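One standard member of this family can be written out as a worked example. The choice of $\phi$ below is an assumption made for illustration (it presumes $X = Y = \mathbb{R}^d$ and $f$ differentiable with $L$-Lipschitz gradient), not a formula from the slides; it shows how alternating minimization reproduces plain gradient descent with step $1/L$.

```latex
\begin{align*}
\phi(x, y) &= f(y) + \langle \nabla f(y),\, x - y \rangle + \tfrac{L}{2}\,|x - y|^2, \\
F(x) &= \inf_{y} \phi(x, y) = f(x)
  \quad \text{(since } f(x) \le \phi(x, y) \text{ for all } y, \text{ with equality at } y = x\text{)}, \\
y_{n+1} &= x_n \quad \text{is a minimizer of } y \mapsto \phi(x_n, y), \\
x_{n+1} &= \operatorname*{argmin}_{x} \phi(x, y_{n+1}) = x_n - \tfrac{1}{L}\, \nabla f(x_n).
\end{align*}
```

The $y$-step builds a quadratic upper bound of $f$ at the current point and the $x$-step minimizes it. Uniqueness of the $y$-minimizer can fail for this particular $\phi$ (for instance when $f$ is itself the quadratic $\tfrac{L}{2}|x|^2$), so the example is meant as intuition only.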
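The five-point property
Definition (inspired by Csiszár–Tusnády ’84)
Let $\lambda \ge 0$. $\phi$ has the $\lambda$-strong FPP if, for all $x \in X$ and $y, y_0 \in Y$,
$\phi(x, y_1) + (1-\lambda)\,\phi(x_0, y_0) \le \phi(x, y) + (1-\lambda)\,\phi(x, y_0)$   ($\lambda$-FPP)
where $x_0 = \operatorname{argmin}_{x \in X} \phi(x, y_0)$ and $y_1 = \operatorname{argmin}_{y \in Y} \phi(x_0, y)$. The case $\lambda = 0$ is the FPP.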
Characterizes (AM):
Intrinsic, no regularity on X,Y,ϕ,R
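The five-point inequality can be checked numerically on a small example. The sketch below assumes a hypothetical quadratic $\phi$ on $\mathbb{R} \times \mathbb{R}$ with closed-form partial minimizers, and takes $x_0 = \operatorname{argmin}_x \phi(x, y_0)$, $y_1 = \operatorname{argmin}_y \phi(x_0, y)$. For this particular $\phi$, a direct hand computation suggests the inequality holds for every $\lambda \le 3/4$; that value is specific to the toy example.

```python
def phi(x, y):
    """Toy objective (an assumption for illustration): a jointly convex quadratic."""
    return x * x + y * y + x * y - 3.0 * x - 3.0 * y


def argmin_x(y):
    return (3.0 - y) / 2.0  # solves 2x + y - 3 = 0


def argmin_y(x):
    return (3.0 - x) / 2.0  # solves 2y + x - 3 = 0


def fpp_gap(x, y, y0, lam=0.0):
    """Right-hand side minus left-hand side of the lambda-FPP (>= 0 means it holds)."""
    x0 = argmin_x(y0)
    y1 = argmin_y(x0)
    lhs = phi(x, y1) + (1.0 - lam) * phi(x0, y0)
    rhs = phi(x, y) + (1.0 - lam) * phi(x, y0)
    return rhs - lhs


# Evaluate the gap on a coarse grid of test points (x, y, y0).
grid = [k / 2.0 for k in range(-8, 9)]
worst_fpp = min(fpp_gap(x, y, y0) for x in grid for y in grid for y0 in grid)
worst_strong = min(fpp_gap(x, y, y0, lam=0.75)
                   for x in grid for y in grid for y0 in grid)
print(worst_fpp, worst_strong)
```

A grid check like this is only a sanity test on finitely many points, not a proof that the property holds.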
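Convergence rates
Theorem (FL–PCAF '23)
If $\phi$ satisfies the FPP then
$\phi(x_n, y_n) \le \phi(x, y) + \dfrac{\phi(x, y_0) - \phi(x_0, y_0)}{n}$
If $\phi$ satisfies the $\lambda$-FPP then
$\phi(x_n, y_n) \le \phi(x, y) + \dfrac{\lambda\,[\phi(x, y_0) - \phi(x_0, y_0)]}{\Lambda^n - 1}$,
where $\Lambda := (1-\lambda)^{-1} > 1$.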
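Both bounds can be compared with actual iterates on a small example. The sketch below assumes a hypothetical quadratic $\phi$, the update order $y_{n+1} = \operatorname{argmin}_y \phi(x_n, y)$, $x_{n+1} = \operatorname{argmin}_x \phi(x, y_{n+1})$, the reference point $(x, y)$ equal to the joint minimizer $(1, 1)$, and $\lambda = 3/4$ (a value worked out by hand for this toy objective only).

```python
def phi(x, y):
    """Toy objective (an assumption for illustration): a jointly convex quadratic."""
    return x * x + y * y + x * y - 3.0 * x - 3.0 * y


def argmin_x(y):
    return (3.0 - y) / 2.0


def argmin_y(x):
    return (3.0 - x) / 2.0


y0 = 5.0
x0 = argmin_x(y0)
x_star, y_star = 1.0, 1.0             # joint minimizer of the toy phi
a0 = phi(x_star, y0) - phi(x0, y0)    # phi(x, y_0) - phi(x_0, y_0) at (x, y) = minimizer
lam = 0.75                            # toy-specific strong-FPP constant (assumption)
Lam = 1.0 / (1.0 - lam)

x, y = x0, y0
rows = []
for n in range(1, 11):
    y = argmin_y(x)
    x = argmin_x(y)
    excess = phi(x, y) - phi(x_star, y_star)
    sublinear_bound = a0 / n                      # FPP rate
    linear_bound = lam * a0 / (Lam ** n - 1.0)    # lambda-FPP rate
    rows.append((n, excess, sublinear_bound, linear_bound))
    print(n, excess, sublinear_bound, linear_bound)
```

Each printed row lists the iteration count, the observed gap $\phi(x_n, y_n) - \phi(x, y)$, and the two upper bounds, so the $O(1/n)$ and geometric rates can be read side by side.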
Proof.
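(FPP) applied at step $n$ $\iff$
$\phi(x_{n+1}, y_{n+1}) \le \phi(x, y) + [\phi(x, y_n) - \phi(x_n, y_n)] - [\phi(x, y_{n+1}) - \phi(x_{n+1}, y_{n+1})]$
Summing from $0$ to $n-1$, and using that $\phi(x_k, y_k)$ is nonincreasing in $k$, implies
$n\,\phi(x_n, y_n) \le n\,\phi(x, y) + [\phi(x, y_0) - \phi(x_0, y_0)] - [\phi(x, y_n) - \phi(x_n, y_n)]$
The last bracket is nonnegative because $x_n$ minimizes $\phi(\cdot, y_n)$; dropping it and dividing by $n$ gives the rate.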
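The telescoping argument can be spelled out step by step. The shorthand $a_k$ is introduced here for convenience, and the derivation assumes $x_k = \operatorname{argmin}_x \phi(x, y_k)$ and $y_{k+1} = \operatorname{argmin}_y \phi(x_k, y)$ for every $k \ge 0$.

```latex
\begin{align*}
a_k &:= \phi(x, y_k) - \phi(x_k, y_k) \ge 0
  && \text{($x_k$ minimizes $\phi(\cdot, y_k)$)}, \\
\phi(x_{k+1}, y_{k+1}) &\le \phi(x, y) + a_k - a_{k+1}
  && (k = 0, \dots, n-1), \\
\sum_{k=1}^{n} \phi(x_k, y_k) &\le n\,\phi(x, y) + a_0 - a_n
  && \text{(telescoping sum)}, \\
\phi(x_{k+1}, y_{k+1}) &\le \phi(x_k, y_{k+1}) \le \phi(x_k, y_k)
  && \text{(each half-step decreases $\phi$)}, \\
n\,\phi(x_n, y_n) &\le \sum_{k=1}^{n} \phi(x_k, y_k)
  \le n\,\phi(x, y) + a_0 - a_n, \\
\phi(x_n, y_n) &\le \phi(x, y) + \frac{a_0}{n}
  && \text{(drop $a_n \ge 0$, divide by $n$)}.
\end{align*}
```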
Transition to geometry
How to obtain the FPP? 🤔
$\phi(x_0, y_0) \le \phi(x, y) + \phi(x, y_0) - \phi(x, y_1)$
Answer: when $X, Y$ are smooth manifolds, find a path $(x(t), y(t))$ joining $(x_0, y_1)$ to $(x, y)$ such that
$b(t) = \phi(x(t), y(t)) + \phi(x(t), y_0) - \phi(x(t), y_1)$ is convex.
Why: special structure of the FPP
(FPP) $\iff b(0) \le b(1)$, and $b'(0) = 0$.
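The reason convexity of $b$ is enough can be written out in a few lines. This sketch assumes $\phi$ is differentiable, the path is differentiable at $t = 0$ with $(x(0), y(0)) = (x_0, y_1)$ and $(x(1), y(1)) = (x, y)$, and $x_0 = \operatorname{argmin}_x \phi(x, y_0)$, $y_1 = \operatorname{argmin}_y \phi(x_0, y)$.

```latex
\begin{align*}
b(0) &= \phi(x_0, y_1) + \phi(x_0, y_0) - \phi(x_0, y_1) = \phi(x_0, y_0), \\
b(1) &= \phi(x, y) + \phi(x, y_0) - \phi(x, y_1), \\
b'(0) &= \langle \nabla_x \phi(x_0, y_0),\, \dot{x}(0) \rangle
        + \langle \nabla_y \phi(x_0, y_1),\, \dot{y}(0) \rangle = 0
  && \text{(first-order optimality of $x_0$ and $y_1$)}, \\
b \text{ convex} &\implies b(1) \ge b(0) + b'(0)\,(1 - 0) = b(0).
\end{align*}
```

The two $\nabla_x \phi(x_0, y_1)$ terms cancel in $b'(0)$, which is the special structure that makes the derivative vanish at the start of the path.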
1. Alternating minimization
2. The Kim–McCann geometry
3. Applications



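Definition: the Kim–McCann metric
$X, Y$: $d$-dimensional smooth manifolds, $c \in C^4(X \times Y)$
$\delta_c(x', y'; x, y) = [c(x, y') + c(x', y)] - [c(x, y) + c(x', y')]$
$\delta_c(x+\xi, y+\eta; x, y) = \underbrace{-\nabla^2_{xy} c(x, y)(\xi, \eta)}_{\text{Kim–McCann metric (’10)}} + o(|\xi|^2 + |\eta|^2)$
Pseudo-Riemannian metric on $X \times Y$ (Kim–McCann ’10):
$g_{\mathrm{KM}} = \dfrac{1}{2} \begin{pmatrix} 0 & -\nabla^2_{xy} c(x, y) \\ -\nabla^2_{xy} c(x, y) & 0 \end{pmatrix}$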
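For a concrete feel, take the quadratic cost $c(x, y) = \tfrac{1}{2}|x - y|^2$ on $\mathbb{R}^d$, an illustrative choice not singled out on the slide. Then $-\nabla^2_{xy} c$ is the identity, so the leading term of $\delta_c(x+\xi, y+\eta; x, y)$ is $\langle \xi, \eta \rangle$, and for this cost the remainder vanishes. The sketch below checks this numerically; the helper names are hypothetical.

```python
def cost(x, y):
    """Quadratic cost c(x, y) = |x - y|^2 / 2 on R^d (illustrative choice)."""
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def cross_difference(xp, yp, x, y, c=cost):
    """delta_c(x', y'; x, y) = [c(x, y') + c(x', y)] - [c(x, y) + c(x', y')]."""
    return (c(x, yp) + c(xp, y)) - (c(x, y) + c(xp, yp))


def add(p, q):
    """Componentwise sum of two points given as tuples."""
    return tuple(pi + qi for pi, qi in zip(p, q))


x, y = (1.0, 2.0), (0.5, -1.0)
xi, eta = (0.25, -0.5), (1.0, 0.75)

delta = cross_difference(add(x, xi), add(y, eta), x, y)
# Pairing -D^2_xy c (xi, eta); for the quadratic cost this is the dot product.
inner = sum(a * b for a, b in zip(xi, eta))
print(delta, inner)
```

Swapping in another smooth cost for `cost` would make the two numbers agree only up to the higher-order remainder.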
c-segments and cross-curvature
➡ c-segments: Kim–McCann geodesics $(x(t), y)$
➡ cross-curvature: curvature of the Kim–McCann metric (aka MTW tensor)
Theorem (Kim–McCann '11)
Under some assumptions on $(X, Y, c)$,
nonnegative cross-curvature $\iff$ $t \mapsto c(x(t), y) - c(x(t), y')$ is convex for any c-segment $(x(t), y)$.
A local criterion for the five-point property
$X, Y$: $d$-dimensional smooth manifolds
$c \in C^4(X \times Y)$, $g \in C^1(X)$, $h \in C^1(Y)$, with $\phi(x, y) = c(x, y) + g(x) + h(y)$
Theorem (FL–PCAF '23)
Suppose that $c$ has nonnegative cross-curvature.
If $F(x) := \inf_{y \in Y} \phi(x, y)$ is convex on every c-segment $t \mapsto (x(t), y)$ satisfying $x(0) = \operatorname{argmin}_{x \in X} \phi(x, y)$, then $\phi$ satisfies the FPP.
"... $F(x) - \lambda \phi(x, y)$ ..." $\rightsquigarrow$ $\lambda$-FPP.
Intrinsic: c-segments and $F$.
1. Alternating minimization
2. The Kim–McCann geometry
3. Applications
Riemannian/metric space setting
Riemannian manifold $X = Y = M$
1. Implicit: $\phi(x, y) = \frac{1}{2\tau} d_M^2(x, y) + f(x)$
$x_{n+1} = \operatorname{argmin}_{x} \; f(x) + \frac{1}{2\tau} d_M^2(x, x_n)$
Riem $\le 0$: $\nabla^2 f \ge 0$ gives $O(1/n)$ convergence rates
Riem $\ge 0$: if $d_M^2(x, y)$ has nonnegative cross-curvature then convexity of $f$ on c-segments gives $O(1/n)$ convergence rates
Wasserstein gradient flows, generalized geodesics (Ambrosio–Gigli–Savaré '05)
2. Explicit: $\phi(x, y) = \frac{1}{2\tau} d_M^2(x, y) + h(y)$, $f(x) = \inf_y \phi(x, y)$
$x_{n+1} = \exp_{x_n}(-\tau \nabla f(x_n))$
Riem $\ge 0$: $\nabla^2 f \ge 0$ gives $O(1/n)$ convergence rates
Riem $\le 0$: if $d_M^2(x, y)$ has nonpositive cross-curvature then convexity of $f$ on c-segments gives $O(1/n)$ convergence rates
(da Cruz Neto, de Lima, Oliveira ’98; Bento, Ferreira, Melo ’17)
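To make the explicit update $x_{n+1} = \exp_{x_n}(-\tau \nabla f(x_n))$ concrete, here is a minimal sketch on the unit sphere in $\mathbb{R}^3$. The manifold, the objective $f(x) = -\langle a, x \rangle$ and the step size are all illustrative assumptions. The toy objective is not claimed to satisfy the convexity hypotheses above; the code only shows the mechanics of the iteration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def exp_sphere(x, v):
    """Exponential map on the unit sphere: follow the great circle from x with velocity v."""
    speed = math.sqrt(dot(v, v))
    if speed == 0.0:
        return x
    return tuple(math.cos(speed) * xi + math.sin(speed) * vi / speed
                 for xi, vi in zip(x, v))


def riemannian_grad(x, a):
    """Gradient on the sphere of f(x) = -<a, x>: tangential part of the Euclidean gradient -a."""
    ax = dot(a, x)
    return tuple(-(ai - ax * xi) for ai, xi in zip(a, x))


def gradient_descent(x0, a, tau, n_steps):
    """Iterate x_{n+1} = exp_{x_n}(-tau * grad f(x_n))."""
    x = x0
    for _ in range(n_steps):
        g = riemannian_grad(x, a)
        x = exp_sphere(x, tuple(-tau * gi for gi in g))
    return x


a = (0.0, 0.0, 1.0)    # f(x) = -<a, x> is smallest at the north pole
x0 = (1.0, 0.0, 0.0)   # start on the equator
x = gradient_descent(x0, a, tau=0.5, n_steps=200)
print(x, -dot(a, x))
```

Along this run the angle $\theta_n$ between $x_n$ and the north pole follows $\theta_{n+1} = \theta_n - \tau \sin \theta_n$, so the iterates move along a single great circle toward the minimizer.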
Global rates for Newton's method
⟶ Natural gradient descent:
$x_{n+1} - x_n = -\nabla^2 u(x_n)^{-1} \nabla F(x_n)$
Theorem (FL–PCAF '23)
If $F$ is convex on the paths $x(t) = (\nabla u)^{-1}(y(t))$ with $y(t)$ standard segments, then
$F(x_n) \le F(x) + \dfrac{u(x_0 \mid x)}{n}$
Newton's method: new global convergence rate.
New condition on $F$, similar to but different from self-concordance.
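A one-dimensional sketch of the natural gradient descent update $x_{n+1} = x_n - \nabla^2 u(x_n)^{-1} \nabla F(x_n)$, showing that the choice $u = F$ gives Newton's method. The objective $F(x) = e^x - x$ and the function names are assumptions made for illustration. The code does not test the hypotheses of the theorem or its bound.

```python
import math


def natural_gradient_descent(grad_F, hess_u, x0, n_steps):
    """Iterate x_{n+1} = x_n - grad_F(x_n) / hess_u(x_n) in one dimension."""
    x = x0
    xs = [x]
    for _ in range(n_steps):
        x = x - grad_F(x) / hess_u(x)
        xs.append(x)
    return xs


# Toy objective (an assumption for illustration): F(x) = exp(x) - x, minimized at x = 0.
def F(x):
    return math.exp(x) - x


def grad_F(x):
    return math.exp(x) - 1.0


def hess_F(x):
    return math.exp(x)


# Choosing u = F turns the update into Newton's method.
xs = natural_gradient_descent(grad_F, hess_F, x0=1.0, n_steps=10)
print(xs[-1], F(xs[-1]))
```

Passing a different strictly convex `hess_u` (for example the constant function returning a Lipschitz bound) would recover other members of the same family, such as plain gradient descent with a fixed step.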
Thank you!
Reference:
Gradient descent with a general cost. Flavien Léger and Pierre-Cyril Aubin-Frankowski. arXiv:2305.04917, 2023
(SIGMA-MODE 2024-01-30) A geometry for alternating minimization
By Flavien Léger