In brief: Gradient flow horizontally or vertically
(More info: Modin, 2017 and references therein)
\(\mu_0\) and \(\mu_1\) are both (zero-mean) normal distributions on \(\mathbb{R}^n\).
Normal distributions \(\cong\) \(P(n)\), positive-definite symmetric matrices
E. J. and K. Modin, Convergence of the vertical gradient flow for the Gaussian Monge problem, J. Comput. Dyn. (accepted), 2023
How to prove convergence?
Idea: Show that \(\frac{\mathrm d}{\mathrm d t} J \to 0\), and that this implies the flow reaches the polar cone
Convergence rate in linear case?
Random matrices with known factorization \(A = PU\), distance to \(B\) from \(P\).
Interesting for other, similar matrix flows.
Gaussian case: pre-study for more work into the gradient flows in the infinite-dimensional case?
Decompose a matrix into one orthogonal part and one symmetric p.d. part
Easy to do in, for instance, Python
import numpy as np
from scipy.linalg import polar
a = np.array([[1, -1], [2, 4]])
u, p = polar(a, 'left')  # left polar decomposition: a = p @ u
Algorithm is based on SVD factorization, ~0.01 ms (including overhead)
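To make the "based on SVD" remark concrete, here is a minimal sketch of how the left polar factors can be assembled from an SVD and compared against `scipy.linalg.polar`. The helper name `left_polar_via_svd` is hypothetical and introduced only for illustration; it is not claimed to be how SciPy implements `polar` internally.

```python
import numpy as np
from scipy.linalg import polar


def left_polar_via_svd(a):
    """Return (u, p) with a = p @ u, u orthogonal, p symmetric positive semidefinite."""
    # a = w @ diag(s) @ vt
    w, s, vt = np.linalg.svd(a)
    u = w @ vt            # orthogonal factor
    p = (w * s) @ w.T     # w @ diag(s) @ w.T, symmetric factor
    return u, p


a = np.array([[1.0, -1.0], [2.0, 4.0]])
u_svd, p_svd = left_polar_via_svd(a)
u_ref, p_ref = polar(a, side="left")

print(np.allclose(p_svd @ u_svd, a))        # the factors reproduce a
print(np.allclose(u_svd @ u_svd.T, np.eye(2)))  # u is orthogonal
print(np.allclose(p_svd, p_ref), np.allclose(u_svd, u_ref))
```

For an invertible matrix the polar factors are unique, so the two computations should agree up to rounding.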
Matrix ordinary differential equation!
In the end: \(B(\infty) = ????\)
To compute the polar decomposition of a matrix \(A\):
Take a known and fixed symmetric and positive definite matrix \(\Sigma_0\) and solve the following matrix ODE until \(t = \infty\):
import numpy as np
from scipy.linalg import solve_sylvester, expm, polar
A = np.array([[1, 1], [2, 4]])
Sigma0 = np.eye(2)
Sigma1 = A @ Sigma0 @ A.T
T, h = 60, 0.1  # Integration params: final time, step size
B = A
for _ in range(int(T / h)):
    Binv = np.linalg.inv(B)
    # Solve Sigma1 @ Omega + Omega @ Sigma1 = 2 * Sigma1 @ (Binv - Binv.T)
    Omega = solve_sylvester(Sigma1, Sigma1, 2 * Sigma1 @ (Binv - Binv.T))
    B = expm(h * Omega) @ B
Much more complicated! solve_sylvester hides stuff, expm is expensive. Total time: ~ 300 ms
Interest lies in how the method arises.
Questions I think I should have raised:
The answer:
\(\mu_0\) and \(\mu_1\) are both (zero-mean) normal distributions on \(\mathbb{R}^n\): Parametrized by choice of p.d. symmetric covariance matrix
The statistical manifold of normal distributions is the set \(P(n)\) of positive-definite symmetric matrices
(Reference: Information Geometry, Amari)
Geometry? You work with manifolds and stuff... \(\operatorname{GL}(n)\) is a manifold.
Manifolds have tangent spaces
(they contain tangent vectors!)
Manifolds can be equipped with Riemannian metrics: inner products on each tangent space, generalizing the Euclidean inner product
Let's put a metric on \(\operatorname{GL}(n)\)
The metric induces a distance function on \(\operatorname{GL}(n)\)
By a fantastic coincidence, \(J(A) = d^2(I,A)\)
The solution to the GaussMP is known!
By Brenier's theorem, the solution of the GaussMP is the positive-definite symmetric part of the polar decomposition!
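As a numerical illustration of this statement, the sketch below uses the standard closed-form optimal linear map between zero-mean Gaussians, \(T = \Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}\), and checks that for \(\Sigma_0 = I\) and \(\Sigma_1 = A\Sigma_0 A^\top\) it coincides with the symmetric factor \(P\) of the left polar decomposition \(A = PU\). The function name `gaussian_ot_map` is hypothetical, and the formula is quoted from the general Gaussian optimal transport literature rather than from these slides.

```python
import numpy as np
from scipy.linalg import polar, sqrtm


def gaussian_ot_map(sigma0, sigma1):
    """Symmetric p.d. matrix T with T @ sigma0 @ T = sigma1 (closed-form Gaussian OT map)."""
    r = np.real(sqrtm(sigma0))              # sigma0^{1/2}
    r_inv = np.linalg.inv(r)                # sigma0^{-1/2}
    m = np.real(sqrtm(r @ sigma1 @ r))      # (sigma0^{1/2} sigma1 sigma0^{1/2})^{1/2}
    return r_inv @ m @ r_inv


A = np.array([[1.0, 1.0], [2.0, 4.0]])
Sigma0 = np.eye(2)
Sigma1 = A @ Sigma0 @ A.T

T = gaussian_ot_map(Sigma0, Sigma1)
U, P = polar(A, side="left")                # A = P @ U

print(np.allclose(T @ Sigma0 @ T, Sigma1))  # T pushes Sigma0 to Sigma1
print(np.allclose(T, P))                    # T equals the symmetric polar factor
```

With a general \(\Sigma_0\) the same function still returns a symmetric matrix pushing \(\Sigma_0\) to \(\Sigma_1\), but it is then no longer the plain polar factor of \(A\).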
The fiber above \(\Sigma_0\) is \(\operatorname{O}(\Sigma_0,n)\)
The tangent spaces of \(\operatorname{GL}(n)\) split into a vertical component along the fiber and a horizontal component perpendicular to the fiber
The polar cone is all the horizontal geodesics connected to the identity. The polar cone is isomorphic to \(P(n)\)
Theorem: There is a unique element of the polar cone in \(\pi^{-1}(\Sigma_1)\). This is the solution to the OT problem.
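The theorem can be probed numerically under a few assumptions that are not spelled out on the slides: the projection is taken to be \(\pi(B) = B\Sigma_0 B^\top\) (matching `Sigma1 = A@Sigma0@A.T` in the code above), the cost is taken to be the quadratic transport cost \(J(B) = \operatorname{tr}\big((B-I)\Sigma_0(B-I)^\top\big)\), and the polar cone is read as the symmetric p.d. matrices, as its identification with \(P(n)\) suggests. The helper names `cost` and `random_fiber_element` are hypothetical. The sketch samples elements of the fiber \(\pi^{-1}(\Sigma_1)\) and compares their cost with that of the symmetric p.d. element.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)


def cost(b, sigma0):
    """Assumed cost J(B) = tr((B - I) sigma0 (B - I)^T)."""
    d = b - np.eye(b.shape[0])
    return np.trace(d @ sigma0 @ d.T)


def random_fiber_element(a, sigma0, rng):
    """Return a @ q, where q @ sigma0 @ q.T = sigma0, so pi(a @ q) = pi(a)."""
    r = np.real(sqrtm(sigma0))                              # sigma0^{1/2}
    rot, _ = np.linalg.qr(rng.standard_normal(sigma0.shape))  # random orthogonal matrix
    q = r @ rot @ np.linalg.inv(r)                          # element of O(sigma0, n)
    return a @ q


Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0], [2.0, 4.0]])
Sigma1 = A @ Sigma0 @ A.T

# Symmetric p.d. element of the fiber (closed-form Gaussian OT map).
r = np.real(sqrtm(Sigma0))
r_inv = np.linalg.inv(r)
T = r_inv @ np.real(sqrtm(r @ Sigma1 @ r)) @ r_inv

samples = [random_fiber_element(A, Sigma0, rng) for _ in range(200)]
in_fiber = all(np.allclose(b @ Sigma0 @ b.T, Sigma1) for b in samples)
min_sample_cost = min(cost(b, Sigma0) for b in samples)

print(in_fiber)                               # every sample projects to Sigma1
print(cost(T, Sigma0) <= min_sample_cost)     # no sample beats the symmetric element
```

Random sampling only illustrates the minimality claim; it is not a substitute for the proof.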
Use gradient flows to minimize \(J(A)\)! Restrict the metric to \(\pi^{-1}(\Sigma_1)\), and compute
\(\dot B = -\nabla_{\mathcal G|_{\pi^{-1}(\Sigma_1)}} J(B), \quad B(0) = A\)
Other GFs are available, along the polar cone, directly in the covariance matrices, etc.
How to prove that this converges to \(P\)?
The geometric structure is the same in the general OT case. A corresponding flow can be derived!