How to compute the polar factorization of a matrix in a way you shouldn't

The Monge problem

\mathbb{R}^n

\mu_0

\mu_1

\varphi_* \mu_0

\min_{\varphi \in \operatorname{Diff}(\mathbb{R}^n)} J(\varphi) = \int_{\mathbb{R}^n} |\varphi(x)-x|^2 \mathrm{d}\mu_0(x) \\ \text{s.t. } \varphi \in C(\mu_0,\mu_1) = \{\varphi \in \operatorname{Diff}(\mathbb R^n)| \varphi_* \mu_0 = \mu_1\}

Geometric structure

\operatorname{Diff}(\mathbb R^n)/\operatorname{Diff}_{\mu_0}(\mathbb R^n)\\ \cong \operatorname{Dens}(\mathbb R^n)

In brief: Gradient flow horizontally or vertically

\mu_0

\mu_1

\text{Id}

\operatorname{Dens}(\mathbb{R}^n)

\operatorname{Diff}(\mathbb{R}^n)

(More info: Modin, 2017 and references therein)

The Gaussian Monge problem

\(\mu_0\) and \(\mu_1\) are both (zero-mean) normal distributions on \(\mathbb{R}^n\).

Normal distributions \(\cong\) \(P(n)\), positive-definite symmetric matrices

\mu_0

\mu_1

\mathcal{N}(0,\Sigma_0)

\mathcal{N}(0,\Sigma_1)

\min J (A ) = \operatorname{Tr}((I - A )\Sigma_0 (I - A )^T ) \\ \text{s.t. } A\Sigma_0 A^T = \Sigma_1

Geometric structure

\operatorname{GL}(n)/\operatorname{O}(\Sigma_0,n)\\ \cong \operatorname{P}(n)

\Sigma_0

\Sigma_1

\text{Id}

\operatorname{P}(n)

\operatorname{GL}(n)

In brief: Gradient flow horizontally or vertically

Vertical matrix flow

\Sigma_0

\Sigma_1

\text{Id}

E.J, K. Modin, Convergence of the vertical gradient flow for the Gaussian Monge problem J. Comput. Dyn. (accepted), 2023

How to prove convergence?

Idea: Show \(\frac{\mathrm d} {\mathrm d t} J \to 0\), and that this means we hit polar cone

\dot B = \Omega B, B(0) =A \\ \Sigma_1 \Omega + \Omega_1 \Sigma = 2\Sigma_1 (B^{-1}-B^{-T}) \,

Questions!

Convergence rate in linear case?

Random matrices with known factorization \(A = PU\), distance to \(B\) from \(P\).

Interesting for other, similar matrix flows.

Questions!

Gaussian case: pre-study for more work into the gradient flows in the infinite-dimensional case?

Existence? Convergence to minimizer?
Discretization?

\dot \eta+ 2\eta^{-1}+2\operatorname{Id}= -\nabla p \circ \eta \\ \nabla \cdot \rho_1 \nabla p = \nabla\cdot \rho_1 (\text{id}-\eta^{-1})

Polar decompositions

\begin{pmatrix} 1 & 1 \\ 2 & 4 \end{pmatrix}

Decompose a matrix into one orthogonal part and one symmetric p.d. part

Polar decompositions

Decompose a matrix into one orthogonal part and one symmetric p.d. part

\begin{pmatrix} 1 & 1 \\ 2 & 4 \end{pmatrix} = {\color{yellow}\begin{pmatrix} 0.78 &1.18 \\ 1.18 & 4.31 \end{pmatrix}}{\color{cyan} \begin{pmatrix} 0.98 & -0.20 \\ 0.20 & 0.98 \end{pmatrix}}

\begin{pmatrix} 0.98 & -0.20 \\ 0.20 & 0.98 \end{pmatrix} \begin{pmatrix} 0.98 & -0.20 \\ 0.20 & 0.98 \end{pmatrix}^\perp = \begin{pmatrix} 1 & 0 \\ 0 & 1\end{pmatrix}

\begin{pmatrix} 0.78 &1.18 \\ 1.18 & 4.31 \end{pmatrix}^T {=} \begin{pmatrix} 0.78 &1.18 \\ 1.18 & 4.31 \end{pmatrix}, \\\text{eig}\begin{pmatrix} 0.78 &1.18 \\ 1.18 & 4.31 \end{pmatrix} = \begin{pmatrix} 0.43 & 4.67 \end{pmatrix}

Polar decompositions

Easy to do in for instance python

import numpy as np
from scipy.linalg import polar
a = np.array([[1, -1], [2, 4]])
u, p = polar(a, 'left')

Algorithm is based on SVD factorization, ~0.01 ms (including overhead)

Polar decompositions the hard way

\dot B(t) = \Omega(t) B(t), B(0) =A \\ \Sigma_1 \Omega(t) + \Omega(t) \Sigma_1 = 2\Sigma_1 (B(t)^{-1}-B(t)^{-T}) \,\\ \Sigma_1 = \pi(A) = A \Sigma_0 A^T

Matrix ordinary differential equation!

Plug into favorite matrix ODE solving algorithm that preserves structure (expensive)
Run flow "until convergence"

In the end: \(B(\infty) = ????\)

To compute the polar decomposition of a matrix \(A\):

Take a known and fixed symmetric and positive definite matrix \(\Sigma_0\) and solve the following matrix ODE until \(t = \infty\):

The algorithm: implemented

import numpy as np 
from scipy.linalg import solve_sylvester,expm, polar
A = np.array([[1, 1], [2, 4]])
Sigma0 = np.eye(2)
Sigma1 = A@Sigma0@A.T
T,h = 60,0.1 #Integration params: final time, step size
B = A 
for _ in range(int(T/h)):
    Omega = solve_sylvester(Sigma1,Sigma1,2*Sigma1@(np.linalg.inv(B)-np.linalg.inv(B).T))
    B = expm(h*Omega)@B

Much more complicated! solve_sylvester hides stuff, expm is expensive. Total time: ~ 300 ms

The algorithm: i n a c t i o n

Slow! When size grows, algo slows :(
When has it converged enough?
Much superior and optimized algos available

Why is it still interesting?

So, the algo is crap
But still, the thesis work I found the nicest and most rewarding

Interest lies in how the method arises.

Questions I think I should have raised:

How did you come up with that?
Why does it converge to the symmetric factor like that?

OPTIMAL TRANSPORT

The answer:

\dot B = \Omega B, B(0) =A \\ \Sigma_1 \Omega + \Omega \Sigma_1 = 2\Sigma_1 (B^{-1}-B^{-T}) \,

Optimal transport

\mathbb{R}^n

\mu_0

\mu_1

\eta_* \mu_0

\, \min_{\eta \in \operatorname{Diff}(\mathbb{R}^n)} J(\eta) = \int_{\mathbb{R}^n} |\eta(x)-x|^2 \mathrm{d}\mu_0(x) \, \\ \text{s.t. } \eta^* \mu_0 = \mu_1

The Gaussian Monge Problem

\(\mu_0\) and \(\mu_1\) are both (zero-mean) normal distributions on \(\mathbb{R}^n\): Parametrized by choice of p.d. symmetric covariance matrix

\mu_0

\mu_1

\mathcal{N}(0,\Sigma_0)

\mathcal{N}(0,\Sigma_1)

The statistical manifold of normal distributions is the set \(P(n)\) of positive-definite symmetric matrices

\varphi(x) = Ax,\\ A \in \operatorname{GL}(n) \\ \varphi_* \mu_0 \sim \mathcal{N}(0, A \Sigma_0 A^T)

(REFERENCE: INFORMATION GEOMETRY, AMARI)

The Gaussian Monge Problem

\mu_0

\mu_1

\mathcal{N}(0,\Sigma_0)

\mathcal{N}(0,\Sigma_1)

\varphi(x) = Ax,\\ A \in \operatorname{GL}(n) \\ \varphi_* \mu_0 \sim \mathcal{N}(0, A \Sigma_0 A^T)

\min J(A) =\operatorname{Tr}(\Sigma_0 (I-A)^T (I-A)) \\ \text{s.t. } A \in C(\Sigma_0,\Sigma_1) = \{A \in\operatorname{GL}(n): A\Sigma_0 A^T = \Sigma_1 \}

The geometry of the GaussMP

Geometry? You work with manifolds and stuff... \(\operatorname{GL}(n)\) is a manifold.

Manifolds have tangent spaces

(they contain tangent vectors!)

Manifolds can be equipped with Riemannian metrics (generalizations of Euclidean i.p.) that are inner products on each tangent space

The geometry of the GaussMP

Let's put a metric on \(\operatorname{GL}(n)\)

\mathcal{G}_A(v,w) = \operatorname{Tr}(\Sigma_0 v^T w)

The metric induces a distance function on \(\operatorname{GL}(n)\)

d(A,B) = \operatorname{Tr}(\Sigma_0 (A-B)^T (A-B))^{1/2}

By a fantastic coincidence, \(J(A) = d^2(I,A)\)

The solution to the GaussMP is known!

The geometry of the GaussMP

By Breniér's theorem, the solution of the GaussMP is the positive-definite symmetric part of the polar decomposition!

A = PQ, \Sigma_1 = \pi(A) = A \Sigma_0 A^T \\ P(n) \ni P = \argmin_{A \in C(\Sigma_0,\Sigma_1)} J(A), \\ Q \in \operatorname{O}(n,\Sigma_0) = \{Q \in \operatorname{GL}(n), Q\Sigma_0 Q^T = \Sigma_0 \}

The geometry of the GaussMP

\(\operatorname{GL}(n)\) is a Lie group
It acts on \(P(n)\) by \(A.\Sigma = A \Sigma A^T\)
1. This action is transitive, i.e., any two matrices in \(P(n)\) can be connected by one in \(\operatorname{GL}(n)\)
2. The action thus defines a projection \(\pi(A) = A \Sigma_0 A^T \in P(n) \)
The Lie subgroup \(\operatorname{O}(n,\Sigma_0)\) are the matrices that leave \(\Sigma_0\) unchanged, i.e., \(\pi(Q) = \Sigma_0\)
Finally: \(\operatorname{O}(n,\Sigma_0)\) is isomorphic to \(C(\Sigma_0,\Sigma_1)\)

This is what in the business is known as a principal \(\operatorname{O}(n,\Sigma_0)\)-bundle, with fibers \(\pi^{-1}(\Sigma) \)

The geometry of the GaussMP

\operatorname{GL}(n)/\operatorname{O}(\Sigma_0,n)\\ \cong \operatorname{P}(n)

\Sigma_0

\Sigma_1

\text{Id}

\operatorname{P}(n)

\operatorname{GL}(n)

The Fiber above the \(\Sigma_0\) is \(\operatorname{O}(\Sigma_0,n)\)

The tangent spaces of \(\operatorname{GL}(n)\) splits into vertical component along fiber and horizontal perpendicular to fiber

K_\diamond

The polar cone is all the horizontal geodesics connected to the identity. The polar cone is isomorphic to \(P(n)\)

Theorem: there is a unique element of the polar cone in \(\pi^{-1}(\Sigma_1)\) This is the solution to the OT problem.

Finding the matrix ODE

\Sigma_0

\Sigma_1

\text{Id}

Use gradient flows to minimze \(J(A)\)! Restrict metric to \(\pi^{-1}(\Sigma_1)\), and compute

\(\dot B = -\nabla_{\mathcal G|_{\pi^{-1}(\Sigma_1)}} J(B), B(0) = A \)

\dot B = \Omega B, B(0) =A \\ \Sigma_1 \Omega + \Omega_1 \Sigma = 2\Sigma_1 (B^{-1}-B^{-T}) \,

Other GFs are available, along the polar cone, directly in the covariance matrices, etc.

Finding the matrix ODE

\Sigma_0

\Sigma_1

\text{Id}

\dot B = \Omega B, B(0) =A \\ \Sigma_1 \Omega + \Omega_1 \Sigma = 2\Sigma_1 (B^{-1}-B^{-T}) \,

How to prove that this converges to \(P\)?

Prove that the flow exists for all time
Prove that as \(t \to \infty\), \(\frac{\mathrm d }{\mathrm d t} J(B) \to 0\),
Prove that this in turn is equivalent to \(B(\infty) \in K_\diamond\)
Use that the intersection of fiber and \(K_\diamond\) only contains one element!

Just a theoretical exercise?

\Sigma_0

\Sigma_1

\text{Id}

The geometric structure is the same in the general OT case. Corresponding flow can be derived!

\dot \eta+ 2\eta^{-1}+2\operatorname{Id}= -\nabla p \circ \eta \\ \nabla \cdot \rho_1 \nabla p = \nabla\cdot \rho_1 (\text{id}-\eta^{-1})

Existence? Convergence to minimizer?
Discretization?