Optimal transport, information, and matrix decompositions

Klas Modin

slides.com/kmodin/matrix-decompositions

Riemannian principal bundles

E/H\simeq B

\mathrm{Hor}

\hookleftarrow H

\downarrow

\pi

Invariant Riemannian metric on \(E\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G/H\simeq B

\mathrm{Hor}

\hookleftarrow H

\downarrow

\pi

left co-sets \([g] = g\cdot H \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G/G_{b_0}\simeq B

\mathrm{Hor}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

G_{b_0}

Riemannian principal bundles

G_{b_0}\backslash G\simeq B

\mathrm{Hor}

G_{b_0}\hookrightarrow

\downarrow

\pi(g) = b_0\cdot g

right co-sets \([g] = G_{b_0}\cdot g \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

G_{b_0}

Riemannian principal bundles

G/G_{b_0}\simeq B

\text{horizontal flow}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

\text{vertical flow}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

g\in G

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

\gamma(1)\in G

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = \gamma(1)(\gamma(1)^{-1}g)

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

\gamma(1)^{-1}g \in G_{b_0}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = \gamma(1)h

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = k h

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

polar cone

Riemannian principal bundles

G_{b_0}\backslash G\simeq B

g = h k

G_{b_0}\hookrightarrow

\downarrow

\pi(g) = b_0\cdot g

right co-sets \([g] = G_{b_0}\cdot g \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = b_0\cdot g

polar cone

Examples

Optimal mass transport
(infinite- and finite-dimensional)
\(\Rightarrow\) polar decomposition
Vertical flows: Euler equations, free rigid body, gradient OT flow
Horizontal flows: heat flow (entropy gradient flow)
Information geometry
\(\Rightarrow \; QR\), Cholesky, singular value, and spectral decompositions
Vertical flows: gradient flow for \(QR\), isospectral flows
Horizontal flows: Brockett flow (entropy gradient flow)

Optimal mass transport (OMT)

\mu_0

\mu_1

\varphi_*\mu_0

\displaystyle\min_{\varphi_*\mu_0=\mu_1} \int_{M} d_M^2(\varphi(x),x ) \mu_0

Monge problem, \(L^2\) version

Optimal mass transport (OMT)

\mu_0

\mu_1

\varphi_*\mu_0

\displaystyle\min_{\varphi_*\mu_0=\mu_1} J(\varphi), \qquad J(\varphi) = \int_{\mathbb{R}^n} \lvert \varphi(x)-x \rvert^2 \mu_0

\mathbb{R}^n

Monge problem, \(L^2\) version

Riemannian structure of OMT

\mathrm{Diff}(M)

\mathrm{Dens}(M)\simeq \mathrm{Diff}(M)/\mathrm{Diff}_{\mu_0}(M)

\mathrm{Id}

\mu_0

\mu_1

\pi(\varphi)=\varphi_*\mu_0

Riemannian metric

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \int_{M}\left\vert \dot\varphi \right\vert^2 \mu_0

Induced metric

\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_M \lvert \nabla \theta\rvert^2 \mu

\dot\rho + \operatorname{div}(\rho \nabla\theta) = 0, \; \rho = \mu/dx

[Benamou & Brenier (2000), Otto (2001)]

Invariance: \(\eta\in\mathrm{Diff}_{\mu_0}(M)\)

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \mathcal{G}_{\varphi\circ\eta}(\dot\varphi\circ\eta,\dot\varphi\circ\eta)

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0

\mathrm{Id}

\mu_0

\mu_1

\mathrm{Hor}_{\mathrm{Id}} = \nabla C^\infty(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) a section of principal bundle

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0

\mathrm{Id}

\mu_0

\mu_1

\varphi = \nabla\phi\circ\eta, \; \eta\in \operatorname{Diff}_{\mu_0}(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) a section of principal bundle

Brenier's decomposition of transport maps
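A discrete, one-dimensional analogue may help make Brenier's decomposition concrete. The sketch below is an illustration that is not taken from the slides: it assumes equally weighted points on the line, for which the monotone (sorted) matching is known to solve the \(L^2\) Monge problem, and it checks this by brute force over all bijections. The names `monotone_assignment` and `cost` are hypothetical helpers.

```python
import itertools
import numpy as np

def monotone_assignment(x, y):
    """Return sigma such that x[i] is sent to y[sigma[i]] in sorted order."""
    sigma = np.empty(len(x), dtype=int)
    # the k-th smallest source point goes to the k-th smallest target point
    sigma[np.argsort(x)] = np.argsort(y)
    return sigma

def cost(x, y, sigma):
    """Discrete L^2 Monge cost of the bijection x[i] -> y[sigma[i]]."""
    return float(np.sum((y[sigma] - x) ** 2))

x = np.array([0.3, -1.2, 2.0, 0.9, -0.4])   # source points (illustrative data)
y = np.array([1.5, 0.1, -0.7, 3.2, 0.8])    # target points (illustrative data)

sigma_opt = monotone_assignment(x, y)
# brute force over all 5! bijections as an independent check
best = min(cost(x, y, np.array(p))
           for p in itertools.permutations(range(len(x))))
print(cost(x, y, sigma_opt), best)
```

Any other bijection between the two point sets is the monotone one composed with a permutation of the source points, which mirrors \(\varphi = \nabla\phi\circ\eta\) with \(\eta\) measure preserving.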

Geodesic distance on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic curve:

\varphi(t) = (1-t)\varphi_0 + t \varphi_1

\mathrm{Id}

\mu_0

\mu_1

\mathrm{dist}(\varphi_0,\varphi_1)^2 = \int_0^1 \mathcal{G}_{\varphi(t)}(\dot\varphi(t),\dot\varphi(t)) dt

= \int_0^1\int_{\mathbb{R}^n} \lvert \dot\varphi(t)\rvert^2\mu_0 dt

= \int_{\mathbb{R}^n} \lvert \varphi_1(x)-\varphi_0(x) \rvert^2 \mu_0

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2

Monge-Ampere equation on \(\mathbb{R}^n\)

Geodesic curve:

\varphi(t) = \nabla( |x|^2/2 + t f )

\mathrm{Id}

\mu_0

\mu_1

(\nabla\phi)_*\mu_0 = \mu_1

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2

\nabla\phi = \nabla( |x|^2/2 + f )

\displaystyle \Rightarrow \operatorname{det}(\nabla^2\phi) = \frac{\rho_0}{\rho_1\circ \nabla\phi}

Linear optimal mass transport

Trivial observation: \(\varphi_0(x) = A_0 x\), \(\varphi_1(x) = A_1 x\) linear diffeomorphisms \(\Rightarrow\) geodesic consists of linear diffeomorphisms

Consequence: \(GL(n)\) is a totally geodesic subgroup of \(\operatorname{Diff}(\mathbb{R}^n)\)

Corresponding subspace of densities (statistical submanifold): multivariate Gaussians with zero mean

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

\Sigma \in P(n)

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(A)=A\Sigma_0 A^\top

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)
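The metric \(\mathcal G_A\) can be recovered from the \(L^2\) metric on \(\operatorname{Diff}(\mathbb{R}^n)\). A short worked computation, assuming that \(\mu_0\) is the zero-mean Gaussian with covariance \(\Sigma_0\) and that \(\dot\varphi(x) = \dot A x\):

```latex
\mathcal G_\varphi(\dot\varphi,\dot\varphi)
  = \int_{\mathbb{R}^n} \lvert \dot A x\rvert^2 \mu_0
  = \int_{\mathbb{R}^n} \mathrm{tr}\big(\dot A\, x x^\top \dot A^\top\big)\, \mu_0
  = \mathrm{tr}\big(\dot A \Sigma_0 \dot A^\top\big)
  = \mathrm{tr}\big(\Sigma_0 \dot A^\top \dot A\big)
```

Under the same assumption, the transport cost of a linear map is \(J(A) = \mathrm{tr}(\Sigma_0 (A-I)^\top (A-I))\).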

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(P)=P\Sigma_0 P = \Sigma_1

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Monge-Ampère equation: \(P\Sigma_0 P = \Sigma_1\)
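As a consistency check, the matrix equation \(P\Sigma_0 P = \Sigma_1\) can be derived from \(\det(\nabla^2\phi) = \rho_0/(\rho_1\circ\nabla\phi)\). The computation below is a sketch that assumes \(\rho_0,\rho_1\) are the zero-mean Gaussian densities with covariances \(\Sigma_0,\Sigma_1\) and \(\phi(x) = \tfrac12 x^\top P x\) with \(P\in P(n)\):

```latex
\begin{aligned}
\nabla\phi(x) &= Px, \qquad \det(\nabla^2\phi) = \det P, \\
\frac{\rho_0(x)}{\rho_1(Px)}
  &= \sqrt{\frac{\det\Sigma_1}{\det\Sigma_0}}\,
     \exp\!\Big(-\tfrac12\, x^\top\big(\Sigma_0^{-1} - P\Sigma_1^{-1}P\big)x\Big).
\end{aligned}
% The right-hand side is constant in x iff P \Sigma_1^{-1} P = \Sigma_0^{-1},
% i.e. P \Sigma_0 P = \Sigma_1, and then \det P = \sqrt{\det\Sigma_1 / \det\Sigma_0}.
```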

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

A = PQ

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Factorization theorem: \(A = PQ\) with \(P\in K = P(n)\) and \(Q\in O(n,\Sigma_0)\)
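The factorization \(A = PQ\) can be checked numerically. The sketch below is not from the slides: it assumes the standard closed form \(P = \Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}\) for the symmetric positive definite solution of \(P\Sigma_0 P = \Sigma_1\), and it reads \(O(n,\Sigma_0)\) as the matrices with \(Q\Sigma_0 Q^\top = \Sigma_0\). The helper names `spd_sqrt` and `transport_polar` and the test matrices are hypothetical.

```python
import numpy as np

def spd_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def transport_polar(A, Sigma0):
    """Factor A = P Q with P symmetric positive definite, Q Sigma0 Q^T = Sigma0."""
    Sigma1 = A @ Sigma0 @ A.T                 # pi(A) = A Sigma0 A^T
    R0 = spd_sqrt(Sigma0)
    R0_inv = np.linalg.inv(R0)
    P = R0_inv @ spd_sqrt(R0 @ Sigma1 @ R0) @ R0_inv   # solves P Sigma0 P = Sigma1
    Q = np.linalg.solve(P, A)                 # Q = P^{-1} A
    return P, Q, Sigma1

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])               # invertible (det = 7)
Sigma0 = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.2],
                   [0.0, 0.2, 1.5]])          # symmetric positive definite

P, Q, Sigma1 = transport_polar(A, Sigma0)
print(np.allclose(P @ Sigma0 @ P, Sigma1), np.allclose(Q @ Sigma0 @ Q.T, Sigma0))
```

With \(\Sigma_0 = I\) this reduces to the usual polar decomposition, \(P = (AA^\top)^{1/2}\) and \(Q\) orthogonal.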

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\dot B = -\mathrm{Pr}\nabla_{\mathcal G}J(B), \; B(0) = A

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Vertical gradient flow: \(\dot B = -\mathrm{Pr}\nabla_{\mathcal G}J(B), \; B(0) = A\)

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\dot B = \Omega B, \; B(0) = A

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Vertical gradient flow: \(\dot B = \Omega B\), where \(\Omega\) solves

\Sigma_1 \Omega + \Omega\Sigma_1 = 2\Sigma_1 (B^{-1}-B^{-\top})

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot \Sigma = \mathrm{Pr}\nabla_{\bar{\mathcal G}}H_{\Sigma_1}(\Sigma)

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow: \(\dot \Sigma = \mathrm{Pr}\nabla_{\bar{\mathcal G}}H_{\Sigma_1}(\Sigma)\)

\Sigma(t)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot \Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow: \(\dot \Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}\)

\Sigma(t)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)
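The flow \(\dot \Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}\) is linear in \(\Sigma\) and can be integrated directly. The following sketch uses a plain forward Euler scheme, which is an assumption made for brevity and not a method from the slides; the step size, the number of steps, and the test matrices are illustrative choices, and `relative_entropy` and `heat_flow` are hypothetical helper names.

```python
import numpy as np

def relative_entropy(Sigma, Sigma1):
    """H_{Sigma1}(Sigma) = -tr(Sigma1^{-1} Sigma)/2 + log det(Sigma1^{-1} Sigma)/2."""
    M = np.linalg.solve(Sigma1, Sigma)        # Sigma1^{-1} Sigma
    _, logdet = np.linalg.slogdet(M)
    return -0.5 * np.trace(M) + 0.5 * logdet

def heat_flow(Sigma0, Sigma1, dt=0.01, steps=3000):
    """Forward Euler for dSigma/dt = 2I - Sigma1^{-1} Sigma - Sigma Sigma1^{-1}."""
    n = Sigma0.shape[0]
    Sigma1_inv = np.linalg.inv(Sigma1)
    Sigma = Sigma0.copy()
    for _ in range(steps):
        rhs = 2 * np.eye(n) - Sigma1_inv @ Sigma - Sigma @ Sigma1_inv
        Sigma = Sigma + dt * rhs
    return Sigma

Sigma0 = np.eye(2)                            # initial covariance
Sigma1 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])               # target covariance

Sigma_end = heat_flow(Sigma0, Sigma1)
print(np.allclose(Sigma_end, Sigma1, atol=1e-8))
print(relative_entropy(Sigma0, Sigma1), relative_entropy(Sigma_end, Sigma1))
```

Writing \(E = \Sigma - \Sigma_1\) gives \(\dot E = -\Sigma_1^{-1}E - E\Sigma_1^{-1}\), so \(E(t) = e^{-t\Sigma_1^{-1}}E(0)\,e^{-t\Sigma_1^{-1}}\) and the flow converges to \(\Sigma_1\) for any initial \(\Sigma_0\in P(n)\).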

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

P(t)

Lifted gradient flow on \(K\) for

\displaystyle F(P) = H_{\Sigma_1}(P\Sigma_0 P)

Hessian of \(F(P)\) strictly positive on \(K\) \(\Rightarrow\) unique limit!

Wasserstein-Otto vs. Fisher-Rao

\mathrm{Dens}(M)

T_\mu\mathrm{Dens}(M)\simeq C^\infty_0(M)

Wasserstein

\displaystyle\overline{\mathcal{G}}_{\rho dx}(\dot\rho dx,\dot\rho dx) = \int_{M} |\nabla\theta|^2\rho

\displaystyle \dot\rho + \mathrm{div}(\rho \nabla\theta) = 0

Dependent on Riemannian structure of \(M\)

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

Fisher-Rao

\displaystyle\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_{M} \frac{\dot\mu}{\mu}\frac{\dot\mu}{\mu}\mu

Independent of Riemannian structure of \(M \Rightarrow \mathrm{Diff}(M)\)-invariance

\displaystyle \rho(x) = \sqrt{\frac{\mathrm{det}(W)}{(2\pi)^n}}\mathrm{exp}(-\frac{1}{2}x^\top W x)

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirement for compatible metric on \(GL(n)\) :

Descending, meaning left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\) \(\mathrm{Hor}\) is integrable

\(K=\) subgroup of upper triangular matrices with positive entries on diagonal

A = QR, \; Q\in O(n,W_0^{-1}), \; R\in K

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirement for compatible metric on \(GL(n)\) :

Descending, meaning left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\) \(\mathrm{Hor}\) is integrable

\(K=\) subgroup of upper triangular matrices with positive entries on diagonal

W_1 = \pi(R) = R^\top W_0 R
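The relation \(W_1 = R^\top W_0 R\) with \(R\) upper triangular and positive on the diagonal ties the \(QR\) and Cholesky factorizations together. The sketch below illustrates this under the simplifying assumption \(W_0 = I\), which is a choice made here and not stated on the slides; `qr_positive` is a hypothetical helper that normalizes the signs returned by NumPy.

```python
import numpy as np

def qr_positive(A):
    """QR factorization with the diagonal of R made positive (A invertible)."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    # flip matching columns of Q and rows of R; the product Q R is unchanged
    return Q * signs, signs[:, None] * R

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])               # invertible (det = 7)

Q, R = qr_positive(A)
W1 = A.T @ A                                  # pi(A) with W0 = I
R_chol = np.linalg.cholesky(W1).T             # upper factor, W1 = R_chol^T R_chol

print(np.allclose(R, R_chol))
```

Here \(Q\) is the fibre (vertical) component and \(R\in K\) the section component of \(A\); since \(R\) depends only on \(W_1 = \pi(A)\), it coincides with the Cholesky factor.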

Entropy gradient flow (Fisher-Rao)

GL(n)

P(n)

W_0

W_1

W(t)

Relative entropy

\displaystyle H_{W_1}(W) = -\frac{1}{2}\mathrm{tr}(W_1 W^{-1}) + \frac{1}{2}\log\det(W_1 W^{-1})

\dot W = \nabla_{\bar{\mathcal G}}H_{W_1}(W)

\dot W = W_1 - W
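The flow \(\dot W = W_1 - W\) is linear, so it can be solved in closed form. A short worked solution, assuming the initial condition \(W(0) = W_0\):

```latex
W(t) = W_1 + e^{-t}\,(W_0 - W_1) = (1-e^{-t})\,W_1 + e^{-t}\,W_0,
\qquad W(0) = W_0, \quad W(t)\to W_1 \ \text{as}\ t\to\infty .
```

Since \(W(t)\) is a convex combination of \(W_0\) and \(W_1\), it stays in \(P(n)\) for all \(t\ge 0\).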

Further reduction

GL(n)

W_1

P(n)\simeq O(n)\backslash GL(n)

P(n)/O(n)\simeq \mathrm{poly}^+_n

\pi(A) = A^\top A

\varpi(W) = \det(\lambda I - W)

Right action by \((Q,W)\mapsto Q^\top W Q \), clearly not free!
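Both statements on this slide, that the characteristic polynomial is constant along orbits of \(W\mapsto Q^\top W Q\) and that the action is not free, can be checked numerically. The sketch below is an illustration with hypothetical test matrices; the stabilizer element `S` is built from a sign flip in the eigenbasis of `W`.

```python
import numpy as np

W = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])               # symmetric positive definite

# an orthogonal matrix, obtained from the QR factorization of a fixed matrix
Q, _ = np.linalg.qr(np.array([[1.0, 2.0, 0.0],
                              [0.0, 1.0, 1.0],
                              [1.0, 0.0, 1.0]]))

W_moved = Q.T @ W @ Q                         # another point on the orbit of W
char_W = np.poly(W)                           # coefficients of det(lambda I - W)
char_moved = np.poly(W_moved)

# a non-identity orthogonal matrix that fixes W: sign flip of one eigenvector
lam, V = np.linalg.eigh(W)
S = V @ np.diag([1.0, -1.0, 1.0]) @ V.T

print(np.allclose(char_W, char_moved), np.allclose(S.T @ W @ S, W))
```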

Spectral decomposition

P(n)

P(n)/O(n)\simeq \mathrm{poly}^+_n

D(n)

W = Q\Lambda Q^\top

\(D(n)\) is a totally geodesic submanifold!

Geodesic equation on \(D(n)\):

\displaystyle \ddot\gamma_i - \frac{\dot\gamma_i^2}{\gamma_i} = 0

Notice: \(D(n)\) intersects \(W_1\)-orbit \(n!\) times
( \(D(n)\) an \(n!\)-covering of \(\mathrm{poly}_n^+\) )
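The geodesic equation on \(D(n)\) decouples into scalar equations that can be solved explicitly. A short worked solution, assuming \(\gamma_i(t) > 0\) and geodesics parametrized on \([0,1]\):

```latex
\frac{d}{dt}\Big(\frac{\dot\gamma_i}{\gamma_i}\Big)
  = \frac{\ddot\gamma_i}{\gamma_i} - \frac{\dot\gamma_i^2}{\gamma_i^2} = 0
\;\Rightarrow\; \dot\gamma_i = c_i\,\gamma_i
\;\Rightarrow\; \gamma_i(t) = \gamma_i(0)\,e^{c_i t}
  = \gamma_i(0)^{1-t}\,\gamma_i(1)^{t} .
```

Each eigenvalue therefore moves along a geometric interpolation between its endpoint values.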

Brockett flow

P(n)

W_1

D(n)

\(H_N(W)\) relative entropy functional

Functional \(F(Q) = H_N(Q^\top W_1 Q)\) on \(O(n)\)

\mathrm{Orb}(W_1)

THANKS!

Reference:

K. Modin
Geometry of Matrix Decompositions Seen Through Optimal Transport and Information Geometry, 2017

Slides available at: slides.com/kmodin