Optimal transport, information, and matrix decompositions

Klas Modin

slides.com/kmodin/matrix-decompositions

Riemannian principal bundles

E/H\simeq B

\mathrm{Hor}

\hookleftarrow H

\downarrow

\pi

Invariant Riemannian metric on \(E\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G/H\simeq B

\mathrm{Hor}

\hookleftarrow H

\downarrow

\pi

left co-sets \([g] = g\cdot H \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G/G_{b_0}\simeq B

\mathrm{Hor}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

G_{b_0}

Riemannian principal bundles

G_{b_0}\backslash G\simeq B

\mathrm{Hor}

G_{b_0}\hookrightarrow

\downarrow

\pi(g) = b_0\cdot g

right co-sets \([g] = G_{b_0}\cdot g \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

G_{b_0}

Riemannian principal bundles

G/G_{b_0}\simeq B

\text{horizontal flow}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

\text{vertical flow}

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

g\in G

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

Riemannian principal bundles

G/G_{b_0}\simeq B

\gamma(1)\in G

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = \gamma(1)(\gamma(1)^{-1}g)

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

\overbrace{\phantom{klasklas}}^{\in G_{b_0}}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = \gamma(1)h

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G/G_{b_0}\simeq B

g = k h

\hookleftarrow G_{b_0}

\downarrow

\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0} \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = g\cdot b_0

polar cone

Riemannian principal bundles

G_{b_0}\backslash G\simeq B

g = h k

G_{b_0}\hookrightarrow

\downarrow

\pi(g) = b_0\cdot g

right co-sets \([g] = G_{b_0}\cdot g \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

b_1 = \pi(g) = b_0\cdot g

polar cone

Examples

Optimal mass transport
(infinite- and finite-dimensional)
\(\Rightarrow\) polar decomposition
Vertical flows: Euler equations, free rigid body, gradient OT flow
Horizontal flows: heat flow (entropy gradient flow)
Information geometry
\(\Rightarrow \; QR\), Cholesky, singular value, and spectral decompositions
Vertical flows: gradient flow for \(QR\), isospectral flows
Horizontal flows: Brockett flow (entropy gradient flow)

Optimal mass transport (OMT)

\mu_0

\mu_1

\varphi_*\mu_0

\displaystyle\min_{\varphi_*\mu_0=\mu_1} \int_{M} d_M^2(\varphi(x),x ) \mu_0

Monge problem, \(L^2\) version

Optimal mass transport (OMT)

\mu_0

\mu_1

\varphi_*\mu_0

\displaystyle\min_{\varphi_*\mu_0=\mu_1} \int_{\mathbb{R}^n} \lvert \varphi(x)-x \rvert^2 \mu_0

\mathbb{R}^n

Monge problem, \(L^2\) version

\underbrace{\phantom{klasklkllkklklas}}_{J(\varphi)}

Riemannian structure of OMT

\mathrm{Diff}(M)

\mathrm{Dens}(M)\simeq \mathrm{Diff}(M)/\mathrm{Diff}_{\mu_0}(M)

\mathrm{Id}

\mu_0

\mu_1

\pi(\varphi)=\varphi_*\mu_0

Riemannian metric

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \int_{M}\left\vert \dot\varphi \right\vert^2 \mu_0

Induced metric

\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_M \lvert \nabla \theta\rvert^2 \mu

[Benamou & Brenier (2000), Otto (2001)]

Invariance: \(\eta\in\mathrm{Diff}_{\mu_0}(M)\)

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \mathcal{G}_{\varphi\circ\eta}(\dot\varphi\circ\eta,\dot\varphi\circ\eta)

\dot\rho + \operatorname{div}(\rho \nabla\theta) = 0, \; \rho = \mu/dx

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0

\mathrm{Id}

\mu_0

\mu_1

\mathrm{Hor}_{\mathrm{Id}} = \nabla C^\infty(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) is a section of the principal bundle

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0

\mathrm{Id}

\mu_0

\mu_1

\varphi = \nabla\phi\circ\eta, \; \eta\in \operatorname{Diff}_{\mu_0}(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) is a section of the principal bundle

Brenier's decomposition of transport maps
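In one dimension the gradients of convex functions are exactly the nondecreasing maps, so the polar factor in Brenier's decomposition is the monotone rearrangement. A minimal sketch, assuming a hypothetical discrete setting of six equal-weight point masses (not taken from the slides), compares a brute-force search over all bijections with the sorted matching:

```python
import itertools
import numpy as np

def monge_cost(x, y, perm):
    """Quadratic Monge cost of sending x[i] to y[perm[i]] (equal weights)."""
    return float(np.sum((y[list(perm)] - x) ** 2))

def brute_force_optimal_perm(x, y):
    """Exhaustive search over all bijections (only feasible for small n)."""
    perms = itertools.permutations(range(len(x)))
    return min(perms, key=lambda p: monge_cost(x, y, p))

def monotone_perm(x, y):
    """Monotone rearrangement: the k-th smallest x goes to the k-th smallest y."""
    perm = np.empty(len(x), dtype=int)
    perm[np.argsort(x)] = np.argsort(y)
    return tuple(int(i) for i in perm)

rng = np.random.default_rng(0)
x = rng.normal(size=6)
y = rng.normal(loc=1.0, scale=2.0, size=6)

best = brute_force_optimal_perm(x, y)
mono = monotone_perm(x, y)
print(best == mono)
```

The brute force is only a check; the monotone matching itself costs a sort.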

Geodesic distance on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic curve:

\varphi(t) = (1-t)\varphi_0 + t \varphi_1

\mathrm{Id}

\mu_0

\mu_1

\mathrm{dist}(\varphi_0,\varphi_1)^2 = \int_0^1 \mathcal{G}_{\varphi(t)}(\dot\varphi(t),\dot\varphi(t))\, dt

= \int_0^1\int_{\mathbb{R}^n} \lvert \dot\varphi(t)\rvert^2\mu_0\, dt

= \int_{\mathbb{R}^n} \lvert \varphi_1(x)-\varphi_0(x) \rvert^2 \mu_0

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2

Monge-Ampere equation on \(\mathbb{R}^n\)

Geodesic curve:

\varphi(t) = \nabla( |x|^2/2 + t f )

\mathrm{Id}

\mu_0

\mu_1

(\nabla\phi)_*\mu_0 = \mu_1

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2

\underbrace{\phantom{klaklklsklkasl}}_{\nabla\phi}

\displaystyle \Rightarrow \operatorname{det}(\nabla^2\phi) = \frac{\rho_0}{\rho_1\circ \nabla\phi}
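In the special case of centred Gaussians the Brenier potential is quadratic, \(\phi(x) = \tfrac12 x^\top P x\), so \(\nabla^2\phi = P\) and the Monge-Ampere equation can be checked pointwise. The sketch below assumes the standard closed form \(P = \Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}\) (not stated on the slides) and randomly generated covariances:

```python
import numpy as np

def sqrtm_spd(S):
    """Square root of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def gaussian_density(x, Sigma):
    """Density of N(0, Sigma) at the point x."""
    n = len(x)
    quad = x @ np.linalg.solve(Sigma, x)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

def gaussian_ot_matrix(Sigma0, Sigma1):
    """Symmetric P > 0 with P Sigma0 P = Sigma1, so that x -> P x pushes N(0,Sigma0) to N(0,Sigma1)."""
    R0 = sqrtm_spd(Sigma0)
    R0_inv = np.linalg.inv(R0)
    return R0_inv @ sqrtm_spd(R0 @ Sigma1 @ R0) @ R0_inv

rng = np.random.default_rng(1)
n = 3
B0, B1 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
Sigma0 = B0 @ B0.T + n * np.eye(n)
Sigma1 = B1 @ B1.T + n * np.eye(n)

P = gaussian_ot_matrix(Sigma0, Sigma1)   # Hessian of phi(x) = x^T P x / 2
x = rng.normal(size=n)                   # arbitrary test point
lhs = np.linalg.det(P)
rhs = gaussian_density(x, Sigma0) / gaussian_density(P @ x, Sigma1)
print(np.isclose(lhs, rhs))
```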

Linear optimal mass transport

Trivial observation: \(\varphi_0(x) = A_0 x\), \(\varphi_1(x) = A_1 x\) linear diffeomorphisms \(\Rightarrow\) geodesic consists of linear diffeomorphisms

Consequence: \(GL(n)\) is a totally geodesic subgroup of \(\operatorname{Diff}(\mathbb{R}^n)\)

Corresponding subspace of densities (statistical submanifold): multivariate Gaussians with zero mean

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

\Sigma \in P(n)

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(A)=A\Sigma_0 A^\top

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S
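The formulas on this slide can be spot-checked numerically. A minimal NumPy sketch, assuming (as is standard for this bundle, though not spelled out on the slide) that horizontal vectors at \(A\) have the form \(\dot A = SA\) with \(S\) symmetric, and that vertical vectors are \(A\eta\) with \(\eta\Sigma_0\) skew-symmetric; all matrices are random test data:

```python
import numpy as np

def pi(A, Sigma0):
    """Projection GL(n) -> P(n): covariance of N(0, Sigma0) pushed forward by x -> A x."""
    return A @ Sigma0 @ A.T

def G(A_dot, Sigma0):
    """Metric on GL(n): G_A(A_dot, A_dot) = tr(Sigma0 A_dot^T A_dot)."""
    return np.trace(Sigma0 @ A_dot.T @ A_dot)

rng = np.random.default_rng(2)
n = 3
C = rng.normal(size=(n, n))
Sigma0 = C @ C.T + n * np.eye(n)
A = rng.normal(size=(n, n)) + n * np.eye(n)    # generic invertible matrix
S = rng.normal(size=(n, n)); S = S + S.T       # symmetric generator
Sigma = pi(A, Sigma0)

A_dot = S @ A                                  # horizontal vector at A
# Directional derivative of pi along A_dot by central differences (exact for quadratic pi).
eps = 1e-6
dpi = (pi(A + eps * A_dot, Sigma0) - pi(A - eps * A_dot, Sigma0)) / (2 * eps)
ok_dpi = np.allclose(dpi, S @ Sigma + Sigma @ S, atol=1e-5)
ok_metric = np.isclose(G(A_dot, Sigma0), np.trace(Sigma @ S @ S))

# A vertical vector A @ eta, with eta @ Sigma0 skew-symmetric, is G-orthogonal to A_dot.
X = rng.normal(size=(n, n)); X = X - X.T
eta = X @ np.linalg.inv(Sigma0)
inner = np.trace(Sigma0 @ A_dot.T @ (A @ eta))
ok_orth = abs(inner) < 1e-6
print(ok_dpi, ok_metric, ok_orth)
```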

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(A)=A\Sigma_0 A^\top

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\pi(P)=P\Sigma_0 P = \Sigma_1

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Monge-Ampere equation:
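Restricted to \(K = P(n)\), the Monge-Ampere equation becomes \(P\Sigma_0 P = \Sigma_1\). A sketch assuming the closed-form solution used in the code (standard, but not on the slides); it also compares the cost \(J(P) = \operatorname{tr}((P-I)\Sigma_0(P-I))\) with the well-known Bures-Wasserstein formula \(\operatorname{tr}\Sigma_0 + \operatorname{tr}\Sigma_1 - 2\operatorname{tr}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\), and checks that another element \(PQ\) of the same fibre costs at least as much:

```python
import numpy as np

def sqrtm_spd(S):
    """Square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def solve_monge_ampere(Sigma0, Sigma1):
    """Symmetric P > 0 with P Sigma0 P = Sigma1."""
    R0 = sqrtm_spd(Sigma0)
    R0_inv = np.linalg.inv(R0)
    return R0_inv @ sqrtm_spd(R0 @ Sigma1 @ R0) @ R0_inv

def J(A, Sigma0):
    """Quadratic transport cost E|A x - x|^2 for x ~ N(0, Sigma0)."""
    D = A - np.eye(len(A))
    return np.trace(D @ Sigma0 @ D.T)

rng = np.random.default_rng(3)
n = 3
B0, B1 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
Sigma0 = B0 @ B0.T + np.eye(n)
Sigma1 = B1 @ B1.T + np.eye(n)

P = solve_monge_ampere(Sigma0, Sigma1)
R0 = sqrtm_spd(Sigma0)
bures = np.trace(Sigma0) + np.trace(Sigma1) - 2 * np.trace(sqrtm_spd(R0 @ Sigma1 @ R0))

# Another map in the same fibre: A = P Q with Q Sigma0 Q^T = Sigma0.
U, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random orthogonal matrix
Q = R0 @ U @ np.linalg.inv(R0)
A = P @ Q

print(np.allclose(P @ Sigma0 @ P, Sigma1), np.isclose(J(P, Sigma0), bures))
print(np.allclose(A @ Sigma0 @ A.T, Sigma1), J(A, Sigma0) >= J(P, Sigma0))
```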

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

A = PQ

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Factorization theorem:
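The factorization theorem generalizes the classical polar decomposition. A minimal sketch, reusing the same closed-form \(P\) (an assumption, not from the slides): given \(A\), set \(\Sigma_1 = A\Sigma_0A^\top\), solve \(P\Sigma_0P = \Sigma_1\), and take \(Q = P^{-1}A\). For \(\Sigma_0 = I\) the result is compared with the SVD-based polar decomposition:

```python
import numpy as np

def sqrtm_spd(S):
    """Square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def polar_factor(A, Sigma0):
    """Factor A = P Q with P in P(n) and Q in O(n, Sigma0), i.e. Q Sigma0 Q^T = Sigma0."""
    Sigma1 = A @ Sigma0 @ A.T
    R0 = sqrtm_spd(Sigma0)
    R0_inv = np.linalg.inv(R0)
    P = R0_inv @ sqrtm_spd(R0 @ Sigma1 @ R0) @ R0_inv
    Q = np.linalg.solve(P, A)
    return P, Q

rng = np.random.default_rng(4)
n = 4
C = rng.normal(size=(n, n))
Sigma0 = C @ C.T + np.eye(n)
A = rng.normal(size=(n, n))

P, Q = polar_factor(A, Sigma0)
print(np.allclose(P @ Q, A), np.allclose(Q @ Sigma0 @ Q.T, Sigma0))

# With Sigma0 = I this is the classical polar decomposition, Q orthogonal.
P_I, Q_I = polar_factor(A, np.eye(n))
U, s, Vt = np.linalg.svd(A)
print(np.allclose(P_I, (U * s) @ U.T), np.allclose(Q_I, U @ Vt))
```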

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\dot B = -\mathrm{Pr}\nabla_{\mathcal G}J(B), \; B(0) = A

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Vertical gradient flow:

Bundle structure

GL(n)

P(n)\simeq GL(n)/O(n,\Sigma_0)

\Sigma_0

\Sigma_1

\dot B = \Omega B, \; B(0) = A

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Vertical gradient flow:

\Sigma_1 \Omega + \Omega\Sigma_1 = 2\Sigma_1 (B^{-1}-B^{-\top})

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot B = \Omega B, \; B(0) = A

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Vertical gradient flow:

\Sigma_1 \Omega + \Omega\Sigma_1 = 2\Sigma_1 (B^{-1}-B^{-\top})

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot \Sigma = \nabla_{\bar{\mathcal G}}H_{\Sigma_1}(\Sigma)

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)
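With the sign convention used here, \(H_{\Sigma_1}(\Sigma) = -\mathrm{KL}(\mathcal N(0,\Sigma)\,\|\,\mathcal N(0,\Sigma_1)) - n/2\), so it is maximized (with value \(-n/2\)) at \(\Sigma = \Sigma_1\). A quick 2D quadrature check of that identity, with hand-picked covariances chosen purely for illustration:

```python
import numpy as np

def H(Sigma, Sigma1):
    """H_{Sigma1}(Sigma) = -1/2 tr(Sigma1^{-1} Sigma) + 1/2 log det(Sigma1^{-1} Sigma)."""
    M = np.linalg.solve(Sigma1, Sigma)
    return -0.5 * np.trace(M) + 0.5 * np.log(np.linalg.det(M))

def kl_by_quadrature(Sigma, Sigma1, half_width=12.0, m=801):
    """KL(N(0,Sigma) || N(0,Sigma1)) for 2x2 covariances, by a Riemann sum on a square grid."""
    t = np.linspace(-half_width, half_width, m)
    dx = t[1] - t[0]
    X, Y = np.meshgrid(t, t, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)

    def log_density(S):
        # log of the 2D Gaussian density, evaluated at all grid points
        quad = np.einsum("ij,jk,ik->i", pts, np.linalg.inv(S), pts)
        return -0.5 * quad - 0.5 * np.log((2 * np.pi) ** 2 * np.linalg.det(S))

    log_p, log_q = log_density(Sigma), log_density(Sigma1)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)) * dx * dx)

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma1 = np.array([[1.0, -0.3], [-0.3, 1.5]])
n = 2
kl = kl_by_quadrature(Sigma, Sigma1)
print(np.isclose(H(Sigma, Sigma1), -kl - n / 2, atol=1e-6))
print(np.isclose(H(Sigma1, Sigma1), -n / 2))
```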

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot \Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Relative entropy

(Kullback-Leibler)
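The explicit heat flow can be checked against the definition of the gradient: writing \(V_i = S_i\Sigma + \Sigma S_i\), the metric is \(\bar{\mathcal G}_\Sigma(V_1,V_2) = \operatorname{tr}(\Sigma S_1S_2)\), and the gradient \(V\) must satisfy \(\bar{\mathcal G}_\Sigma(V,U) = dH_{\Sigma_1}(\Sigma)(U)\) for every symmetric \(U\). A NumPy sketch with illustrative covariances and an explicit Euler discretization chosen only for illustration:

```python
import numpy as np

def H(Sigma, Sigma1):
    """Relative entropy H_{Sigma1}(Sigma) as defined on the slides."""
    M = np.linalg.solve(Sigma1, Sigma)
    return -0.5 * np.trace(M) + 0.5 * np.log(np.linalg.det(M))

def grad_H(Sigma, Sigma1):
    """Slide formula: 2I - Sigma1^{-1} Sigma - Sigma Sigma1^{-1}."""
    S1inv = np.linalg.inv(Sigma1)
    return 2 * np.eye(len(Sigma)) - S1inv @ Sigma - Sigma @ S1inv

def lyap(Sigma, V):
    """Symmetric S solving S Sigma + Sigma S = V, computed in the eigenbasis of Sigma."""
    w, U = np.linalg.eigh(Sigma)
    Vt = U.T @ V @ U
    return U @ (Vt / (w[:, None] + w[None, :])) @ U.T

def metric(Sigma, V1, V2):
    """Wasserstein metric tr(Sigma S1 S2), where V_i = S_i Sigma + Sigma S_i."""
    return np.trace(Sigma @ lyap(Sigma, V1) @ lyap(Sigma, V2))

Sigma1 = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
Sigma = np.array([[1.0, 0.0, 0.0], [0.0, 3.0, 0.5], [0.0, 0.5, 0.7]])

# Gradient check: metric(grad H, U) should equal the directional derivative dH(U).
rng = np.random.default_rng(5)
U = rng.normal(size=(3, 3)); U = U + U.T
eps = 1e-6
dH = (H(Sigma + eps * U, Sigma1) - H(Sigma - eps * U, Sigma1)) / (2 * eps)
lhs = metric(Sigma, grad_H(Sigma, Sigma1), U)
print(np.isclose(lhs, dH, rtol=1e-5, atol=1e-8))

# Explicit Euler steps of the heat flow; Sigma(t) should approach Sigma1.
dt, steps = 1e-2, 3000
values = [H(Sigma, Sigma1)]
for _ in range(steps):
    Sigma = Sigma + dt * grad_H(Sigma, Sigma1)
    values.append(H(Sigma, Sigma1))
print(np.allclose(Sigma, Sigma1, atol=1e-8), np.all(np.diff(values) > -1e-12))
```

Since the right-hand side is affine in \(\Sigma\), the continuous flow converges exponentially to \(\Sigma_1\) from any positive definite starting point.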

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

P(t)

Lifted gradient flow on \(K\) for \(F(P) = H_{\Sigma_1}(P\Sigma_0 P)\)

Bundle structure

GL(n)

P(n)

\Sigma_0

\Sigma_1

\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V

\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)

\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)

\dot\Sigma = S\Sigma + \Sigma S

K = P(n)

Horizontal gradient (heat) flow:

P(t)

Lifted gradient flow on \(K\) for \(F(P) = H_{\Sigma_1}(P\Sigma_0 P)\)

Hessian of \(F(P)\) strictly positive on \(K\) \(\Rightarrow\) unique limit!

Wasserstein-Otto vs. Fisher-Rao

\mathrm{Dens}(M)

T_\mu\mathrm{Dens}(M)\simeq C^\infty_0(M)

Wasserstein

\displaystyle\overline{\mathcal{G}}_{\rho dx}(\dot\rho dx,\dot\rho dx) = \int_{M} |\nabla\theta|^2\rho

\displaystyle \dot\rho + \mathrm{div}(\rho \nabla\theta) = 0

Dependent on the Riemannian structure of \(M\)

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

Fisher-Rao

\displaystyle\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_{M} \frac{\dot\mu}{\mu}\frac{\dot\mu}{\mu}\mu

Independent of the Riemannian structure of \(M\) \(\Rightarrow\) \(\mathrm{Diff}(M)\)-invariance

\displaystyle \rho(x) = \sqrt{\frac{\mathrm{det}(W)}{(2\pi)^n}}\mathrm{exp}(-\frac{1}{2}x^\top W x)

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirements for a compatible metric on \(GL(n)\):

Descending, meaning
left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)
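Right-invariance with respect to \(GL(n)\) corresponds to the action \(W\mapsto B^\top W B\) on \(P(n)\), since \(\pi(AB) = B^\top\pi(A)B\). A short check, on random test data, that the Fisher-Rao metric as written on the slide is invariant under this action:

```python
import numpy as np

def fisher_rao(W, W_dot):
    """Slide metric on precision matrices: tr(W^{-1} W_dot W^{-1} W_dot)."""
    X = np.linalg.solve(W, W_dot)
    return np.trace(X @ X)

rng = np.random.default_rng(6)
n = 3
C = rng.normal(size=(n, n))
W = C @ C.T + np.eye(n)                       # a point in P(n)
W_dot = rng.normal(size=(n, n)); W_dot = W_dot + W_dot.T
B = rng.normal(size=(n, n)) + 2 * np.eye(n)   # generic element of GL(n)

# Transport both the point and the tangent vector by W -> B^T W B.
before = fisher_rao(W, W_dot)
after = fisher_rao(B.T @ W @ B, B.T @ W_dot @ B)
print(np.isclose(before, after))
```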

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirements for a compatible metric on \(GL(n)\):

Descending, meaning
left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirements for a compatible metric on \(GL(n)\):

Descending, meaning
left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirements for a compatible metric on \(GL(n)\):

Descending, meaning
left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\) \(\mathrm{Hor}\) is integrable

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirements for a compatible metric on \(GL(n)\):

Descending, meaning
left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\) \(\mathrm{Hor}\) is integrable

\(K=\) subgroup of upper triangular matrices with positive entries on diagonal

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirements for a compatible metric on \(GL(n)\):

Descending, meaning
left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\) \(\mathrm{Hor}\) is integrable

\(K=\) subgroup of upper triangular matrices with positive entries on diagonal

A = QR

Bundle structure

GL(n)

P(n)\simeq O(n,W_0^{-1})\backslash GL(n)

W_0

W_1

\pi(A)=A^\top W_0 A

\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)

\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)

Requirements for a compatible metric on \(GL(n)\):

Descending, meaning
left-invariant w.r.t. \(O(n,W_0^{-1})\)
Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\) \(\mathrm{Hor}\) is integrable

\(K=\) subgroup of upper triangular matrices with positive entries on diagonal

W_1 = \pi(R) = R^\top W_0 R
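For \(W_0 = I\) the factorization \(A = QR\) with \(R\in K\) is the classical QR decomposition, and \(W_1 = R^\top R\) is the Cholesky factorization of \(W_1 = A^\top A\). For general \(W_0\), one way to compute \(R\) (an assumption of this sketch, not necessarily the construction in the talk) uses two Cholesky factors: if \(W_0 = C_0^\top C_0\) and \(W_1 = C_1^\top C_1\) with \(C_0, C_1\) upper triangular, then \(R = C_0^{-1}C_1\) satisfies \(R^\top W_0 R = W_1\):

```python
import numpy as np

def upper_factor(W0, W1):
    """Upper triangular R with positive diagonal and R^T W0 R = W1."""
    C0 = np.linalg.cholesky(W0).T    # W0 = C0^T C0, C0 upper triangular
    C1 = np.linalg.cholesky(W1).T    # W1 = C1^T C1
    return np.linalg.solve(C0, C1)   # C0 R = C1

def generalized_qr(A, W0):
    """A = Q R with R in K and Q in O(n, W0^{-1}), i.e. Q W0^{-1} Q^T = W0^{-1}."""
    W1 = A.T @ W0 @ A                # W1 = pi(A)
    R = upper_factor(W0, W1)
    Q = A @ np.linalg.inv(R)
    return Q, R

rng = np.random.default_rng(7)
n = 4
C = rng.normal(size=(n, n))
W0 = C @ C.T + np.eye(n)
A = rng.normal(size=(n, n)) + 2 * np.eye(n)

Q, R = generalized_qr(A, W0)
W0_inv = np.linalg.inv(W0)
print(np.allclose(Q @ R, A), np.allclose(Q @ W0_inv @ Q.T, W0_inv))

# With W0 = I: classical QR, normalized to a positive diagonal of R.
Q_I, R_I = generalized_qr(A, np.eye(n))
Qn, Rn = np.linalg.qr(A)
signs = np.sign(np.diag(Rn))
print(np.allclose(R_I, signs[:, None] * Rn), np.allclose(Q_I, Qn * signs))
```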

Entropy gradient flow (Fisher-Rao)

GL(n)

P(n)

W_0

W_1

W(t)

\displaystyle H_{W_1}(W) = -\frac{1}{2}\mathrm{tr}(W_1 W^{-1}) + \frac{1}{2}\log\det(W_1 W^{-1})

Relative entropy

\dot W = W_1 - W

\dot W = \nabla_{\bar{\mathcal G}}H_{W_1}(W)
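The flow \(\dot W = W_1 - W\) is linear, with closed-form solution \(W(t) = W_1 + e^{-t}(W_0 - W_1)\): a straight segment in \(P(n)\) traversed at exponentially decreasing speed (rescaling the metric by a constant only rescales time). A sketch checking the ODE and the monotone increase of \(H_{W_1}\) towards its maximum \(-n/2\), for illustrative \(2\times 2\) matrices:

```python
import numpy as np

def H(W, W1):
    """H_{W1}(W) = -1/2 tr(W1 W^{-1}) + 1/2 log det(W1 W^{-1})."""
    M = W1 @ np.linalg.inv(W)
    return -0.5 * np.trace(M) + 0.5 * np.log(np.linalg.det(M))

def W_of_t(t, W0, W1):
    """Closed-form solution of dW/dt = W1 - W with W(0) = W0."""
    return W1 + np.exp(-t) * (W0 - W1)

W0 = np.array([[3.0, 1.0], [1.0, 2.0]])
W1 = np.array([[1.0, -0.4], [-0.4, 0.5]])

# Check the ODE by central differences at an interior time.
t, h = 1.3, 1e-5
dW = (W_of_t(t + h, W0, W1) - W_of_t(t - h, W0, W1)) / (2 * h)
print(np.allclose(dW, W1 - W_of_t(t, W0, W1), atol=1e-8))

ts = np.linspace(0.0, 10.0, 201)
values = np.array([H(W_of_t(s, W0, W1), W1) for s in ts])
print(np.all(np.diff(values) >= 0), np.isclose(values[-1], -1.0, atol=1e-3))
```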

Further reduction

GL(n)

W_1

P(n)\simeq O(n)\backslash GL(n)

P(n)/O(n)\simeq \mathrm{poly}^+_n

\pi(A) = A^\top A

\varpi(W) = \det(\lambda I - W)

Right action by \((Q,W)\mapsto Q^\top W Q \), clearly not free!

Spectral decomposition

P(n)

P(n)/O(n)\simeq \mathrm{poly}^+_n

D(n)

Geodesic equation on \(D(n)\):

\(D(n)\) is a totally geodesic submanifold!

\displaystyle \ddot\gamma_i - \frac{\dot\gamma_i^2}{\gamma_i} = 0

Notice: \(D(n)\) intersects the \(W_1\)-orbit \(n!\) times (for distinct eigenvalues)
( \(D(n)\) is an \(n!\)-covering of \(\mathrm{poly}_n^+\) )

\Lambda
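On diagonal matrices the Fisher-Rao metric \(\operatorname{tr}(W^{-1}\dot W W^{-1}\dot W)\) reduces to \(\sum_i \dot\gamma_i^2/\gamma_i^2\), whose geodesics are \(\gamma_i(t) = \gamma_i(0)\,(\gamma_i(1)/\gamma_i(0))^t\). A finite-difference sketch, with arbitrarily chosen endpoints, checking the geodesic equation above and the length \(\lVert\log(\gamma(1)/\gamma(0))\rVert\):

```python
import numpy as np

def diag_geodesic(t, g0, g1):
    """gamma_i(t) = g0_i (g1_i / g0_i)^t, componentwise."""
    return g0 * (g1 / g0) ** t

g0 = np.array([1.0, 2.0, 0.5])
g1 = np.array([3.0, 0.5, 0.5])

# Residual of gamma'' - gamma'^2 / gamma = 0 by finite differences.
t, h = 0.4, 1e-4
gm, gc, gp = (diag_geodesic(s, g0, g1) for s in (t - h, t, t + h))
acc = (gp - 2 * gc + gm) / h**2
vel = (gp - gm) / (2 * h)
residual = acc - vel**2 / gc
print(np.allclose(residual, 0.0, atol=1e-6))

# Length in the metric sum_i (gamma_i' / gamma_i)^2 versus the closed form.
ts = np.linspace(0.0, 1.0, 10001)
G = np.array([diag_geodesic(s, g0, g1) for s in ts])
speed = np.sqrt(np.sum((np.gradient(G, ts, axis=0) / G) ** 2, axis=1))
length = np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(ts))
print(np.isclose(length, np.linalg.norm(np.log(g1 / g0)), rtol=1e-6))
```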

Spectral decomposition

P(n)

W = Q\Lambda Q^\top

P(n)/O(n)\simeq \mathrm{poly}^+_n

D(n)

Geodesic equation on \(D(n)\):

\(D(n)\) is a totally geodesic submanifold!

\displaystyle \ddot\gamma_i - \frac{\dot\gamma_i^2}{\gamma_i} = 0

Notice: \(D(n)\) intersects the \(W_1\)-orbit \(n!\) times (for distinct eigenvalues)
( \(D(n)\) is an \(n!\)-covering of \(\mathrm{poly}_n^+\) )
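The spectral decomposition and the characteristic-polynomial map \(\varpi\) can be illustrated with NumPy's `eigh` and `poly` on a random test matrix (`np.poly` returns the coefficients of \(\det(\lambda I - W)\)). The last lines enumerate the diagonal points of the orbit, i.e. the eigenvalues in every order:

```python
import itertools
import numpy as np

rng = np.random.default_rng(8)
n = 3
C = rng.normal(size=(n, n))
W1 = C @ C.T + np.eye(n)

# Spectral decomposition W1 = Q Lambda Q^T with Q in O(n) and Lambda in D(n).
lam, Q = np.linalg.eigh(W1)
print(np.allclose(Q @ np.diag(lam) @ Q.T, W1), np.allclose(Q.T @ Q, np.eye(n)))

# varpi(W) = det(lambda I - W) is constant on the orbit {V^T W1 V : V in O(n)}.
V, _ = np.linalg.qr(rng.normal(size=(n, n)))   # a random orthogonal matrix
print(np.allclose(np.poly(W1), np.poly(V.T @ W1 @ V)))

# Diagonal matrices in the orbit: the eigenvalues permuted, n! of them when distinct.
diagonal_points = {tuple(np.round(lam[list(p)], 12)) for p in itertools.permutations(range(n))}
print(len(diagonal_points))
```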

Brockett flow

P(n)

W_1

D(n)

\(H_N(W)\) relative entropy functional

Functional \(F(Q) = H_N(Q^\top W_1 Q)\) on \(O(n)\)

\mathrm{Orb}(W_1)
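The slides do not spell out the flow, so what follows is one hedged reading. Assume \(H_N\) has the same form as \(H_{W_1}\) with \(N\) in place of \(W_1\), and put the metric \(\operatorname{tr}(\Omega_1^\top\Omega_2)\) on \(O(n)\). Then \(F(Q) = -\tfrac12\operatorname{tr}(NX) + \text{const}\) with \(X = Q^\top W_1^{-1}Q\), and gradient ascent gives \(\dot Q = \tfrac12 Q[N,X]\), equivalently the double-bracket flow \(\dot X = -\tfrac12[X,[X,N]]\). The sketch uses a hypothetical diagonal \(N\) and a Cayley-transform time step that keeps \(Q\) orthogonal:

```python
import numpy as np

def cayley(Omega, h):
    """Cayley transform (I - h/2 Omega)^{-1}(I + h/2 Omega); orthogonal for skew Omega."""
    I = np.eye(len(Omega))
    return np.linalg.solve(I - 0.5 * h * Omega, I + 0.5 * h * Omega)

def brockett_flow(W1, N, Q0, h=0.05, steps=8000):
    """Gradient ascent for F(Q) = H_N(Q^T W1 Q) under the assumptions stated above.

    With X = Q^T W1^{-1} Q, the ascent direction is dQ/dt = Q Omega, Omega = [N, X] / 2.
    """
    M = np.linalg.inv(W1)
    Q = Q0.copy()
    for _ in range(steps):
        X = Q.T @ M @ Q
        Omega = 0.5 * (N @ X - X @ N)    # skew-symmetric since N and X are symmetric
        Q = Q @ cayley(Omega, h)
    return Q

rng = np.random.default_rng(9)
n = 3
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
W1 = V @ np.diag([1.0, 2.0, 4.0]) @ V.T       # eigenvalues of W1^{-1}: 1, 1/2, 1/4
N = np.diag([3.0, 2.0, 1.0])                  # hypothetical N with distinct entries
Q0, _ = np.linalg.qr(rng.normal(size=(n, n)))

Q = brockett_flow(W1, N, Q0)
X = Q.T @ np.linalg.inv(W1) @ Q
print(np.allclose(Q.T @ Q, np.eye(n)))
print(np.allclose(X, np.diag([0.25, 0.5, 1.0]), atol=1e-6))
```

In the limit \(X\) is diagonal, with the eigenvalues of \(W_1^{-1}\) sorted opposite to the entries of \(N\), so the flow computes an eigendecomposition of \(W_1\).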

THANKS!

Reference:

K. Modin
Geometry of Matrix Decompositions Seen Through Optimal Transport and Information Geometry, 2017

Slides available at: slides.com/kmodin

Geometry of matrix decompositions seen through optimal transport and information geometry

By Klas Modin

Online presentation given in December 2020 in the Hamiltonian Seminar Series, University of Toronto.

Mathematician at Chalmers University of Technology and the University of Gothenburg

klasmodin.github.io
