Optimal transport, information, and matrix decompositions

Klas Modin

Riemannian principal bundles

E
E/H\simeq B
\mathrm{Hor}
E
\hookleftarrow H
\downarrow
B
\pi

Invariant Riemannian metric on \(E\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G
G/H\simeq B
e
\mathrm{Hor}
G
\hookleftarrow H
\downarrow
B
\pi

left co-sets \([g] = g\cdot H  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\mathrm{Hor}
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
G_{b_0}

Riemannian principal bundles

G
G_{b_0}\backslash G\simeq B
e
\mathrm{Hor}
G
G_{b_0}\hookrightarrow
\downarrow
B
\pi(g) = b_0\cdot g

right co-sets \([g] =  G_{b_0}\cdot g  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
G_{b_0}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\text{horizontal flow}
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\text{vertical flow}
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g\in G
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
\gamma(1)\in G
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g = \gamma(1)\overbrace{(\gamma(1)^{-1}g)}^{\in G_{b_0}}
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g = \gamma(1)h
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
\gamma'(t)\in \mathrm{Hor}

Riemannian principal bundles

G
G/G_{b_0}\simeq B
e
g = k h
G
\hookleftarrow G_{b_0}
\downarrow
B
\pi(g) = g\cdot b_0

left co-sets \([g] = g\cdot G_{b_0}  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = g\cdot b_0
K

polar cone

Riemannian principal bundles

G
G_{b_0}\backslash G\simeq B
e
g = h k
G
G_{b_0}\hookrightarrow
\downarrow
B
\pi(g) = b_0\cdot g

right co-sets \([g] = G_{b_0}\cdot g  \)

Semi-invariant Riemannian metric on \(G\)

\(\Rightarrow\) \(\pi\) Riemannian submersion

b_0
b_1 = \pi(g) = b_0\cdot g
K

polar cone

Examples

  • Optimal mass transport
    (infinite- and finite-dimensional)
    \(\Rightarrow\) polar decomposition
    Vertical flows: Euler equations, free rigid body, gradient OT flow
    Horizontal flows: heat flow (entropy gradient flow)
     
  • Information geometry
    \(\Rightarrow \; QR\), Cholesky, singular value, and spectral decompositions
    Vertical flows: gradient flow for \(QR\), isospectral flows
    Horizontal flows: Brockett flow (entropy gradient flow)

Optimal mass transport (OMT)

\mu_0
\mu_1
\varphi_*\mu_0
\displaystyle\min_{\varphi_*\mu_0=\mu_1} \int_{M} d_M^2(\varphi(x),x ) \mu_0
M

Monge problem, \(L^2\) version

Optimal mass transport (OMT)

\mu_0
\mu_1
\varphi_*\mu_0
\displaystyle\min_{\varphi_*\mu_0=\mu_1} \underbrace{\int_{\mathbb{R}^n} \lvert \varphi(x)-x \rvert^2 \mu_0}_{J(\varphi)}
\mathbb{R}^n

Monge problem, \(L^2\) version
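As a sanity check on the Monge problem, here is a minimal sketch for a discretized one-dimensional case. It assumes (this is not from the slides) that \(\mu_0\) and \(\mu_1\) are uniform measures on equally many points of the real line, so that transport maps are permutations. The helper names `monge_cost` and `brute_force_monge` are hypothetical. For the quadratic cost, the sorted (monotone) matching should be optimal, and the brute-force search is one way to confirm this on a tiny example.

```python
import itertools
import numpy as np

def monge_cost(x, y, perm):
    """Quadratic cost of the assignment x[i] -> y[perm[i]] with uniform weights."""
    return float(np.mean((y[list(perm)] - x) ** 2))

def brute_force_monge(x, y):
    """Minimise the cost over all permutations (only feasible for very small n)."""
    n = len(x)
    return min(itertools.permutations(range(n)),
               key=lambda p: monge_cost(x, y, p))

# Assumed toy data: two sorted point clouds on the real line.
rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=6))            # support of mu_0
y = np.sort(rng.normal(loc=2.0, size=6))   # support of mu_1

best = brute_force_monge(x, y)
monotone = tuple(range(len(x)))            # i-th smallest -> i-th smallest

print("brute-force cost:", monge_cost(x, y, best))
print("monotone cost:   ", monge_cost(x, y, monotone))
```

The monotone matching is the discrete counterpart of a map of the form \(\nabla\phi\) with \(\phi\) convex, which in one dimension just means a non-decreasing map.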

Riemannian structure of OMT

\mathrm{Diff}(M)
\mathrm{Dens}(M)\simeq \mathrm{Diff}(M)/\mathrm{Diff}_{\mu_0}(M)
\mathrm{Id}
\mu_0
\mu_1
\pi(\varphi)=\varphi_*\mu_0

Riemannian metric

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \int_{M}\left\vert \dot\varphi \right\vert^2 \mu_0

Induced metric

\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_M \lvert \nabla \theta\rvert^2 \mu
\dot\rho + \operatorname{div}(\rho \nabla\theta) = 0, \; \rho = \mu/dx

[Benamou & Brenier (2000), Otto (2001)]

Invariance: \(\eta\in\mathrm{Diff}_{\mu_0}(M)\)

\displaystyle\mathcal{G}_\varphi(\dot\varphi,\dot\varphi) = \mathcal{G}_{\varphi\circ\eta}(\dot\varphi\circ\eta,\dot\varphi\circ\eta)

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0
\mathrm{Id}
\mu_0
K
\mu_1
\mathrm{Hor}_{\mathrm{Id}} = \nabla C^\infty(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) is a section of the principal bundle

Geodesics on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic equation:

\ddot\varphi = 0 \Rightarrow \varphi(t) = \mathrm{Id} + t\,v_0
\mathrm{Id}
\mu_0
K
\mu_1
\varphi = \nabla\phi\circ\eta, \; \eta\in \operatorname{Diff}_{\mu_0}(\mathbb{R}^n)

Easy to prove:

Polar cone \(K\) is isomorphic to strictly convex smooth functions via \(\phi \mapsto \nabla\phi\)

Hard to prove:

Polar cone \(K\) is a section of the principal bundle

Brenier's decomposition of transport maps

Geodesic distance on \(\operatorname{Diff}(\mathbb{R}^n)\)

Geodesic curve:

\varphi(t) = (1-t)\varphi_0 + t \varphi_1
\mathrm{Id}
\mu_0
K
\mu_1
\mathrm{dist}(\varphi_0,\varphi_1)^2 = \int_0^1 \mathcal{G}_{\varphi(t)}(\dot\varphi(t),\dot\varphi(t)) dt
= \int_0^1\int_{\mathbb{R}^n} \lvert \dot\varphi(t)\rvert^2\mu_0 dt
= \int_{\mathbb{R}^n} \lvert \varphi_1(x)-\varphi_0(x) \rvert^2 \mu_0

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2

Monge-Ampere equation on \(\mathbb{R}^n\)

Geodesic curve:

\varphi(t) = \nabla( |x|^2/2 + t f )
\mathrm{Id}
\mu_0
K
\mu_1
(\nabla\phi)_*\mu_0 = \mu_1

In particular:

J(\varphi) = \mathrm{dist}(\mathrm{Id},\varphi)^2
\varphi(1) = \underbrace{\nabla( |x|^2/2 + f )}_{\nabla\phi}
\displaystyle \Rightarrow \operatorname{det}(\nabla^2\phi) = \frac{\rho_0}{\rho_1\circ \nabla\phi}

Linear optimal mass transport

Trivial observation:   \(\varphi_0(x) = A_0 x\), \(\varphi_1(x) = A_1 x\)   linear diffeomorphisms \(\Rightarrow\) geodesic consists of linear diffeomorphisms

Consequence: \(GL(n)\) is a totally geodesic subgroup of \(\operatorname{Diff}(\mathbb{R}^n)\)

Corresponding subspace of densities (statistical submanifold): multivariate Gaussians with zero mean

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)
\Sigma \in P(n)

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\pi(A)=A\Sigma_0 A^\top
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)
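The projection and the metric on \(GL(n)\) can be tried out numerically. The following is a minimal sketch, assuming \(\mu_0 = N(0,\Sigma_0)\) so that \(\int \lvert Bx\rvert^2\mu_0 = \mathrm{tr}(\Sigma_0 B^\top B)\) for any matrix \(B\). All function names and the test matrices are hypothetical choices for illustration. It checks that the energy of the straight line \(A(t) = (1-t)A_0 + tA_1\) agrees with \(\mathrm{tr}(\Sigma_0 (A_1-A_0)^\top(A_1-A_0))\).

```python
import numpy as np

def pushforward_cov(A, Sigma0):
    """pi(A) = A Sigma0 A^T: covariance of A x when x ~ N(0, Sigma0)."""
    return A @ Sigma0 @ A.T

def metric(A_dot, Sigma0):
    """G_A(A_dot, A_dot) = tr(Sigma0 A_dot^T A_dot)."""
    return float(np.trace(Sigma0 @ A_dot.T @ A_dot))

def transport_cost(A, Sigma0):
    """J(A) = E|Ax - x|^2 for x ~ N(0, Sigma0), i.e. the metric applied to A - I."""
    return metric(A - np.eye(A.shape[0]), Sigma0)

def energy_along_line(A0, A1, Sigma0, steps=100):
    """Riemann sum of G(A_dot, A_dot) along A(t) = (1 - t) A0 + t A1."""
    ts = np.linspace(0.0, 1.0, steps + 1)
    total = 0.0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        # Finite-difference velocity on the sub-interval [t0, t1].
        A_dot = (((1 - t1) * A0 + t1 * A1) - ((1 - t0) * A0 + t0 * A1)) / (t1 - t0)
        total += metric(A_dot, Sigma0) * (t1 - t0)
    return total

# Assumed example data.
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
A0 = np.eye(2)
A1 = np.array([[1.5, 0.2], [-0.1, 0.8]])

print("pi(A1) =\n", pushforward_cov(A1, Sigma0))
print("energy of straight line:", energy_along_line(A0, A1, Sigma0))
print("J(A1) = dist(I, A1)^2:  ", transport_cost(A1, Sigma0))
```

Because the metric does not depend on the base point \(A\), straight lines have constant speed, which is why the two printed numbers should coincide.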

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\pi(P)=P\Sigma_0 P = \Sigma_1
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Monge-Ampere equation: \(P\Sigma_0 P = \Sigma_1\), solved for \(P\in K = P(n)\)
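For Gaussians the equation \(P\Sigma_0 P = \Sigma_1\) has a closed-form symmetric positive definite solution. The sketch below uses the standard formula \(P = \Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}\); the helper names `spd_sqrt` and `gaussian_ot_map` and the test matrices are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def spd_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def gaussian_ot_map(Sigma0, Sigma1):
    """Symmetric positive definite P with P Sigma0 P = Sigma1."""
    R = spd_sqrt(Sigma0)
    R_inv = np.linalg.inv(R)
    return R_inv @ spd_sqrt(R @ Sigma1 @ R) @ R_inv

# Assumed example data.
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma1 = np.array([[1.0, -0.3], [-0.3, 1.5]])

P = gaussian_ot_map(Sigma0, Sigma1)
print("P =\n", P)
print("residual:", np.linalg.norm(P @ Sigma0 @ P - Sigma1))
```

Since \(P\) is symmetric positive definite, \(x\mapsto Px\) is the gradient of the convex function \(\tfrac12 x^\top P x\), which is how it fits the description of the polar cone \(K\).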

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
A = PQ
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Factorization theorem: \(A = PQ\) with \(P\in K = P(n)\) and \(Q\in O(n,\Sigma_0)\)
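The factorization can be computed explicitly in the Gaussian setting. The sketch below assumes that \(O(n,\Sigma_0)\) denotes the stabilizer \(\{Q : Q\Sigma_0 Q^\top = \Sigma_0\}\) of \(\Sigma_0\) under \(A\mapsto A\Sigma_0A^\top\); the function names are hypothetical. It first solves \(P\Sigma_0P = A\Sigma_0A^\top\) for a symmetric positive definite \(P\), then sets \(Q = P^{-1}A\).

```python
import numpy as np

def spd_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def polar_factors(A, Sigma0):
    """Return (P, Q) with A = P Q, P symmetric positive definite, Q Sigma0 Q^T = Sigma0."""
    Sigma1 = A @ Sigma0 @ A.T
    R = spd_sqrt(Sigma0)
    R_inv = np.linalg.inv(R)
    P = R_inv @ spd_sqrt(R @ Sigma1 @ R) @ R_inv   # solves P Sigma0 P = Sigma1
    Q = np.linalg.solve(P, A)                      # Q = P^{-1} A
    return P, Q

def transport_cost(A, Sigma0):
    """J(A) = E|Ax - x|^2 for x ~ N(0, Sigma0)."""
    D = A - np.eye(A.shape[0])
    return float(np.trace(Sigma0 @ D.T @ D))

# Assumed example data: a well-conditioned random matrix.
rng = np.random.default_rng(1)
Sigma0 = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)

P, Q = polar_factors(A, Sigma0)
print("J(A) =", transport_cost(A, Sigma0))
print("J(P) =", transport_cost(P, Sigma0))
```

Both \(A\) and \(P\) push \(N(0,\Sigma_0)\) to the same Gaussian, and \(P\) should have the smaller (or equal) cost. For \(\Sigma_0 = I\) this reduces to the usual polar decomposition of a matrix.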

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\dot B = -\mathrm{Pr}\nabla_{\mathcal G}J(B), \; B(0) = A
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Vertical gradient flow:

A

Bundle structure

GL(n)
P(n)\simeq GL(n)/O(n,\Sigma_0)
I
\Sigma_0
\Sigma_1
\dot B = \Omega B, \; B(0) = A
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Vertical gradient flow:

A
\Sigma_1 \Omega + \Omega\Sigma_1 = 2\Sigma_1 (B^{-1}-B^{-\top})

Bundle structure

GL(n)
P(n)
I
\Sigma_0
\Sigma_1
\dot \Sigma = \mathrm{Pr}\nabla_{\bar{\mathcal G}}H_{\Sigma_1}(\Sigma)
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)

Relative entropy (Kullback-Leibler)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)

Bundle structure

GL(n)
P(n)
I
\Sigma_0
\Sigma_1
\dot \Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Horizontal gradient (heat) flow:

\Sigma(t)

Relative entropy (Kullback-Leibler)

\displaystyle H_{\Sigma_1}(\Sigma) = -\frac{1}{2}\mathrm{tr}(\Sigma_1^{-1}\Sigma) + \frac{1}{2}\log\det(\Sigma_1^{-1}\Sigma)
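The flow \(\dot\Sigma = 2I - \Sigma_1^{-1}\Sigma - \Sigma\Sigma_1^{-1}\) is linear in \(\Sigma\) and can be integrated directly. The following is a minimal sketch using explicit Euler steps; the step size, number of steps, test matrices, and function names are assumptions chosen for illustration. It checks that \(\Sigma(t)\) approaches \(\Sigma_1\) and that \(H_{\Sigma_1}\) increases towards its maximum \(-n/2\).

```python
import numpy as np

def relative_entropy(Sigma, Sigma1):
    """H_{Sigma1}(Sigma) = -tr(Sigma1^{-1} Sigma)/2 + log det(Sigma1^{-1} Sigma)/2."""
    M = np.linalg.solve(Sigma1, Sigma)       # Sigma1^{-1} Sigma
    _, logdet = np.linalg.slogdet(M)
    return float(-0.5 * np.trace(M) + 0.5 * logdet)

def heat_flow_rhs(Sigma, Sigma1_inv):
    """Right-hand side 2I - Sigma1^{-1} Sigma - Sigma Sigma1^{-1}."""
    n = Sigma.shape[0]
    return 2.0 * np.eye(n) - Sigma1_inv @ Sigma - Sigma @ Sigma1_inv

def integrate_heat_flow(Sigma0, Sigma1, dt=0.01, steps=5000):
    """Explicit Euler integration (a simple choice, not a recommended integrator)."""
    Sigma1_inv = np.linalg.inv(Sigma1)
    Sigma = Sigma0.copy()
    for _ in range(steps):
        Sigma = Sigma + dt * heat_flow_rhs(Sigma, Sigma1_inv)
    return Sigma

# Assumed example data.
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma1 = np.array([[1.0, -0.3], [-0.3, 1.5]])

Sigma_T = integrate_heat_flow(Sigma0, Sigma1)
print("distance to Sigma1:", np.linalg.norm(Sigma_T - Sigma1))
print("H at start:", relative_entropy(Sigma0, Sigma1))
print("H at end:  ", relative_entropy(Sigma_T, Sigma1))
```

Note that \(H_{\Sigma_1}\) equals minus the Kullback-Leibler divergence from \(N(0,\Sigma)\) to \(N(0,\Sigma_1)\) up to the additive constant \(-n/2\), so ascending \(H\) is the same as descending the divergence.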

Bundle structure

GL(n)
P(n)
I
\Sigma_0
\Sigma_1
\dot P = P^{-1}\Sigma_0^{-1} - \Sigma_1^{-1}P + V
\mathcal G_A(\dot A,\dot A) = \mathrm{tr}(\Sigma_0 \dot A^\top \dot A)
\bar{\mathcal G}_\Sigma(\dot \Sigma,\dot \Sigma) = \mathrm{tr}(\Sigma SS)
\dot\Sigma = S\Sigma + \Sigma S
K = P(n)

Horizontal gradient (heat) flow:

P(t)

Lifted gradient flow on \(K\) for \(F(P) = H_{\Sigma_1}(P\Sigma_0 P)\)

Hessian of \(F(P)\) strictly positive on \(K\) \(\Rightarrow\) unique limit!

Wasserstein-Otto vs. Fisher-Rao

\mathrm{Dens}(M)
T_\mu\mathrm{Dens}(M)\simeq C^\infty_0(M)

Wasserstein

\displaystyle\overline{\mathcal{G}}_{\rho dx}(\dot\rho dx,\dot\rho dx) = \int_{M} |\nabla\theta|^2\rho
\displaystyle \dot\rho + \mathrm{div}(\rho \nabla\theta) = 0

Dependent on Riemannian structure of \(M\)

\displaystyle \rho(x) = \frac{1}{\sqrt{(2\pi)^n\mathrm{det}(\Sigma)}}\mathrm{exp}(-\frac{1}{2}x^\top \Sigma^{-1}x)

Fisher-Rao

\displaystyle\overline{\mathcal{G}}_\mu(\dot\mu,\dot\mu) = \int_{M} \frac{\dot\mu}{\mu}\frac{\dot\mu}{\mu}\mu

Independent of Riemannian structure of \(M \Rightarrow \mathrm{Diff}(M)\)-invariance

\displaystyle \rho(x) = \sqrt{\frac{\mathrm{det}(W)}{(2\pi)^n}}\mathrm{exp}(-\frac{1}{2}x^\top W x)

Bundle structure

GL(n)
P(n)\simeq O(n,W_0^{-1})\backslash GL(n)
I
W_0
W_1
\pi(A)=A^\top W_0 A
\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)
\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)
K

Requirement for compatible metric on \(GL(n)\) :

  • Descending, meaning
    left-invariant w.r.t. \(O(n,W_0^{-1})\)
  • Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\)  \(\mathrm{Hor}\) is integrable

\(K=\) subgroup of upper triangular matrices with positive entries on diagonal

A = QR, \; Q\in O(n,W_0^{-1}), \; R\in K
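Reading the diagram labels as the factorization \(A = QR\) with \(R\) in the upper triangular section \(K\), the special case \(W_0 = I\) is the classical \(QR\) decomposition. The sketch below assumes \(W_0 = I\) (so that the isotropy group is the ordinary orthogonal group) and normalizes the output of `numpy.linalg.qr` so that \(R\) has a positive diagonal; the helper name `qr_positive` is hypothetical.

```python
import numpy as np

def qr_positive(A):
    """QR factorization A = Q R with R upper triangular and positive diagonal."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    # Flip matching columns of Q and rows of R; the product Q R is unchanged.
    return Q * signs, signs[:, None] * R

# Assumed example data: a well-conditioned random matrix.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 2.0 * np.eye(4)

Q, R = qr_positive(A)
W1 = A.T @ A                      # pi(A) for W_0 = I
print("||A - QR||      =", np.linalg.norm(A - Q @ R))
print("||W1 - R^T R||  =", np.linalg.norm(W1 - R.T @ R))
```

The second residual illustrates that \(\pi(A) = \pi(R)\): the orthogonal factor moves along the fibre, while \(R\) is the representative in \(K\) and coincides with the (transposed) Cholesky factor of \(A^\top A\).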

Bundle structure

GL(n)
P(n)\simeq O(n,W_0^{-1})\backslash GL(n)
I
W_0
W_1
\pi(A)=A^\top W_0 A
\mathcal G_I(\xi,\xi) = \mathrm{tr}( \ell(\xi)^\top\ell(\xi) + \sigma(\xi)^2)
\bar{\mathcal G}_W(\dot W,\dot W) = \mathrm{tr}(W^{-1}\dot W W^{-1}\dot W)
K

Requirement for compatible metric on \(GL(n)\) :

  • Descending, meaning
    left-invariant w.r.t. \(O(n,W_0^{-1})\)
  • Right-invariant w.r.t. \(GL(n)\)

\(\mathrm{Hor}_I = \) upper triangular matrices

Lie subalgebra \(\Rightarrow\)  \(\mathrm{Hor}\) is integrable

\(K=\) subgroup of upper triangular matrices with positive entries on diagonal

W_1 = \pi(R) = R^\top W_0 R
R
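The equation \(W_1 = R^\top W_0 R\) with \(R\) upper triangular and positive on the diagonal is a Cholesky-type factorization. A minimal sketch, assuming both \(W_0\) and \(W_1\) are symmetric positive definite: if \(W_0 = R_0^\top R_0\) and \(W_1 = R_1^\top R_1\) are ordinary Cholesky factorizations, then \(R = R_0^{-1}R_1\) solves the equation. The function names and test matrices are hypothetical.

```python
import numpy as np

def cholesky_upper(W):
    """Upper triangular R with positive diagonal and W = R^T R."""
    return np.linalg.cholesky(W).T

def bundle_section(W0, W1):
    """Upper triangular R with positive diagonal and R^T W0 R = W1."""
    R0 = cholesky_upper(W0)
    R1 = cholesky_upper(W1)
    return np.linalg.solve(R0, R1)   # R = R0^{-1} R1

# Assumed example data.
W0 = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
W1 = np.array([[1.0, -0.3, 0.1], [-0.3, 1.5, 0.0], [0.1, 0.0, 0.8]])

R = bundle_section(W0, W1)
print("R =\n", R)
print("residual:", np.linalg.norm(R.T @ W0 @ R - W1))
```

For \(W_0 = I\) this is exactly the Cholesky factorization \(W_1 = R^\top R\). Products and inverses of upper triangular matrices with positive diagonal stay in that class, which is the subgroup property of \(K\) used here.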

Entropy gradient flow (Fisher-Rao)

GL(n)
P(n)
I
W_0
W_1
W(t)

Relative entropy

\displaystyle H_{W_1}(W) = -\frac{1}{2}\mathrm{tr}(W_1 W^{-1}) + \frac{1}{2}\log\det(W_1 W^{-1})

\dot W = \nabla_{\bar{\mathcal G}}H_{W_1}(W)
\dot W = W_1 - W
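The linear flow \(\dot W = W_1 - W\) with \(W(0) = W_0\) has the closed-form solution \(W(t) = W_1 + e^{-t}(W_0 - W_1)\), a straight line segment in \(P(n)\) traversed at an exponentially decaying speed. The sketch below compares this formula with a simple explicit Euler integration; the function names, step count, and test matrices are assumptions for illustration.

```python
import numpy as np

def entropy_flow_exact(W0, W1, t):
    """Closed-form solution of dW/dt = W1 - W with W(0) = W0."""
    return W1 + np.exp(-t) * (W0 - W1)

def entropy_flow_euler(W0, W1, t_final, steps=2000):
    """Explicit Euler integration of dW/dt = W1 - W up to time t_final."""
    dt = t_final / steps
    W = W0.copy()
    for _ in range(steps):
        W = W + dt * (W1 - W)
    return W

# Assumed example data.
W0 = np.array([[2.0, 0.5], [0.5, 1.0]])
W1 = np.array([[1.0, -0.3], [-0.3, 1.5]])

W_exact = entropy_flow_exact(W0, W1, 5.0)
W_euler = entropy_flow_euler(W0, W1, 5.0)
print("Euler vs exact:", np.linalg.norm(W_euler - W_exact))
print("distance to W1:", np.linalg.norm(W_exact - W1))
```

Since \(W(t)\) is a convex combination of \(W_0\) and \(W_1\), it stays symmetric positive definite for all \(t\ge 0\).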

Further reduction

GL(n)
I
I
W_1
P(n)\simeq O(n)\backslash GL(n)
P(n)/O(n)\simeq \mathrm{poly}^+_n
\pi(A) = A^\top A
\varpi(W) = \det(\lambda I - W)

Right action of \(O(n)\) on \(P(n)\) by \((Q,W)\mapsto Q^\top W Q \), clearly not free!

Spectral decomposition

P(n)
I
W = Q\Lambda Q^\top
P(n)/O(n)\simeq \mathrm{poly}^+_n
D(n)

\(D(n)\) is a totally geodesic submanifold!

Geodesic equation on \(D(n)\):

\displaystyle \ddot\gamma_i - \frac{\dot\gamma_i^2}{\gamma_i} = 0

Notice: \(D(n)\) intersects \(W_1\)-orbit \(n!\) times
( \(D(n)\) an \(n!\)-covering of \(\mathrm{poly}_n^+\) )
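The spectral decomposition \(W = Q\Lambda Q^\top\) and the projection to the characteristic polynomial can be illustrated with `numpy.linalg.eigh` and `numpy.poly`. This is a minimal sketch with hypothetical helper names and an assumed test matrix with distinct eigenvalues; it also lists the \(n!\) diagonal matrices in the orbit of \(W\), one for each ordering of the eigenvalues.

```python
import itertools
import math
import numpy as np

def spectral_decomposition(W):
    """Return (Q, lam) with W = Q diag(lam) Q^T and Q orthogonal."""
    lam, Q = np.linalg.eigh(W)
    return Q, lam

def diagonal_representatives(lam):
    """All diagonal matrices obtained by permuting the eigenvalues."""
    return [np.diag(p) for p in itertools.permutations(lam)]

# Assumed example data: a symmetric positive definite matrix.
W = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])

Q, lam = spectral_decomposition(W)
reps = diagonal_representatives(lam)

print("eigenvalues:", lam)
print("coefficients of det(lambda I - W):", np.poly(W))
print("number of diagonal representatives:", len(reps))
```

Every representative has the same characteristic polynomial as \(W\), which is the sense in which \(D(n)\) covers \(\mathrm{poly}^+_n\) several times when the eigenvalues are distinct.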

Brockett flow

P(n)
N
W_1
D(n)

\(H_N(W)\) relative entropy functional

 

Functional \(F(Q) = H_N(Q^\top W_1 Q)\) on \(O(n)\)

\mathrm{Orb}(W_1)

THANKS!

Reference:

  • K. Modin
    Geometry of Matrix Decompositions Seen Through Optimal Transport and Information Geometry, 2017

Slides available at: slides.com/kmodin

Geometry of matrix decompositions seen through optimal transport and information geometry

By Klas Modin

Online presentation given in December 2020 in the Hamiltonian Seminar Series, University of Toronto.
