Dynamical formulation of the Sinkhorn algorithm

Klas Modin

Overview

Sinkhorn
algorithm

Geometric hydrodynamics

Heat flow
Otto's calculus

Quantum mechanics

Discrete optimal transport

\(p,q \in \mathbb{R}_+^N\)  with  \(\sum_i p_i=\sum_i q_i = 1\)

find coupling matrix \(\gamma\in \mathbb{R}_+^{N\times N} \) minimizing

\displaystyle E(\gamma) = \langle C,\gamma \rangle = \sum_{ij} C_{ij}\gamma_{ij}

under the constraint

\displaystyle \sum_i\gamma_{ij} = p_j \qquad \sum_j\gamma_{ij} = q_i

Sinkhorn algorithm

Relative entropy \(\mathcal H(\gamma\mid K) = \sum_{ij}\gamma_{ij}\log(\gamma_{ij}/K_{ij})\)

Entropic regularization: find coupling matrix \(\gamma\in \mathbb{R}_+^{N\times N} \) minimizing

\displaystyle E(\gamma) = \langle C,\gamma \rangle + \varepsilon\mathcal H(\gamma\mid 1)

under the same constraint

\displaystyle = \varepsilon\mathcal H(\gamma\mid K)
K_{ij} = \mathrm{e}^{-C_{ij}/\varepsilon}
\gamma = D_a K D_b
\displaystyle a_{k+1} = \frac{q}{K^\top b_k}, \qquad b_k = \frac{p}{K a_k}

(Sinkhorn's theorem)

Dynamical formulation of OT

\dot S + \frac{1}{2}|\nabla S|^2 = 0
\displaystyle\dot\rho + \mathrm{div}(\rho\nabla S) = 0

[Benamou & Brenier 2000]

Idea: transport map obtained through compressible fluid

Lagrangian for density \(\rho(t,x)\) and vector field \(v(t,x)\)

 

 

under constraints \(\rho(0,\cdot)=\rho_0,\; \rho(1,\cdot)=\rho_1,\; \)

L(\rho ,v) =\int_{[0,1]\times M} \frac{1}{2}|v|^2\rho

Wasserstein-Otto Riemannian metric on \(\mathrm{Diff}(M)\)

\(\dot\rho + \mathrm{div}(\rho v) = 0\)

H(\rho ,m) =\int_{[0,1]\times M} \frac{1}{2\rho}|m|^2

\(\dot\rho + \mathrm{div}(m) = 0\)

Hamiltonian formulation is convex

Hamilton's equations for horizontal momentum \(m=\rho\nabla S\)

\rho \in \mathrm{Dens}(M) = \{\rho\mid \int_M\rho = 1, \rho>0 \}
[S] \in T_\rho^*\mathrm{Dens}(M) = \{[S]\mid [S] = S + \mathbb R \}

transport equation for \(v=\nabla S\)

Hamilton-Jacobi equation

Summary:

\(L^2\) optimal transport corresponds to BVP for Hamiltonian system on \(T^*\mathrm{Dens}(M)\)

\dot S + \frac{1}{2}|\nabla S|^2 = 0
\displaystyle\dot\rho + \mathrm{div}(\rho\nabla S) = 0

with bc \(\rho(0,\cdot) = \rho_0\) and \(\rho(1,\cdot) = \rho_1\)

Hamiltonian is \(H(\rho,S) = \int_M \frac{1}{2}|\nabla S|^2\rho\)

Aim:

Dynamical formulation corresponding to the Sinkhorn algorihm

\(\Rightarrow\) Sinkhorn algorithm as space and time discretization of inf. dim. flow equation

Madelung transform

\(L^2\) OT Hamiltonian

T^*\mathrm{Dens}(M) \ni (\rho,[S]) \mapsto \underbrace{\sqrt{\rho}\mathrm{e}^{\mathrm{i}S/\hbar}}_{\psi} \in P C^\infty(M,\mathbb{C}\backslash\{0\})

Theorem: Madelung transform

is a symplectomorphism

\displaystyle H(\rho,[S]) = \frac{1}{2}\int_M |\nabla S|^2\rho
\displaystyle + \frac{\hbar^2}{8}\int_M \frac{|\nabla \rho|^2}{\rho}

perturbed by Fisher information potential

... gives  Schrödinger Hamiltonian \(H(\psi) = \frac{\hbar^2}{2} \Vert \nabla\psi \Vert^2\)

\Rightarrow \quad\mathrm{i}\hbar\dot\psi = -\frac{\hbar^2}{2}\Delta\psi

Imag. Planck's constant \(\hbar = \pm2\mathrm{i}\epsilon\)

\(L^2\) OT Hamiltonian

T^*\mathrm{Dens}(M) \ni (\rho,[S]) \mapsto \Big( \underbrace{\sqrt{\rho\mathrm{e}^{-S/\epsilon}}}_{\psi_{+}}, \underbrace{\sqrt{\rho\mathrm{e}^{S/\epsilon}}}_{\psi_{-}} \Big)

Theorem: Madelung-Hopf-Cole transform

\displaystyle H(\rho,[S]) = \frac{1}{2}\int_M |\nabla S|^2\rho
\displaystyle - \frac{\epsilon^2}{2}\int_M \frac{|\nabla \rho|^2}{\rho}

perturbed by Fisher information potential

\Rightarrow \quad\dot\psi_{\pm} = \pm\epsilon\Delta\psi_{\pm}

Heat flow forward and

backward in time!

Important observations: \(\psi_+\psi_- = \rho\) and \(\epsilon\nabla\log(\psi_-/\psi_+) = \nabla S\)

is a symplectomorphism w.r.t. \(D\psi_+\wedge D\psi_-\)

Perturbed dynamical OT

Minimize perturbed (but still convex) functional

 

 

under constraints \(\rho(0,\cdot)=\rho_0,\; \rho(1,\cdot)=\rho_1,\; \)

J(\rho ,m) =\frac{1}{2}\int_{[0,1]\times M} \frac{1}{\rho}\Big( |m|^2 + \epsilon^2 |\nabla \rho|^2\Big)

\(\dot\rho + \mathrm{div}(m) = 0\)

becomes "double heat flow" equation (with \(\bar\psi_-(t,\cdot) = \psi_-(1-t,\cdot)\))

\[\dot\psi_+ = \epsilon\Delta\psi_+, \quad \dot{\bar\psi}_- = \epsilon\Delta\bar\psi_- \]

coupled by bc \(\psi_+(0,\cdot)\bar\psi_-(1,\cdot) = \rho_0\) and \(\psi_+(1,\cdot)\bar\psi_-(0,\cdot) = \rho_1\)

Algebraic formulation in terms of heat kernel \(\mathrm{e}^{\epsilon\Delta}\)

 

with \(a=\psi_+(0,\cdot)\) and \(b=\psi_-(1,\cdot)\)

\( a\mathrm{e}^{\epsilon\Delta}b = \rho_0,\quad b\mathrm{e}^{\epsilon\Delta}a = \rho_1\)

\( a_{k+1}\mathrm{e}^{\epsilon\Delta}b_k = \rho_0,\quad b_{k+1}\mathrm{e}^{\epsilon\Delta}a_{k+1} = \rho_1\)

Fixed-point iteration over \(a\) \(\Rightarrow\) Sinkhorn algorithm

Minimization problem for \(\psi_+\), \(\psi_-\)

Convex functional on \(C^\infty([0,1]\times M,\mathbb{R}_+)^3\)

H(\psi_+\psi_- | \rho) = \int_{[0,1]\times M} \psi_+\psi_- \log\Big(\frac{\psi_+\psi_-}{\rho}\Big)

(entropy of \(\psi_+\psi_-\) relative to \(\rho\))

under linear constraints \(\dot\psi_{\pm} = \pm\epsilon\Delta\psi_\pm\)

Leads to integral equation

\displaystyle\dot a = -a \log\left( \frac{a \mathrm{e}^{\varepsilon\Delta}b}{\rho_0} \right)
\displaystyle\dot b = -b \log\left( \frac{b \mathrm{e}^{\varepsilon\Delta}a}{\rho_1} \right)
\alpha = \log a, \quad \beta = \log b

Change of variables

\displaystyle\dot \alpha = -\alpha - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta} \right) + \log\rho_0
\displaystyle\dot \beta = -\beta - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha} \right) + \log\rho_1

Well-posedness

\displaystyle\dot \alpha = -\alpha - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta} \right) + \log\rho_0
\displaystyle\dot \beta = -\beta - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha} \right) + \log\rho_1

Observation: smooth ODE on Banach space \(X\)

For example (\(M\) compact):

  • \(X=H^s(M)\) with \(s>d/2\)
  • \(X=L^\infty(M)\)

Consequence: local well-posedness automatic (Picard iterations)

Conjecture: global existence due to maximum principle

Apply splitting + forward Euler

\displaystyle a_{k+1} = \frac{\rho_0^h a_k^{1-h}}{(\mathrm{e}^{\epsilon\Delta}b_k)^h }
\displaystyle b_{k+1} = \frac{\rho_1^h b_k^{1-h}}{(\mathrm{e}^{\epsilon\Delta}a_{k+1})^h }
\displaystyle\dot \beta = -\beta - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha} \right) + \log\rho_1
\displaystyle\dot \alpha = -\alpha - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta} \right) + \log\rho_0
\displaystyle \alpha_{k+1} = (1-h)\alpha_k - h\log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta_k} \right) + h\log\rho_0
\displaystyle \beta_{k+1} = (1-h)\beta_k - h\log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha_{k+1}} \right) + h\log\rho_1

Stepsize \(h>0\)

Original variables \(a,b\) and \(h=1\) gives

\(h=1\) gives Sinkhorn before spatial discretization!

Example on \(T^2\)

I use \(\epsilon = 0.01\)

\rho_0
\rho_1

Example on \(T^2\)

I use \(\epsilon = 0.01\)

\rho_0
\rho_1

Results

\(t=0\)

\(t=1\)

Results

\(t=0\)

\(t=1\)

Transport map is a diffeomorphism

\(L_\infty\)-error

\(L_\infty\)-error

\(L_\infty\)-error

\(L_\infty\)-error

\(L_\infty\)-error

Can we understand this?

101 on linear stability for Euler's method

\dot y = \lambda y
( 1+h\lambda )
y_{k+1} = \quad\qquad\;\; y_k

Test equation

Euler's method

101 on linear stability for Euler's method

\dot y = \lambda y
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Test equation

Euler's method

h\lambda

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 & -1 \\ 0 & 0 \end{bmatrix} y + \begin{bmatrix}\log\rho_0 \\ 0 \end{bmatrix}
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Sinkhorn equation with \(\epsilon=0\) (first part in splitting)

Euler's method

h\lambda

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 &0 \\ 0 &0 \end{bmatrix} y + b
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Euler's method

h\lambda

Sinkhorn equation with \(\epsilon=0\) (first part in splitting)

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 &0 \\ 0 &0 \end{bmatrix} y + b
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Sinkhorn equation with \(\epsilon=0\)

Euler's method

h\lambda

explains initial transient

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 &0 \\ 0 &0 \end{bmatrix} y + b
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Sinkhorn equation with \(\epsilon=0\)

Euler's method

h\lambda

problem

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 &0 \\ 0 &-\epsilon \end{bmatrix} y + b + \cdots
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Sinkhorn equation with \(\epsilon>0\)

Euler's method

h\lambda

because of maximum principle

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 &0 \\ 0 &-\epsilon \end{bmatrix} y + b + \cdots
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Sinkhorn equation with \(\epsilon>0\)

Euler's method

h\lambda

increasing stepsize

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 &0 \\ 0 &-\epsilon \end{bmatrix} y + b + \cdots
\lvert 1+h\lambda \rvert
y_{k+1} = \quad\qquad\;\; y_k

Sinkhorn equation with \(\epsilon>0\)

Euler's method

h\lambda

increasing stepsize

Dynamic formulation of the Sinkhorn algorithm for optimal transport

By Klas Modin

Dynamic formulation of the Sinkhorn algorithm for optimal transport

Presentation given 2019-06 in Duck Creek Village, Utah.

  • 1,654