Dynamical formulation of the Sinkhorn algorithm

Klas Modin

Overview

Sinkhorn
algorithm

Geometric hydrodynamics

Heat flow
Otto's calculus

Quantum mechanics

Discrete optimal transport

\(p,q \in \mathbb{R}_+^N\) with \(\sum_i p_i=\sum_i q_i = 1\)

find coupling matrix \(\gamma\in \mathbb{R}_+^{N\times N} \) minimizing

\displaystyle E(\gamma) = \langle C,\gamma \rangle = \sum_{ij} C_{ij}\gamma_{ij}

under the constraint

\displaystyle \sum_i\gamma_{ij} = p_j \qquad \sum_j\gamma_{ij} = q_i

Sinkhorn algorithm

Relative entropy \(\mathcal H(\gamma\mid K) = \sum_{ij}\gamma_{ij}\log(\gamma_{ij}/K_{ij})\)

Entropic regularization: find coupling matrix \(\gamma\in \mathbb{R}_+^{N\times N} \) minimizing

\displaystyle E(\gamma) = \langle C,\gamma \rangle + \varepsilon\mathcal H(\gamma\mid 1)

under the same constraint

\displaystyle = \varepsilon\mathcal H(\gamma\mid K)

K_{ij} = \mathrm{e}^{-C_{ij}/\varepsilon}

\gamma = D_a K D_b

\displaystyle a_{k+1} = \frac{q}{K^\top b_k}, \qquad b_k = \frac{p}{K a_k}

(Sinkhorn's theorem)

Dynamical formulation of OT

\dot S + \frac{1}{2}|\nabla S|^2 = 0

\displaystyle\dot\rho + \mathrm{div}(\rho\nabla S) = 0

[Benamou & Brenier 2000]

Idea: transport map obtained through compressible fluid

Lagrangian for density \(\rho(t,x)\) and vector field \(v(t,x)\)

under constraints \(\rho(0,\cdot)=\rho_0,\; \rho(1,\cdot)=\rho_1,\; \)

L(\rho ,v) =\int_{[0,1]\times M} \frac{1}{2}|v|^2\rho

Wasserstein-Otto Riemannian metric on \(\mathrm{Diff}(M)\)

\(\dot\rho + \mathrm{div}(\rho v) = 0\)

H(\rho ,m) =\int_{[0,1]\times M} \frac{1}{2\rho}|m|^2

\(\dot\rho + \mathrm{div}(m) = 0\)

Hamiltonian formulation is convex

Hamilton's equations for horizontal momentum \(m=\rho\nabla S\)

\rho \in \mathrm{Dens}(M) = \{\rho\mid \int_M\rho = 1, \rho>0 \}

[S] \in T_\rho^*\mathrm{Dens}(M) = \{[S]\mid [S] = S + \mathbb R \}

transport equation for \(v=\nabla S\)

Hamilton-Jacobi equation

Summary:

\(L^2\) optimal transport corresponds to BVP for Hamiltonian system on \(T^*\mathrm{Dens}(M)\)

\dot S + \frac{1}{2}|\nabla S|^2 = 0

\displaystyle\dot\rho + \mathrm{div}(\rho\nabla S) = 0

with bc \(\rho(0,\cdot) = \rho_0\) and \(\rho(1,\cdot) = \rho_1\)

Hamiltonian is \(H(\rho,S) = \int_M \frac{1}{2}|\nabla S|^2\rho\)

Aim:

Dynamical formulation corresponding to the Sinkhorn algorihm

\(\Rightarrow\) Sinkhorn algorithm as space and time discretization of inf. dim. flow equation

Madelung transform

\(L^2\) OT Hamiltonian

T^*\mathrm{Dens}(M) \ni (\rho,[S]) \mapsto \underbrace{\sqrt{\rho}\mathrm{e}^{\mathrm{i}S/\hbar}}_{\psi} \in P C^\infty(M,\mathbb{C}\backslash\{0\})

Theorem: Madelung transform

is a symplectomorphism

\displaystyle H(\rho,[S]) = \frac{1}{2}\int_M |\nabla S|^2\rho

\displaystyle + \frac{\hbar^2}{8}\int_M \frac{|\nabla \rho|^2}{\rho}

perturbed by Fisher information potential

... gives Schrödinger Hamiltonian \(H(\psi) = \frac{\hbar^2}{2} \Vert \nabla\psi \Vert^2\)

\Rightarrow \quad\mathrm{i}\hbar\dot\psi = -\frac{\hbar^2}{2}\Delta\psi

Imag. Planck's constant \(\hbar = \pm2\mathrm{i}\epsilon\)

\(L^2\) OT Hamiltonian

T^*\mathrm{Dens}(M) \ni (\rho,[S]) \mapsto \Big( \underbrace{\sqrt{\rho\mathrm{e}^{-S/\epsilon}}}_{\psi_{+}}, \underbrace{\sqrt{\rho\mathrm{e}^{S/\epsilon}}}_{\psi_{-}} \Big)

Theorem: Madelung-Hopf-Cole transform

\displaystyle H(\rho,[S]) = \frac{1}{2}\int_M |\nabla S|^2\rho

\displaystyle - \frac{\epsilon^2}{2}\int_M \frac{|\nabla \rho|^2}{\rho}

perturbed by Fisher information potential

\Rightarrow \quad\dot\psi_{\pm} = \pm\epsilon\Delta\psi_{\pm}

Heat flow forward and

backward in time!

Important observations: \(\psi_+\psi_- = \rho\) and \(\epsilon\nabla\log(\psi_-/\psi_+) = \nabla S\)

is a symplectomorphism w.r.t. \(D\psi_+\wedge D\psi_-\)

Perturbed dynamical OT

Minimize perturbed (but still convex) functional

under constraints \(\rho(0,\cdot)=\rho_0,\; \rho(1,\cdot)=\rho_1,\; \)

J(\rho ,m) =\frac{1}{2}\int_{[0,1]\times M} \frac{1}{\rho}\Big( |m|^2 + \epsilon^2 |\nabla \rho|^2\Big)

\(\dot\rho + \mathrm{div}(m) = 0\)

becomes "double heat flow" equation (with \(\bar\psi_-(t,\cdot) = \psi_-(1-t,\cdot)\))

\[\dot\psi_+ = \epsilon\Delta\psi_+, \quad \dot{\bar\psi}_- = \epsilon\Delta\bar\psi_- \]

coupled by bc \(\psi_+(0,\cdot)\bar\psi_-(1,\cdot) = \rho_0\) and \(\psi_+(1,\cdot)\bar\psi_-(0,\cdot) = \rho_1\)

Algebraic formulation in terms of heat kernel \(\mathrm{e}^{\epsilon\Delta}\)

with \(a=\psi_+(0,\cdot)\) and \(b=\psi_-(1,\cdot)\)

\( a\mathrm{e}^{\epsilon\Delta}b = \rho_0,\quad b\mathrm{e}^{\epsilon\Delta}a = \rho_1\)

\( a_{k+1}\mathrm{e}^{\epsilon\Delta}b_k = \rho_0,\quad b_{k+1}\mathrm{e}^{\epsilon\Delta}a_{k+1} = \rho_1\)

Fixed-point iteration over \(a\) \(\Rightarrow\) Sinkhorn algorithm

Minimization problem for \(\psi_+\), \(\psi_-\)

Convex functional on \(C^\infty([0,1]\times M,\mathbb{R}_+)^3\)

H(\psi_+\psi_- | \rho) = \int_{[0,1]\times M} \psi_+\psi_- \log\Big(\frac{\psi_+\psi_-}{\rho}\Big)

(entropy of \(\psi_+\psi_-\) relative to \(\rho\))

under linear constraints \(\dot\psi_{\pm} = \pm\epsilon\Delta\psi_\pm\)

Leads to integral equation

\displaystyle\dot a = -a \log\left( \frac{a \mathrm{e}^{\varepsilon\Delta}b}{\rho_0} \right)

\displaystyle\dot b = -b \log\left( \frac{b \mathrm{e}^{\varepsilon\Delta}a}{\rho_1} \right)

\alpha = \log a, \quad \beta = \log b

Change of variables

\displaystyle\dot \alpha = -\alpha - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta} \right) + \log\rho_0

\displaystyle\dot \beta = -\beta - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha} \right) + \log\rho_1

Well-posedness

\displaystyle\dot \alpha = -\alpha - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta} \right) + \log\rho_0

\displaystyle\dot \beta = -\beta - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha} \right) + \log\rho_1

Observation: smooth ODE on Banach space \(X\)

For example (\(M\) compact):

\(X=H^s(M)\) with \(s>d/2\)
\(X=L^\infty(M)\)

Consequence: local well-posedness automatic (Picard iterations)

Conjecture: global existence due to maximum principle

Apply splitting + forward Euler

\displaystyle a_{k+1} = \frac{\rho_0^h a_k^{1-h}}{(\mathrm{e}^{\epsilon\Delta}b_k)^h }

\displaystyle b_{k+1} = \frac{\rho_1^h b_k^{1-h}}{(\mathrm{e}^{\epsilon\Delta}a_{k+1})^h }

\displaystyle\dot \beta = -\beta - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha} \right) + \log\rho_1

\displaystyle\dot \alpha = -\alpha - \log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta} \right) + \log\rho_0

\displaystyle \alpha_{k+1} = (1-h)\alpha_k - h\log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\beta_k} \right) + h\log\rho_0

\displaystyle \beta_{k+1} = (1-h)\beta_k - h\log \left( \mathrm{e}^{\varepsilon\Delta}\mathrm{e}^{\alpha_{k+1}} \right) + h\log\rho_1

Stepsize \(h>0\)

Original variables \(a,b\) and \(h=1\) gives

\(h=1\) gives Sinkhorn before spatial discretization!

Example on \(T^2\)

I use \(\epsilon = 0.01\)

\rho_0

\rho_1

Example on \(T^2\)

I use \(\epsilon = 0.01\)

\rho_0

\rho_1

Results

\(t=0\)

\(t=1\)

Results

\(t=0\)

\(t=1\)

Transport map is a diffeomorphism

\(L_\infty\)-error

Can we understand this?

101 on linear stability for Euler's method

\dot y = \lambda y

( 1+h\lambda )

y_{k+1} = \quad\qquad\;\; y_k

Test equation

Euler's method

101 on linear stability for Euler's method

\dot y = \lambda y

\lvert 1+h\lambda \rvert

y_{k+1} = \quad\qquad\;\; y_k

Test equation

Euler's method

h\lambda

101 on linear stability for Euler's method

\dot y = \begin{bmatrix}-1 & -1 \\ 0 & 0 \end{bmatrix} y + \begin{bmatrix}\log\rho_0 \\ 0 \end{bmatrix}

\lvert 1+h\lambda \rvert

y_{k+1} = \quad\qquad\;\; y_k

Sinkhorn equation with \(\epsilon=0\) (first part in splitting)

Euler's method

h\lambda

By Klas Modin

Dynamic formulation of the Sinkhorn algorithm for optimal transport

Presentation given 2019-06 in Duck Creek Village, Utah.

2,173

Klas Modin PRO

Mathematician at Chalmers University of Technology and the University of Gothenburg

klasmodin.github.io

Dynamical formulation of the Sinkhorn algorithm

Klas Modin

Overview

Discrete optimal transport

Sinkhorn algorithm

Dynamical formulation of OT

Summary:

Aim:

Madelung transform

Imag. Planck's constant \(\hbar = \pm2\mathrm{i}\epsilon\)

Perturbed dynamical OT

Minimization problem for \(\psi_+\), \(\psi_-\)

Leads to integral equation

Well-posedness

Apply splitting + forward Euler

Example on \(T^2\)

Example on \(T^2\)

Results

Results

\(L_\infty\)-error

\(L_\infty\)-error

\(L_\infty\)-error

\(L_\infty\)-error

\(L_\infty\)-error

101 on linear stability for Euler's method

101 on linear stability for Euler's method

101 on linear stability for Euler's method

101 on linear stability for Euler's method

101 on linear stability for Euler's method

101 on linear stability for Euler's method

101 on linear stability for Euler's method

101 on linear stability for Euler's method

101 on linear stability for Euler's method

Dynamic formulation of the Sinkhorn algorithm for optimal transport

More from Klas Modin