02/10/2020

# Introduction

### Lattice QCD:

• A non-perturbative approach to solving QCD, the theory of the strong interaction between quarks and gluons.

### Calculations in lattice QCD proceed in three steps:

• Gauge field generation: use Markov Chain Monte Carlo (MCMC) methods to sample independent gauge field (gluon) configurations.
• Propagator calculations: compute how quarks propagate in these fields ("quark propagators").
• Contractions: combine quark propagators into correlation functions and observables.

# Motivation: Lattice QCD

• Generating independent gauge configurations is a MAJOR bottleneck for lattice QCD.
• As the lattice spacing $$a \rightarrow 0$$ (the continuum limit), the MCMC updates tend to get stuck in sectors of fixed gauge topology.
• As a result, the number of steps needed to adequately sample different topological sectors grows exponentially.

### Critical slowing down!


# Markov Chain Monte Carlo (MCMC)

• Goal: Generate an ensemble of independent samples drawn from the desired target distribution $$p(x)$$.
• This is done using the Metropolis-Hastings accept/reject algorithm:
• Given:
• Initial distribution, $$\pi_{0}$$
• Proposal distribution, $$q(x^{\prime}|x)$$
• Update:
1. Sample $$x^{\prime} \sim q(\cdot | x)$$
2. Accept $$x^{\prime}$$ with probability $$A(x^{\prime}|x)$$
A(x^{\prime}|x) = \min\left[1, \frac{p(x^{\prime})q(x|x^{\prime})}{p(x)q(x^{\prime}|x)}\right] = \min\left[1, \frac{p(x^{\prime})}{p(x)}\right] \quad \text{if}\; q(x^{\prime}|x) = q(x|x^{\prime})

# Metropolis-Hastings: Accept/Reject

```python
import numpy as np

def metropolis_hastings(p, steps=1000):
    x = 0.                    # initialize config
    samples = np.zeros(steps)
    for i in range(steps):
        x_prime = x + np.random.randn()           # proposed config
        if np.random.rand() < p(x_prime) / p(x):  # accept with prob A(x'|x)
            x = x_prime       # accept proposed config
        samples[i] = x        # accumulate configs
    return samples
```
As $$N \longrightarrow \infty$$, $$x \sim p$$.

# Issues with MCMC

• Need to wait for the chain to "burn in" (become thermalized)
• Nearby configurations on the chain are correlated with each other.
• Multiple steps needed to produce independent samples ("mixing time")
• Measurable via integrated autocorrelation time, $$\tau^{\mathrm{int}}_{\mathcal{O}}$$

### Smaller $$\tau^{\mathrm{int}}_{\mathcal{O}}\longrightarrow$$ less computational cost!

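The integrated autocorrelation time $$\tau^{\mathrm{int}}_{\mathcal{O}}$$ can be estimated directly from a chain of measurements of an observable. A minimal sketch (the fixed truncation window is an illustrative simplification; in practice the window is chosen adaptively):

```python
import numpy as np

def tau_int(obs, window=100):
    """Estimate the integrated autocorrelation time of a 1D chain:
    tau_int = 1/2 + sum_t rho(t), with the sum truncated at `window`."""
    x = np.asarray(obs, dtype=float)
    x = x - x.mean()
    var = np.mean(x * x)
    n = len(x)
    tau = 0.5
    for t in range(1, window):
        rho = np.mean(x[:n - t] * x[t:]) / var  # normalized autocorrelation
        tau += rho
    return tau
```

For independent samples this returns roughly 0.5; for a strongly correlated chain it grows with the correlation length, which is why a smaller $$\tau^{\mathrm{int}}_{\mathcal{O}}$$ means fewer MCMC steps per independent configuration.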

# Hamiltonian Monte Carlo (HMC)

• Target distribution $$p(x)$$ defined by an energy function $$U(x)$$ such that $$p(x) \propto \exp{\left(-U(x)\right)}$$
• Introduce a (fictitious) momentum variable $$v$$ that is (normally) distributed independently from $$x$$ as $$p(v)\propto \exp{\left(-\frac{1}{2}v^{T}v\right)}$$
• We can improve on the "guess and check" approach of MCMC by using Hamiltonian Monte Carlo (HMC).
• HMC samples from the canonical distribution:
p(x, v) = \frac{1}{\mathcal{Z}}\exp{\left(-\mathcal{H}(x, v)\right)} \propto \exp{\left(-U(x) - \frac{1}{2}v^{T}v\right)} \equiv p(x)\,p(v)

We know how this "evolves" in time!

# HMC: Leapfrog Integrator

• Integrate Hamilton's equations numerically using the leapfrog integrator.
• The leapfrog integrator proceeds in three steps:
\dot x_{i} = \frac{\partial\mathcal{H}}{\partial v_{i}} = v_i
\dot v_{i} =-\frac{\partial\mathcal{H}}{\partial x_{i}} = -\frac{\partial U}{\partial x_{i}}

1. Update momenta (half step), $$t \longrightarrow t + \frac{\varepsilon}{2}$$:
v(t + \frac{\varepsilon}{2}) = v(t) - \frac{\varepsilon}{2}\partial_{x} U(x(t))
2. Update positions (full step), $$t \longrightarrow t + \varepsilon$$:
x(t + \varepsilon) = x(t) + \varepsilon\, v(t+\frac{\varepsilon}{2})
3. Update momenta (half step), $$t + \frac{\varepsilon}{2} \longrightarrow t + \varepsilon$$:
v(t+\varepsilon) = v(t+\frac{\varepsilon}{2}) - \frac{\varepsilon}{2}\partial_{x}U(x(t+\varepsilon))
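The three steps above translate directly into code. A minimal sketch, assuming `grad_U` is a callable computing $$\partial_{x}U$$ (interior half-steps are merged into full steps, as is standard):

```python
import numpy as np

def leapfrog(x, v, grad_U, eps=0.1, n_steps=10):
    """Integrate Hamilton's equations with the leapfrog scheme."""
    x, v = np.copy(x), np.copy(v)
    v -= 0.5 * eps * grad_U(x)      # (1) half-step momentum update
    for _ in range(n_steps - 1):
        x += eps * v                # (2) full-step position update
        v -= eps * grad_U(x)        # two consecutive half-steps merged
    x += eps * v
    v -= 0.5 * eps * grad_U(x)      # (3) final half-step momentum update
    return x, v
```

Because the scheme is symplectic, $$\mathcal{H}(x, v)$$ is conserved up to $$\mathcal{O}(\varepsilon^{2})$$ over the trajectory, which keeps the HMC acceptance rate high.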

# HMC: Leapfrog Integrator

• Write the action of the leapfrog integrator in terms of an operator $$L$$, acting on the state $$\xi \equiv (x, v)$$:
\mathbf{L}\xi \equiv \mathbf{L}(x, v)\equiv (x^{\prime}, v^{\prime}) = \xi^{\prime}
• Introduce a "momentum-flip" operator, $$\mathbf{F}$$:
\mathbf{F}\xi \equiv \mathbf{F}(x, v) \equiv (x, -v)
• The acceptance probability is then given by:
A\left(\mathbf{FL}\xi|\xi\right) = \min\left(1, \frac{p(\mathbf{FL}\xi)}{p(\xi)}\left|\frac{\partial\left[\mathbf{FL}\xi\right]}{\partial\xi^{T}}\right|\right)
• For HMC, the Jacobian determinant $$|\mathcal{J}| = 1$$, so:
A\left(\mathbf{FL}\xi|\xi\right) = \min\left(1, \exp\left[-\mathcal{H}(\xi^{\prime}) + \mathcal{H}(\xi)\right]\right)

# Hamiltonian Monte Carlo (HMC)

• Integrating Hamilton's equations allows us to move far in state space while staying (roughly) on iso-probability contours of $$p(x, v)$$

• Each HMC iteration: (1) resample the momentum, $$v \sim p(v)$$; (2) integrate $$\mathcal{H}(x, v)$$ for $$t \longrightarrow t + \varepsilon$$; (3) project onto the target parameter space, $$p(x, v) \longrightarrow p(x)$$.
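Putting the pieces together, one HMC iteration resamples $$v$$, integrates with leapfrog, and applies a Metropolis accept/reject on $$\mathcal{H}$$. A minimal sketch, where `U` and `grad_U` are assumed callables for the energy and its gradient:

```python
import numpy as np

def hmc(U, grad_U, x0, n_samples=1000, eps=0.1, n_steps=10):
    """Minimal HMC: v ~ N(0, 1), leapfrog integration, then
    accept/reject on H(x, v) = U(x) + v^T v / 2."""
    x = np.copy(np.asarray(x0, dtype=float))
    samples = []
    for _ in range(n_samples):
        v = np.random.randn(*x.shape)           # (1) resample v ~ p(v)
        x_new, v_new = np.copy(x), np.copy(v)
        v_new -= 0.5 * eps * grad_U(x_new)      # (2) leapfrog trajectory
        for _ in range(n_steps - 1):
            x_new += eps * v_new
            v_new -= eps * grad_U(x_new)
        x_new += eps * v_new
        v_new -= 0.5 * eps * grad_U(x_new)
        dH = (U(x_new) + 0.5 * v_new @ v_new) - (U(x) + 0.5 * v @ v)
        if np.random.rand() < np.exp(-dH):      # accept with min(1, e^{-dH})
            x = x_new
        samples.append(np.copy(x))              # (3) project: keep x, drop v
    return np.array(samples)
```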

# HMC: Issues

• Energy levels are selected randomly $$\longrightarrow$$ slow mixing!
• Cannot easily traverse low-density zones.

• What do we want in a good sampler (especially for lattice QCD)?

• Fast mixing
• Fast burn-in
• Mixing across energy levels
• Mixing between modes

# L2HMC: Learning to HMC

• L2HMC generalizes HMC by introducing six new functions, $$S_{\ell}, T_{\ell}, Q_{\ell}$$ for $$\ell = x, v$$, into the leapfrog integrator.
• Given an analytically described distribution, L2HMC provides a statistically exact sampler, with highly desirable properties:
1. Fast burn-in.
2. Fast mixing.

Ideal for lattice QCD due to critical slowing down!

• Idea: MINIMIZE the autocorrelation time (time needed for samples to be independent).
• Can be done by MAXIMIZING the "distance" traveled by the integrator.

# L2HMC: Augmented Leapfrog

v^{\prime} = v\odot\exp\left(\frac{\varepsilon}{2}S_{v}(\zeta_{1})\right) - \frac{\varepsilon}{2}\left[\partial_{x}U(x)\odot\exp(\varepsilon Q_{v}(\zeta_{1})) + T_{v}(\zeta_{1})\right]
x^{\prime} = x_{\bar{m}^{t}} + m^{t}\odot\left[x\odot \exp(\varepsilon S_{x}(\zeta_{2})) + \varepsilon\left(v^{\prime} \odot\exp(\varepsilon Q_{x}(\zeta_{2})) + T_{x}(\zeta_{2})\right)\right]
x^{\prime\prime} = x^{\prime}_{m^{t}} + \bar{m}^{t}\odot\left[x^{\prime}\odot \exp(\varepsilon S_{x}(\zeta_{3})) + \varepsilon\left(v^{\prime} \odot\exp(\varepsilon Q_{x}(\zeta_{3})) + T_{x}(\zeta_{3})\right)\right]
v^{\prime\prime} = v^{\prime}\odot\exp\left(\frac{\varepsilon}{2}S_{v}(\zeta_{4})\right) - \frac{\varepsilon}{2}\left[\partial_{x}U(x^{\prime\prime})\odot\exp(\varepsilon Q_{v}(\zeta_{4})) + T_{v}(\zeta_{4})\right]
\zeta_{1} = (x, \partial_{x}U(x), t)
\zeta_{2} = (x_{\bar{m}^{t}}, v, t)
\zeta_{3} = (x^{\prime}_{m^{t}}, v, t)
\zeta_{4} = (x^{\prime\prime}, \partial_{x}U(x^{\prime\prime}), t)
• In the updates above, the $$\exp(S)$$ factors perform a momentum/position scaling, the $$T$$ functions a translation, and the $$\zeta_{i}$$ are the inputs to the networks.

• Idea: Generalize HMC by introducing six new functions:
• $$S_{x}(\theta_{x}),\,T_{x}(\theta_{x}),\,Q_{x}(\theta_{x})$$; $$\quad$$ $$S_{v}(\theta_{v}),\,T_{v}(\theta_{v}),\,Q_{v}(\theta_{v})$$
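To make the structure concrete, the first momentum half-step can be sketched as follows, with `S_v`, `Q_v`, `T_v` standing in for the learned networks (here arbitrary callables; setting all three to zero recovers the plain HMC half-step):

```python
import numpy as np

def augmented_v_half_step(x, v, grad_U, S_v, Q_v, T_v, eps=0.1, t=0):
    """v' = v * exp(eps/2 * S_v(z)) - eps/2 * [dU * exp(eps * Q_v(z)) + T_v(z)],
    where z = zeta_1 = (x, grad_U(x), t)."""
    g = grad_U(x)
    zeta = (x, g, t)                         # network inputs
    scale = np.exp(0.5 * eps * S_v(*zeta))   # momentum scaling
    grad_scale = np.exp(eps * Q_v(*zeta))    # gradient scaling
    return v * scale - 0.5 * eps * (g * grad_scale + T_v(*zeta))
```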

# L2HMC: Modified Leapfrog

• Write the action of the new leapfrog integrator as an operator $$\mathbf{L}_{\theta}$$, parameterized by weights $$\theta$$.
• Applying this operator $$M$$ times (the trajectory length) successively to $$\xi = (x, v, d)$$:
\mathbf{L}_{\theta}\xi = \mathbf{L}_{\theta}(x, v, d) = \left(x^{{\prime\prime}^{\times M}}, v^{{\prime\prime}^{\times M}}, d\right)
• The "flip" operator $$\mathbf{F}$$ reverses the direction $$d$$: $$\mathbf{F}\xi = (x, v, -d)$$.
• Write the complete dynamics step as:
\mathbf{FL}_{\theta} \xi = \xi^{\prime}

# L2HMC: Accept/Reject

• Accept the proposed configuration, $$\xi^{\prime}$$, with probability:

$$A(\xi^{\prime}|\xi) = \min{\left(1, \frac{p(\mathbf{FL}_{\theta}\xi)}{p(\xi)}\left|\frac{\partial\left[\mathbf{FL}_{\theta}\xi\right]}{\partial\xi^{T}}\right|\right)}$$

• Unlike HMC, the Jacobian determinant $$|\mathcal{J}| \neq 1$$; this has the effect of deforming the energy landscape.
• $$\mathcal{J}$$ can be computed efficiently: it depends only on $$S_{x}, S_{v}, \zeta_{i}$$.

# L2HMC: Loss function

• Idea: minimize the autocorrelation time $$\tau_{\mathcal{O}}^{\mathrm{int}}$$ (the time needed for samples to become independent) by maximizing the "distance" traveled during integration.
• Choose a loss designed to achieve this:

\ell_{\lambda}\left(\xi, \xi^{\prime}, A(\xi^{\prime}|\xi)\right) = \frac{\lambda^{2}}{\delta(\xi, \xi^{\prime})A(\xi^{\prime}|\xi)} - \frac{\delta(\xi, \xi^{\prime})A(\xi^{\prime}|\xi)}{\lambda^{2}}
• Here $$\lambda$$ is a scale parameter, $$A(\xi^{\prime}|\xi)$$ is the acceptance probability, and $$\delta(\xi, \xi^{\prime}) = \|x - x^{\prime}\|^{2}_{2}$$ is the "distance" between $$\xi$$ and $$\xi^{\prime}$$, so $$\delta \times A$$ is the "expected" distance.
• The first term penalizes the sampler if it is unable to move effectively; the second term encourages typical moves to be large.
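The loss translates directly into a few lines of numpy; a sketch, batched over a set of configurations (the array shapes are illustrative assumptions):

```python
import numpy as np

def l2hmc_loss(x, x_prime, accept_prob, lam=1.0):
    """ell_lambda = lam^2 / (delta * A) - (delta * A) / lam^2,
    with delta = ||x - x'||_2^2 computed per configuration."""
    delta = np.sum((x - x_prime) ** 2, axis=-1)   # "distance" delta(xi, xi')
    exp_dist = delta * accept_prob                # "expected" distance
    return lam ** 2 / exp_dist - exp_dist / lam ** 2
```

The loss is large when the expected distance is small (the sampler barely moves, or moves are rejected) and becomes negative when moves are large and accepted, which is exactly the behavior training should reward.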

# Network Architecture

• Training loop:

1. Build model, initialize network.
2. Train step: run dynamics, Accept/Reject.
3. Calculate the loss, $$\ell_{\lambda}(\theta)$$.
4. Backpropagate.
5. Finished training? If not, repeat from step 2.
6. Save the trained model.
7. Run inference on the saved model.

# L2HMC: $$U(1)$$ Lattice Gauge Theory

• Link variables:
U_{\mu}(i) = e^{i \phi_{\mu}(i)} \in U(1), \qquad -\pi < \phi_{\mu}(i) \leq \pi
• Wilson action:
\beta S = \beta \sum_{P}\left(1 - \cos(\phi_{P})\right)
where $$\phi_{P}$$ is the sum of $$\phi$$ around a plaquette:
\phi_{P} \equiv \phi_{\mu\nu}(i) = \phi_{\mu}(i) + \phi_{\nu}(i + \hat{\mu}) - \phi_{\mu}(i+\hat{\nu}) - \phi_{\nu}(i)
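The plaquette sum and Wilson action can be evaluated on a periodic 2D lattice with `np.roll`; a minimal sketch, assuming the link angles are stored in an array `phi` of shape `(2, L, L)` with axis 0 indexing the direction $$\mu$$:

```python
import numpy as np

def plaquettes(phi):
    """phi_P(i) = phi_0(i) + phi_1(i + e0) - phi_0(i + e1) - phi_1(i)
    on a periodic 2D lattice; phi has shape (2, L, L)."""
    return (phi[0]
            + np.roll(phi[1], -1, axis=0)    # phi_nu(i + mu_hat)
            - np.roll(phi[0], -1, axis=1)    # phi_mu(i + nu_hat)
            - phi[1])

def wilson_action(phi, beta=1.0):
    """beta * sum_P (1 - cos(phi_P))."""
    return beta * np.sum(1.0 - np.cos(plaquettes(phi)))
```

A quick sanity check: a field that is constant in each direction has $$\phi_{P} = 0$$ on every plaquette, so the action vanishes.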