Machine Learning in Lattice QCD

Sam Foreman

02/10/2020

Introduction

Lattice QCD:
- Non-perturbative approach to solving the QCD theory of the strong interaction between quarks and gluons

Calculations in Lattice QCD proceed in 3 steps:

1. Gauge field generation: Use Markov Chain Monte Carlo (MCMC) methods for sampling independent gauge field (gluon) configurations.
2. Propagator calculations: Compute how quarks propagate in these fields ("quark propagators").
3. Contractions: Method for combining quark propagators into correlation functions and observables.

Motivation: Lattice QCD

Generating independent gauge configurations is a MAJOR bottleneck for Lattice QCD.

As the lattice spacing, \(a \rightarrow 0\), the MCMC updates tend to get stuck in sectors of fixed gauge topology.
- This causes the number of steps needed to adequately sample different topological sectors to increase exponentially.

Critical slowing down!

[Figure: approach to the continuum limit as the lattice spacing \(a \rightarrow 0\)]

Markov Chain Monte Carlo (MCMC)

Goal: Generate an ensemble of independent samples drawn from the desired target distribution \(p(x)\).

This is done using the Metropolis-Hastings accept/reject algorithm:

Given:
- Initial distribution, \(\pi_{0}\)
- Proposal distribution, \(q(x^{\prime}|x)\)
Update:
1. Sample \(x^{\prime} \sim q(\cdot | x)\)
2. Accept \(x^{\prime}\) with probability \(A(x^{\prime}|x)\)

A(x^{\prime}|x) = \min\left[1, \frac{p(x^{\prime})q(x|x^{\prime})}{p(x)q(x^{\prime}|x)}\right]

which, for a symmetric proposal with \(q(x^{\prime}|x) = q(x|x^{\prime})\), reduces to

A(x^{\prime}|x) = \min\left[1, \frac{p(x^{\prime})}{p(x)}\right]

Metropolis-Hastings: Accept/Reject

import numpy as np

def metropolis_hastings(p, steps=1000):
    x = 0.                   # initialize config
    samples = np.zeros(steps)
    for i in range(steps):
        x_prime = x + np.random.randn()           # proposed config
        if np.random.rand() < p(x_prime) / p(x):  # compute A(x'|x)
            x = x_prime      # accept proposed config
        samples[i] = x       # accumulate configs 
    
    return samples

As the number of steps \(N \longrightarrow \infty\), the samples are distributed as \(x\sim p\).

Issues with MCMC

Need to wait for the chain to "burn in" (become thermalized)

Nearby configurations on the chain are correlated with each other.
- Multiple steps needed to produce independent samples ("mixing time")
  - Measurable via integrated autocorrelation time, \(\tau^{\mathrm{int}}_{\mathcal{O}}\)

Smaller \(\tau^{\mathrm{int}}_{\mathcal{O}}\longrightarrow\) less computational cost!
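The autocorrelation time can be estimated directly from a chain of measurements. The sketch below assumes one common convention, \(\tau^{\mathrm{int}}_{\mathcal{O}} = \frac{1}{2} + \sum_{t \geq 1}\rho(t)\), and a simple truncation of the sum at the first non-positive \(\rho(t)\). The function name and the truncation rule are illustrative choices, not taken from the talk.

```python
import numpy as np

def integrated_autocorr_time(series, max_lag=None):
    """Estimate tau_int = 1/2 + sum_{t>=1} rho(t) for a 1D chain of measurements.

    Assumption: the sum is cut at the first non-positive rho(t), a simple
    (not the only) way to keep noise in the tail from dominating.
    """
    series = np.asarray(series, dtype=float)
    n = series.size
    if max_lag is None:
        max_lag = n // 2
    centered = series - series.mean()
    var = np.mean(centered ** 2)  # assumed non-zero (non-constant chain)
    tau = 0.5
    for lag in range(1, max_lag):
        # normalized autocorrelation function at this lag
        rho = np.mean(centered[:-lag] * centered[lag:]) / var
        if rho <= 0.0:
            break
        tau += rho
    return tau
```

For uncorrelated samples this gives a value close to 1/2, and larger values indicate that more steps are needed between effectively independent samples.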

[Figure: MCMC chain trace showing the initial burn-in period, followed by correlated samples \(\sim p(x)\)]

Hamiltonian Monte Carlo (HMC)

Target distribution \(p(x)\) defined by an energy function \(U(x)\) such that \(p(x) \propto \exp{\left(-U(x)\right)}\)

Introduce a (fictitious) momentum variable \(v\) that is (normally) distributed independently from \(x\) as \(p(v)\propto \exp{\left(-\frac{1}{2}v^{T}v\right)}\)

HMC samples from the canonical distribution:

p(x, v) = \frac{1}{\mathcal{Z}}\exp{\left(-\mathcal{H}(x, v)\right)} \propto \exp{\left(-U(x) - \frac{1}{2}v^{T}v\right)} \equiv p(x) p(v)

We can improve on the "guess and check" approach of MCMC by using Hamiltonian Monte Carlo (HMC).

We know how this "evolves" in time!

HMC: Leapfrog Integrator

Integrate Hamilton's equations numerically using the leapfrog integrator:

\dot x_{i} = \frac{\partial\mathcal{H}}{\partial v_{i}} = v_i

\dot v_{i} =-\frac{\partial\mathcal{H}}{\partial x_{i}} = -\frac{\partial U}{\partial x_{i}}

The leapfrog integrator proceeds in three steps:

1. Update momenta (half step), \(t \longrightarrow t + \frac{\varepsilon}{2}\):

v(t + \frac{\varepsilon}{2}) = v(t) - \frac{\varepsilon}{2}\partial_{x} U(x(t))

2. Update position (full step), \(t \longrightarrow t + \varepsilon\):

x(t + \varepsilon) = x(t) + \varepsilon v(t+\frac{\varepsilon}{2})

3. Update momenta (half step), \(t + \frac{\varepsilon}{2} \longrightarrow t + \varepsilon\):

v(t+\varepsilon) = v(t+\frac{\varepsilon}{2}) - \frac{\varepsilon}{2}\partial_{x}U(x(t+\varepsilon))
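The three leapfrog updates map directly onto code. A minimal sketch, assuming a user-supplied gradient function `grad_U` (a hypothetical name) and using the harmonic potential \(U(x) = \frac{1}{2}x^{2}\) as a toy example:

```python
import numpy as np

def leapfrog(x, v, grad_U, eps):
    """One leapfrog step of size eps for H(x, v) = U(x) + v.v / 2."""
    v_half = v - 0.5 * eps * grad_U(x)          # (1) momenta, half step
    x_new = x + eps * v_half                    # (2) position, full step
    v_new = v_half - 0.5 * eps * grad_U(x_new)  # (3) momenta, half step
    return x_new, v_new

# Toy example (assumption): U(x) = x^2 / 2, so grad_U(x) = x.
x, v = np.array([1.0]), np.array([0.0])
for _ in range(100):
    x, v = leapfrog(x, v, lambda y: y, eps=0.1)

# The energy stays close to its initial value of 0.5.
energy = float(0.5 * x @ x + 0.5 * v @ v)
```

Flipping the sign of `v` and integrating for the same number of steps returns the state to its starting point, which is the reversibility used in the accept/reject step.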

HMC: Leapfrog Integrator

Write the action of the leapfrog integrator in terms of an operator \(\mathbf{L}\), acting on the state \(\xi \equiv (x, v)\):

\mathbf{L}\xi \equiv \mathbf{L}(x, v)\equiv (x^{\prime}, v^{\prime}) = \xi^{\prime}

Introduce a "momentum-flip" operator, \(\mathbf{F}\):

\mathbf{F}\xi \equiv \mathbf{F}(x, v) \equiv (x, -v)

The acceptance probability is then given by:

A\left(\mathbf{FL}\xi|\xi\right) = \min\left(1, \frac{p(\mathbf{FL}\xi)}{p(\xi)}\left|\frac{\partial\left[\mathbf{FL}\xi\right]}{\partial\xi^{T}}\right|\right)

where the last factor is the Jacobian determinant, \(|\mathcal{J}|\). For HMC, \(|\mathcal{J}| = 1\), so

A\left(\mathbf{FL}\xi|\xi\right) = \min\left(1, \exp\left[-\mathcal{H}(\xi^{\prime}) + \mathcal{H}(\xi)\right]\right)
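Putting the integrator and the acceptance probability together gives one HMC update. This is a minimal sketch, assuming user-supplied `U` and `grad_U` (hypothetical names) and a standard normal target in the usage lines. The explicit momentum flip is omitted because \(\mathcal{H}\) is even in \(v\) and \(v\) is resampled at every step.

```python
import numpy as np

def hmc_step(x, U, grad_U, eps, n_leapfrog, rng):
    """One HMC update: resample v, integrate, then accept/reject."""
    v = rng.standard_normal(x.shape)            # v ~ p(v)
    h_old = U(x) + 0.5 * v @ v                  # H(xi)
    x_new, v_new = x.copy(), v.copy()
    for _ in range(n_leapfrog):
        v_new = v_new - 0.5 * eps * grad_U(x_new)
        x_new = x_new + eps * v_new
        v_new = v_new - 0.5 * eps * grad_U(x_new)
    h_new = U(x_new) + 0.5 * v_new @ v_new      # H(xi')
    # A = min(1, exp(-H(xi') + H(xi))), evaluated in log space
    accept = rng.random() < np.exp(min(0.0, h_old - h_new))
    return (x_new if accept else x), bool(accept)

# Usage (assumption): 2D standard normal target, U(x) = x.x / 2.
rng = np.random.default_rng(0)
x = np.zeros(2)
chain = []
for _ in range(3000):
    x, _ = hmc_step(x, lambda y: 0.5 * y @ y, lambda y: y,
                    eps=0.2, n_leapfrog=10, rng=rng)
    chain.append(x)
chain = np.array(chain)
```

The step size and trajectory length here are arbitrary illustrative values, not tuned settings.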

Hamiltonian Monte Carlo (HMC)

Integrating Hamilton's equations allows us to move far in state space while staying (roughly) on iso-probability contours of \(p(x, v)\)

- Resample the momentum: \(v \sim p(v)\)
- Integrate \(\mathcal{H}(x, v)\): \(t \longrightarrow t + \varepsilon\)
- Project onto target parameter space: \(p(x, v) \longrightarrow p(x)\)

HMC: Issues

- Cannot easily traverse low-density zones.
- Energy levels selected randomly \(\longrightarrow\) slow mixing (especially for Lattice QCD)!

What do we want in a good sampler?
- Fast mixing
- Fast burn-in
- Mix across energy levels
- Mix between modes

L2HMC: Learning to HMC

L2HMC generalizes HMC by introducing 6 new functions, \(S_{\ell}, T_{\ell}, Q_{\ell}\), for \(\ell = x, v\) into the leapfrog integrator.

Given an analytically described distribution, L2HMC provides a statistically exact sampler, with highly desirable properties:

- Fast burn-in.
- Fast mixing.

Ideal for lattice QCD due to critical slowing down!

Idea: MINIMIZE the autocorrelation time (time needed for samples to be independent).
- Can be done by MAXIMIZING the "distance" traveled by the integrator.

L2HMC: Augmented Leapfrog

v^{\prime} = v\odot\exp\left(\frac{\varepsilon}{2}S_{v}(\zeta_{1})\right) - \frac{\varepsilon}{2}\left[\partial_{x}U(x)\odot\exp(\varepsilon Q_{v}(\zeta_{1})) + T_{v}(\zeta_{1})\right]

x^{\prime} = x_{\bar{m}^{t}} + m^{t}\odot\left[x\odot \exp(\varepsilon S_{x}(\zeta_{2})) + \varepsilon\left(v^{\prime} \odot\exp(\varepsilon Q_{x}(\zeta_{2})) + T_{x}(\zeta_{2})\right)\right]

x^{\prime\prime} = x_{m^{t}}^{\prime} + \bar{m}^{t}\odot\left[x^{\prime}\odot \exp(\varepsilon S_{x}(\zeta_{3})) + \varepsilon\left(v^{\prime} \odot\exp(\varepsilon Q_{x}(\zeta_{3})) + T_{x}(\zeta_{3})\right)\right]

v^{\prime\prime} = v^{\prime}\odot\exp\left(\frac{\varepsilon}{2}S_{v}(\zeta_{4})\right) - \frac{\varepsilon}{2}\left[\partial_{x}U(x^{\prime\prime})\odot\exp(\varepsilon Q_{v}(\zeta_{4})) + T_{v}(\zeta_{4})\right]

\zeta_{1} = (x, \partial_{x}U(x), t)

\zeta_{2} = (x_{\bar{m}^{t}}, v, t)

\zeta_{3} = (x^{\prime}_{m^{t}}, v, t)

\zeta_{4} = (x^{\prime\prime}, \partial_{x}U(x^{\prime\prime}), t)

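In the momentum update, \(\exp\left(\frac{\varepsilon}{2}S_{v}(\zeta_{1})\right)\) is the momentum scaling, \(\exp(\varepsilon Q_{v}(\zeta_{1}))\) is the gradient scaling, \(T_{v}(\zeta_{1})\) is the translation, and \(\zeta_{1}\) collects the inputs.

Here \(m^{t}\) is a binary mask with complement \(\bar{m}^{t} = 1 - m^{t}\), and \(x_{m^{t}} \equiv m^{t}\odot x\).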

Idea: Generalize HMC by introducing six new functions:
- \(S_{x}(\theta_{x}),\,T_{x}(\theta_{x}),\,Q_{x}(\theta_{x})\); \(\quad\) \(S_{v}(\theta_{v}),\,T_{v}(\theta_{v}),\,Q_{v}(\theta_{v})\)
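The four sub-updates of the augmented leapfrog can be sketched as follows. All names are hypothetical: the six functions are passed in as plain callables taking the input tuple \(\zeta\), standing in for the trained networks. Two conventions are assumed here: the momentum entry of \(\zeta_{2}, \zeta_{3}\) is the freshly updated momentum, and the second position update keeps the \(m^{t}\) components of \(x^{\prime}\) fixed.

```python
import numpy as np

def augmented_leapfrog(x, v, grad_U, eps, mask, t, nets):
    """One forward augmented leapfrog step.

    nets: dict with callables "Sv", "Tv", "Qv", "Sx", "Tx", "Qx", each
    mapping an input tuple zeta to an array shaped like x (assumption).
    mask: binary array m^t; its complement is 1 - mask.
    """
    mask_bar = 1.0 - mask

    # (1) momentum half step: scale v, scale gradient, translate
    z1 = (x, grad_U(x), t)
    v1 = (v * np.exp(0.5 * eps * nets["Sv"](z1))
          - 0.5 * eps * (grad_U(x) * np.exp(eps * nets["Qv"](z1))
                         + nets["Tv"](z1)))

    # (2) update the components of x selected by mask
    z2 = (mask_bar * x, v1, t)
    x1 = mask_bar * x + mask * (
        x * np.exp(eps * nets["Sx"](z2))
        + eps * (v1 * np.exp(eps * nets["Qx"](z2)) + nets["Tx"](z2)))

    # (3) update the complementary components
    z3 = (mask * x1, v1, t)
    x2 = mask * x1 + mask_bar * (
        x1 * np.exp(eps * nets["Sx"](z3))
        + eps * (v1 * np.exp(eps * nets["Qx"](z3)) + nets["Tx"](z3)))

    # (4) second momentum half step at the new position
    z4 = (x2, grad_U(x2), t)
    v2 = (v1 * np.exp(0.5 * eps * nets["Sv"](z4))
          - 0.5 * eps * (grad_U(x2) * np.exp(eps * nets["Qv"](z4))
                         + nets["Tv"](z4)))
    return x2, v2
```

With all six functions returning zero, every exponential is 1 and every translation vanishes, so the step reduces to the standard leapfrog update, which is a useful sanity check.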

L2HMC: Modified Leapfrog

Write the action of the new leapfrog integrator as an operator \(\mathbf{L}_{\theta}\), parameterized by \(\theta\), acting on the state \(\xi \equiv (x, v, d)\), where \(d\) is the direction of the update.

Applying this operator \(M\) times (the trajectory length) successively to \(\xi\):

\mathbf{L}_{\theta}\xi = \mathbf{L}_{\theta}(x, v, d) = \left(x^{{\prime\prime}^{\times M}}, v^{{\prime\prime}^{\times M}}, d\right)

The "flip" operator \(\mathbf{F}\) reverses \(d\): \(\mathbf{F}\xi = (x, v, -d)\).

Write the complete dynamics step as:

\mathbf{FL}_{\theta} \xi = \xi^{\prime}

L2HMC: Accept/Reject

Accept the proposed configuration, \(\xi^{\prime}\), with probability:

\(A(\xi^{\prime}|\xi) = \min{\left(1, \frac{p(\mathbf{FL}_{\theta}\xi)}{p(\xi)}\left|\frac{\partial\left[\mathbf{FL}_{\theta}\xi\right]}{\partial\xi^{T}}\right|\right)}\)

Unlike HMC, the Jacobian determinant \(|\mathcal{J}| \equiv \left|\frac{\partial\left[\mathbf{FL}_{\theta}\xi\right]}{\partial\xi^{T}}\right| \neq 1\).

\(\mathcal{J}\) can be computed efficiently!
- Only depends on: \(S_{x}, S_{v}, \zeta_{i}\).

This has the effect of deforming the energy landscape.
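Since \(p(\xi) \propto \exp(-\mathcal{H}(\xi))\), the acceptance probability can be evaluated in log space as \(\min\left(1, \exp\left[-\mathcal{H}(\xi^{\prime}) + \mathcal{H}(\xi) + \log|\mathcal{J}|\right]\right)\). A minimal sketch, where `log_det_jac` is a hypothetical argument holding \(\log|\mathcal{J}|\) accumulated along the trajectory:

```python
import numpy as np

def accept_prob(h_old, h_new, log_det_jac=0.0):
    """A(xi'|xi) = min(1, exp(-H(xi') + H(xi) + log|J|)).

    log_det_jac = 0 (i.e. |J| = 1) recovers the HMC acceptance probability.
    """
    return float(np.exp(min(0.0, h_old - h_new + log_det_jac)))
```

Working with the logarithm avoids overflow when the energy difference is large.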

L2HMC: Loss function

Choose a loss designed to reduce mixing (autocorrelation) time \(\tau_{\mathcal{O}}^{\mathrm{int}}\):
- Idea: minimize the autocorrelation time by maximizing the "distance" traveled during integration.

\ell_{\lambda}\left(\xi, \xi^{\prime}, A(\xi^{\prime}|\xi)\right) = \frac{\lambda^{2}}{\delta(\xi, \xi^{\prime})A(\xi^{\prime}|\xi)} - \frac{\delta(\xi, \xi^{\prime})A(\xi^{\prime}|\xi)}{\lambda^{2}}

- First term: penalizes sampler if unable to move effectively
- Second term: encourages typical moves to be large
- \(\lambda\): scale parameter
- \(A(\xi^{\prime}|\xi)\): accept prob.
- "distance" between \(\xi, \xi^{\prime}\): \(\delta(\xi, \xi^{\prime}) = \|x - x^{\prime}\|^{2}_{2}\)

Note: \(\delta \times A = \) "expected" distance
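The loss can be evaluated for a batch of chains in a few lines. This is a sketch only: averaging over the batch and the small constant `tiny` (added to avoid division by zero) are assumptions made for illustration, as is the function name.

```python
import numpy as np

def l2hmc_loss(x, x_prime, accept_prob, lam=1.0, tiny=1e-12):
    """Batch average of lam^2 / (delta * A) - (delta * A) / lam^2.

    x, x_prime: arrays of shape (batch, dim); accept_prob: shape (batch,).
    tiny is a hypothetical numerical safeguard, not part of the formula.
    """
    x, x_prime = np.atleast_2d(x), np.atleast_2d(x_prime)
    delta = np.sum((x - x_prime) ** 2, axis=-1)          # squared distance
    expected = delta * np.asarray(accept_prob) + tiny    # "expected" distance
    return float(np.mean(lam ** 2 / expected - expected / lam ** 2))
```

A proposal that barely moves, or that is almost never accepted, makes the first term very large, while long accepted moves drive the loss negative.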

Network Architecture

[Flowchart: training and inference]

1. Build model, initialize network
2. Train step: run dynamics (accept/reject), calculate \(\ell_{\lambda}(\theta)\), backpropagate
3. Finished training? If not, repeat the train step
4. Save trained model
5. Run inference on saved model

[Figure: L2HMC vs. HMC]

GMM: Autocorrelation

L2HMC: \(U(1)\) Lattice Gauge Theory

Link variables:

U_{\mu}(i) = e^{i \phi_{\mu}(i)} \in U(1), \quad -\pi < \phi_{\mu}(i) \leq \pi

Wilson action:

\beta S = \beta \sum_{P}\left(1 - \cos(\phi_{P})\right)

where \(\phi_{P}\) is the sum of \(\phi\) around the plaquette:

\phi_{P} \equiv \phi_{\mu\nu}(i) = \phi_{\mu}(i) + \phi_{\nu}(i + \hat{\mu})- \phi_{\mu}(i+\hat{\nu}) - \phi_{\nu}(i)
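For a two-dimensional lattice, the plaquette sum and the Wilson action can be written with array shifts. This sketch assumes periodic boundary conditions and an array layout `phi[mu, i0, i1]` for the link angles; both are illustrative choices.

```python
import numpy as np

def plaquette_angles(phi):
    """phi_P at every site of a 2D periodic lattice.

    phi has shape (2, L, L): phi[mu] holds the link angles in direction mu.
    np.roll(a, -1, axis=k)[i] equals a at site i + k_hat (periodic wrap).
    """
    phi0, phi1 = phi[0], phi[1]
    return (phi0                             # phi_mu(i)
            + np.roll(phi1, -1, axis=0)      # phi_nu(i + mu_hat)
            - np.roll(phi0, -1, axis=1)      # phi_mu(i + nu_hat)
            - phi1)                          # phi_nu(i)

def wilson_action(phi, beta):
    """beta * S = beta * sum_P (1 - cos(phi_P))."""
    return float(beta * np.sum(1.0 - np.cos(plaquette_angles(phi))))
```

The action vanishes for the trivial configuration \(\phi = 0\) and is unchanged by a gauge transformation \(\phi_{\mu}(i) \rightarrow \phi_{\mu}(i) + \alpha(i) - \alpha(i + \hat{\mu})\).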

\(U(1)\) Lattice Gauge Theory

[Figure: L2HMC (good sampling) vs. HMC (poor sampling)]

Thanks for listening!

Code is available at: http://github.com/saforem2/l2hmc-qcd

Slides are available at: http://slides.com/samforeman/l2hmc-qcd
