Machine Learning in Lattice QCD

Sam Foreman

02/10/2020

Introduction

  • Lattice QCD:

    • A non-perturbative approach to solving QCD, the theory of the strong interaction between quarks and gluons.
  • Calculations in lattice QCD proceed in three steps:

    1. Gauge field generation: Use Markov Chain Monte Carlo (MCMC) methods to sample independent gauge field (gluon) configurations.
    2. Propagator calculations: Compute how quarks propagate in these fields ("quark propagators").
    3. Contractions: Combine the quark propagators into correlation functions and observables.

Motivation: Lattice QCD

  • Generating independent gauge configurations is a MAJOR bottleneck for lattice QCD.
  • As the lattice spacing, \(a \rightarrow 0\), the MCMC updates tend to get stuck in sectors of fixed gauge topology.
    • This causes the number of steps needed to adequately sample different topological sectors to increase exponentially.

Critical slowing down near the continuum limit (\(a \rightarrow 0\))!

Markov Chain Monte Carlo (MCMC)

  • Goal: Generate an ensemble of independent samples drawn from the desired target distribution \(p(x)\).
  • This is done using the Metropolis-Hastings accept/reject algorithm:
  • Given:
    • Initial distribution, \(\pi_{0}\)
    • Proposal distribution, \(q(x^{\prime}|x)\)
  • Update:
    1. Sample \(x^{\prime} \sim q(\cdot | x)\)
    2. Accept \(x^{\prime}\) with probability \(A(x^{\prime}|x)\)
A(x^{\prime}|x) = \min\left[1, \frac{p(x^{\prime})\,q(x|x^{\prime})}{p(x)\,q(x^{\prime}|x)}\right] = \min\left[1, \frac{p(x^{\prime})}{p(x)}\right] \quad\text{if } q(x^{\prime}|x) = q(x|x^{\prime})

Metropolis-Hastings: Accept/Reject

import numpy as np

def metropolis_hastings(p, steps=1000):
    x = 0.                                        # initialize config
    samples = np.zeros(steps)
    for i in range(steps):
        x_prime = x + np.random.randn()           # propose new config
        if np.random.rand() < p(x_prime) / p(x):  # accept with prob A(x'|x)
            x = x_prime                           # accept proposed config
        samples[i] = x                            # accumulate configs
    return samples
As \(N \longrightarrow \infty\), \(x \sim p(x)\).

Issues with MCMC

  • Need to wait for the chain to "burn in" (become thermalized)
  • Nearby configurations on the chain are correlated with each other.
    • Multiple steps needed to produce independent samples ("mixing time")
      • Measurable via integrated autocorrelation time, \(\tau^{\mathrm{int}}_{\mathcal{O}}\)

Smaller \(\tau^{\mathrm{int}}_{\mathcal{O}}\longrightarrow\) less computational cost!
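As a rough illustration, \(\tau^{\mathrm{int}}_{\mathcal{O}}\) can be estimated directly from a chain of measurements. The sketch below uses a crude truncation window (production codes use an automatic windowing procedure), and the function name `tau_int` is just a placeholder:

```python
import numpy as np

def tau_int(samples, max_lag=None):
    """Estimate tau_int = 1/2 + sum_t rho(t) from a 1D chain of measurements."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    if max_lag is None:
        max_lag = n // 4
    tau = 0.5
    for t in range(1, max_lag):
        rho = np.dot(x[:-t], x[t:]) / ((n - t) * var)  # autocorrelation at lag t
        if rho <= 0:    # crude truncation window: stop at first non-positive rho
            break
        tau += rho
    return tau

# Independent draws should give tau_int close to 1/2
rng = np.random.default_rng(0)
iid_tau = tau_int(rng.normal(size=50000))
```

For a well-mixing sampler this estimate approaches its minimum value of \(1/2\); correlated chains give larger values.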

[Figure: a Markov chain; early ("burn-in") samples are discarded, and the subsequent correlated samples are \(\sim p(x)\).]

Hamiltonian Monte Carlo (HMC)

  • Target distribution \(p(x)\) defined by an energy function \(U(x)\) such that \(p(x) \propto \exp{\left(-U(x)\right)}\)
  • Introduce a (fictitious) momentum variable \(v\) that is (normally) distributed independently from \(x\) as \(p(v)\propto \exp{\left(-\frac{1}{2}v^{T}v\right)}\)
  • HMC  samples from the canonical distribution:
p(x, v) = \frac{1}{\mathcal{Z}}\exp{\left(-\mathcal{H}(x, v)\right)} \propto \exp{\left(-U(x) - \frac{1}{2}v^{T}v\right)} \equiv p(x)\, p(v)
  • We can improve on the "guess and check" approach of MCMC by using Hamiltonian Monte Carlo (HMC).

We know how this "evolves" in time!

HMC: Leapfrog Integrator

  • Integrate Hamilton's equations numerically using the leapfrog integrator.
  • The leapfrog integrator proceeds in three steps:
\dot x_{i} = \frac{\partial\mathcal{H}}{\partial v_{i}} = v_i
\dot v_{i} =-\frac{\partial\mathcal{H}}{\partial x_{i}} = -\frac{\partial U}{\partial x_{i}}

  1. Update momenta (half step), \(t \longrightarrow t + \frac{\varepsilon}{2}\):
v(t + \tfrac{\varepsilon}{2}) = v(t) - \frac{\varepsilon}{2}\partial_{x} U(x(t))
  2. Update position (full step), \(t \longrightarrow t + \varepsilon\):
x(t + \varepsilon) = x(t) + \varepsilon\, v(t+\tfrac{\varepsilon}{2})
  3. Update momenta (half step), \(t + \frac{\varepsilon}{2} \longrightarrow t + \varepsilon\):
v(t+\varepsilon) = v(t+\tfrac{\varepsilon}{2}) - \frac{\varepsilon}{2}\partial_{x}U(x(t+\varepsilon))
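The three steps above can be sketched in NumPy (a minimal illustration, not any particular code base); for a harmonic oscillator the integrator nearly conserves the energy:

```python
import numpy as np

def leapfrog(x, v, grad_U, eps, n_steps):
    """Integrate Hamilton's equations for n_steps of size eps
    using the (half, full, half) leapfrog scheme."""
    for _ in range(n_steps):
        v = v - 0.5 * eps * grad_U(x)   # (1) momentum half step
        x = x + eps * v                 # (2) position full step
        v = v - 0.5 * eps * grad_U(x)   # (3) momentum half step
    return x, v

# Harmonic oscillator, U(x) = x^2 / 2, so H(x, v) = (x^2 + v^2) / 2.
# Leapfrog is symplectic: the energy error stays small and bounded.
grad_U = lambda x: x
x, v = leapfrog(1.0, 0.0, grad_U, eps=0.1, n_steps=100)
energy_error = abs((x**2 + v**2) / 2 - 0.5)
```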

HMC: Leapfrog Integrator

  • Write the action of the leapfrog integrator in terms of an operator \(\mathbf{L}\), acting on the state \(\xi \equiv (x, v)\):
\mathbf{L}\xi \equiv \mathbf{L}(x, v)\equiv (x^{\prime}, v^{\prime}) = \xi^{\prime}
  • Introduce a "momentum-flip" operator, \(\mathbf{F}\):
\mathbf{F}\xi \equiv \mathbf{F}(x, v) \equiv (x, -v)
  • The acceptance probability is then given by:
A\left(\mathbf{FL}\xi|\xi\right) = \min\left(1, \frac{p(\mathbf{FL}\xi)}{p(\xi)}\left|\frac{\partial\left[\mathbf{FL}\xi\right]}{\partial\xi^{T}}\right|\right) = \min\left(1, \exp\left[-\mathcal{H}(\xi^{\prime}) + \mathcal{H}(\xi)\right]\right)
  • The Jacobian determinant, \(|\mathcal{J}| = \left|\frac{\partial\left[\mathbf{FL}\xi\right]}{\partial\xi^{T}}\right|\), is equal to 1 for HMC.

Hamiltonian Monte Carlo (HMC)

  • Integrating Hamilton's equations allows us to move far in state space while staying (roughly) on iso-probability contours of \(p(x, v)\)

  • Each HMC update:
    1. Resample the momentum: \(v \sim p(v)\)
    2. Integrate \(\mathcal{H}(x, v)\): \(t \longrightarrow t + \varepsilon\)
    3. Project onto the target parameter space: \(p(x, v) \longrightarrow p(x)\)
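Putting the pieces together, a minimal one-dimensional HMC sampler might look like the following sketch (function and parameter names are illustrative, not from any library):

```python
import numpy as np

def hmc(U, grad_U, x0, eps=0.2, n_steps=10, n_samples=2000, seed=0):
    """1D HMC: resample v ~ N(0, 1), leapfrog-integrate (x, v),
    then Metropolis accept/reject on exp(-dH)."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        v = rng.normal()                          # 1. resample momentum
        x_new, v_new = x, v
        for _ in range(n_steps):                  # 2. integrate H(x, v)
            v_new -= 0.5 * eps * grad_U(x_new)
            x_new += eps * v_new
            v_new -= 0.5 * eps * grad_U(x_new)
        dH = (U(x_new) + 0.5 * v_new**2) - (U(x) + 0.5 * v**2)
        if rng.random() < np.exp(-dH):            # accept/reject on e^{-dH}
            x = x_new
        samples[i] = x                            # 3. project: keep x, discard v
    return samples

# Target: standard normal, U(x) = x^2 / 2
samples = hmc(U=lambda x: 0.5 * x**2, grad_U=lambda x: x, x0=0.0)
```

Because leapfrog nearly conserves \(\mathcal{H}\), \(\Delta\mathcal{H} \approx 0\) and almost every proposal is accepted, while the trajectory still moves far in state space.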

HMC: Issues

  • Cannot easily traverse low-density zones.
  • Energy levels are selected randomly \(\longrightarrow\) slow mixing! (especially for Lattice QCD)
  • What do we want in a good sampler?

    • Fast burn-in
    • Fast mixing
    • Mix across energy levels
    • Mix between modes

L2HMC: Learning to HMC

  • L2HMC generalizes HMC by introducing six new functions, \(S_{\ell}, T_{\ell}, Q_{\ell}\), for \(\ell = x, v\), into the leapfrog integrator.
  • Given an analytically described distribution, L2HMC provides a statistically exact sampler, with highly desirable properties:
  1. Fast burn-in.
  2. Fast mixing.

Ideal for lattice QCD due to critical slowing down!

  • Idea: MINIMIZE the autocorrelation time (time needed for samples to be independent).
    • Can be done by MAXIMIZING the "distance" traveled by the integrator.

L2HMC: Augmented Leapfrog

v^{\prime} = v\odot\exp\left(\frac{\varepsilon}{2}S_{v}(\zeta_{1})\right) - \frac{\varepsilon}{2}\left[\partial_{x}U(x)\odot\exp(\varepsilon Q_{v}(\zeta_{1})) + T_{v}(\zeta_{1})\right]
x^{\prime} = x_{\bar{m}^{t}} + m^{t}\odot\left[x\odot \exp(\varepsilon S_{x}(\zeta_{2})) + \varepsilon\left(v^{\prime} \odot\exp(\varepsilon Q_{x}(\zeta_{2})) + T_{x}(\zeta_{2})\right)\right]
x^{\prime\prime} = x^{\prime}_{m^{t}} + \bar{m}^{t}\odot\left[x^{\prime}\odot \exp(\varepsilon S_{x}(\zeta_{3})) + \varepsilon\left(v^{\prime} \odot\exp(\varepsilon Q_{x}(\zeta_{3})) + T_{x}(\zeta_{3})\right)\right]
v^{\prime\prime} = v^{\prime}\odot\exp\left(\frac{\varepsilon}{2}S_{v}(\zeta_{4})\right) - \frac{\varepsilon}{2}\left[\partial_{x}U(x^{\prime\prime})\odot\exp(\varepsilon Q_{v}(\zeta_{4})) + T_{v}(\zeta_{4})\right]
\zeta_{1} = (x, \partial_{x}U(x), t)
\zeta_{2} = (x_{\bar{m}^{t}}, v, t)
\zeta_{3} = (x^{\prime}_{m^{t}}, v, t)
\zeta_{4} = (x^{\prime\prime}, \partial_{x}U(x^{\prime\prime}), t)
  • In these updates, \(S\) provides a momentum (or position) scaling, \(Q\) a gradient scaling, and \(T\) a translation; the \(\zeta_{i}\) are the inputs to the networks.

  • Idea: Generalize HMC by introducing six new functions:
    • \(S_{x}(\theta_{x}),\,T_{x}(\theta_{x}),\,Q_{x}(\theta_{x})\); \(\quad\) \(S_{v}(\theta_{v}),\,T_{v}(\theta_{v}),\,Q_{v}(\theta_{v})\)
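To make the structure concrete, here is a minimal NumPy sketch of the first momentum and position updates. The networks are replaced by zero functions (a loud simplification: in L2HMC, \(S, Q, T\) are separate learned networks for \(x\) and \(v\)), so the updates collapse back to ordinary leapfrog steps; the helper names are hypothetical:

```python
import numpy as np

# Stand-ins for the learned networks: zero functions recover plain HMC
def S(zeta): return np.zeros_like(zeta[0])
def Q(zeta): return np.zeros_like(zeta[0])
def T(zeta): return np.zeros_like(zeta[0])

def v_update(x, v, grad_U, eps, t=0):
    """Augmented momentum half step: scale v by exp(eps/2 * S),
    scale the gradient by exp(eps * Q), translate by T."""
    zeta = (x, grad_U(x), t)                          # zeta_1
    return (v * np.exp(0.5 * eps * S(zeta))
            - 0.5 * eps * (grad_U(x) * np.exp(eps * Q(zeta)) + T(zeta)))

def x_update(x, v, eps, m, t=0):
    """Masked position update: only the m-subset of x is changed."""
    mbar = 1 - m
    zeta = (mbar * x, v, t)                           # zeta_2
    return mbar * x + m * (x * np.exp(eps * S(zeta))
                           + eps * (v * np.exp(eps * Q(zeta)) + T(zeta)))

# With S = T = Q = 0, the updates reduce to ordinary leapfrog steps
x = np.array([1.0, -0.5]); v = np.array([0.3, 0.2]); eps = 0.1
grad_U = lambda x: x
v_half = v_update(x, v, grad_U, eps)
x_new = x_update(x, v_half, eps, m=np.ones_like(x))
```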

L2HMC: Modified Leapfrog

  • Write the action of the new leapfrog integrator as an operator \(\mathbf{L}_{\theta}\), parameterized by \(\theta\).
  • Applying this operator \(M\) times (the trajectory length) successively to \(\xi\):
\mathbf{L}_{\theta}\xi = \mathbf{L}_{\theta}(x, v, d) = \left(x^{{\prime\prime}^{\times M}}, v^{{\prime\prime}^{\times M}}, d\right)
  • The "flip" operator \(\mathbf{F}\) reverses the direction \(d\): \(\mathbf{F}\xi = (x, v, -d)\).
  • Write the complete dynamics step as:
\mathbf{FL}_{\theta} \xi = \xi^{\prime}

L2HMC: Accept/Reject

  • Accept the proposed configuration, \(\xi^{\prime}\), with probability:

\(A(\xi^{\prime}|\xi) = \min{\left(1, \frac{p(\mathbf{FL}_{\theta}\xi)}{p(\xi)}\left|\frac{\partial\left[\mathbf{FL}_{\theta}\xi\right]}{\partial\xi^{T}}\right|\right)}\)

  • Unlike HMC, the Jacobian determinant \(|\mathcal{J}| = \left|\frac{\partial\left[\mathbf{FL}_{\theta}\xi\right]}{\partial\xi^{T}}\right| \neq 1\).
  • \(\mathcal{J}\) can be computed efficiently!
    • It depends only on \(S_{x}\), \(S_{v}\), and the \(\zeta_{i}\).
  • This has the effect of deforming the energy landscape.

L2HMC: Loss function

  • Choose a loss designed to reduce mixing (autocorrelation) time \(\tau_{\mathcal{O}}^{\mathrm{int}}\):
    • Idea: minimize the autocorrelation time by maximizing the "distance" traveled during integration.

 

\ell_{\lambda}\left(\xi, \xi^{\prime}, A(\xi^{\prime}|\xi)\right) = \frac{\lambda^{2}}{\delta(\xi, \xi^{\prime})\,A(\xi^{\prime}|\xi)} - \frac{\delta(\xi, \xi^{\prime})\,A(\xi^{\prime}|\xi)}{\lambda^{2}}

  • Here \(\lambda\) is a scale parameter, \(A(\xi^{\prime}|\xi)\) is the acceptance probability, and \(\delta(\xi, \xi^{\prime}) = \|x - x^{\prime}\|^{2}_{2}\) is the "distance" between \(\xi\) and \(\xi^{\prime}\).
  • The first term penalizes the sampler if it is unable to move effectively; the second term encourages typical moves to be large.
  • Note: \(\delta \times A\) is the "expected" distance traveled per proposal.
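The loss translates directly into NumPy (per chain; batch-averaging omitted for brevity):

```python
import numpy as np

def l2hmc_loss(x, x_prime, accept_prob, lam=1.0):
    """ell_lambda = lam^2 / (delta * A) - (delta * A) / lam^2,
    with delta = ||x - x'||_2^2 the squared distance moved."""
    delta = np.sum((np.asarray(x) - np.asarray(x_prime))**2, axis=-1)
    expected = delta * accept_prob        # "expected" distance, delta * A
    return lam**2 / expected - expected / lam**2

# delta = 2, A = 1  ->  expected = 2, loss = 1/2 - 2 = -1.5
loss = l2hmc_loss([0.0, 0.0], [1.0, 1.0], accept_prob=1.0)
```

Large, frequently accepted moves make `expected` large and drive the loss down, which is exactly the "maximize distance traveled" objective.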

Network Architecture

  • Training loop:
    1. Build model, initialize network.
    2. Run dynamics, accept/reject.
    3. Calculate the loss, \(\ell_{\lambda}(\theta)\).
    4. Backpropagate (train step).
    5. Finished training? If not, repeat from step 2; if so, save the trained model.
    6. Run inference on the saved model.
GMM: Autocorrelation

[Figure: autocorrelation vs. MCMC step for L2HMC and HMC on a Gaussian Mixture Model (GMM).]

L2HMC: \(U(1)\) Lattice Gauge Theory

  • Link variables:
U_{\mu}(i) = e^{i \phi_{\mu}(i)} \in U(1), \quad -\pi < \phi_{\mu}(i) \leq \pi
  • Wilson action:
\beta S = \beta \sum_{P}\left(1 - \cos(\phi_{P})\right)
  • where \(\phi_{P}\) is the sum of \(\phi\) around a plaquette:
\phi_{P} \equiv \phi_{\mu\nu}(i) = \phi_{\mu}(i) + \phi_{\nu}(i + \hat{\mu}) - \phi_{\mu}(i+\hat{\nu}) - \phi_{\nu}(i)
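The Wilson action is straightforward to evaluate with periodic shifts; a minimal sketch for a 2D lattice (the `(2, L, L)` array layout is an assumption for illustration):

```python
import numpy as np

def wilson_action(phi, beta):
    """U(1) Wilson action on a 2D periodic lattice.

    phi has shape (2, L, L): phi[mu, i, j] is the link angle phi_mu at
    site (i, j), with mu = 0, 1 the two lattice directions."""
    # phi_P = phi_mu(i) + phi_nu(i + mu_hat) - phi_mu(i + nu_hat) - phi_nu(i)
    phi_P = (phi[0]
             + np.roll(phi[1], shift=-1, axis=0)   # phi_nu(i + mu_hat)
             - np.roll(phi[0], shift=-1, axis=1)   # phi_mu(i + nu_hat)
             - phi[1])
    return beta * np.sum(1.0 - np.cos(phi_P))

# The trivial ("cold") configuration phi = 0 has zero action; any
# configuration has non-negative action since 1 - cos(phi_P) >= 0.
cold = wilson_action(np.zeros((2, 4, 4)), beta=2.0)
hot = wilson_action(np.random.default_rng(0).uniform(-np.pi, np.pi, (2, 4, 4)),
                    beta=2.0)
```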

\(U(1)\) Lattice Gauge Theory

[Figure: sampled \(U(1)\) configurations; L2HMC achieves good sampling where HMC shows poor sampling.]

Thanks for listening!