Sam Foreman
May 2021
Critical slowing down!
\(x^{\prime} = x + \delta\), where \(\delta \sim \mathcal{N}(0, \mathbb{1})\)
1. Construct chain:
\(\big\{ x_{0}, x_{1}, x_{2}, \ldots, x_{m-1}, x_{m}, x_{m+1}, \ldots, x_{n-2}, x_{n-1}, x_{n} \big\}\)
2. Thermalize ("burn-in"):
\(\big\{ x_{0}, x_{1}, x_{2}, \ldots, x_{m-1}, x_{m}, x_{m+1}, \ldots, x_{n-2}, x_{n-1}, x_{n} \big\}\)
3. Drop correlated samples ("thinning"):
dropped configurations
Inefficient!
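The random-walk procedure above (propose \(x^{\prime} = x + \delta\), accept/reject, discard burn-in, thin the correlated remainder) can be sketched as follows. The function name, target, and burn-in/thinning values are illustrative choices, not from the talk:

```python
import numpy as np

def metropolis_chain(log_prob, x0, n_steps, step_size=0.5, rng=None):
    """Random-walk Metropolis: propose x' = x + delta, delta ~ N(0, step_size^2 * 1)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    chain = [x.copy()]
    for _ in range(n_steps):
        x_prop = x + step_size * rng.normal(size=x.shape)
        # Symmetric proposal, so accept with probability min(1, p(x')/p(x))
        if np.log(rng.uniform()) < log_prob(x_prop) - log_prob(x):
            x = x_prop
        chain.append(x.copy())   # rejected proposals repeat the current state
    return np.stack(chain)

# Standard-normal target; then drop burn-in and thin correlated samples
samples = metropolis_chain(lambda x: -0.5 * np.sum(x**2), np.zeros(2), 5000)
kept = samples[1000::10]   # burn-in = 1000 steps, thinning stride = 10
```

Even after thinning, the diffusive \(\mathcal{O}(\sqrt{n})\) exploration is what makes this inefficient.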
Introduce fictitious momentum:
\(v\sim\mathcal{N}(0, \mathbb{1})\)
Target distribution:
\(p(x)\propto e^{-S(x)}\)
Joint target distribution: \(p(x, v) = p(x)\,p(v) \propto e^{-S(x)}\,e^{-\frac{1}{2}v^{T}v} = e^{-H(x, v)}\)
The joint \((x, v)\) system obeys Hamilton's Equations:
lift to phase space
(trajectory)
1. Half-step \(v\)-update: \(v^{1/2} = v - \frac{\varepsilon}{2}\partial_{x}S(x)\)
2. Full-step \(x\)-update: \(x^{\prime} = x + \varepsilon v^{1/2}\)
3. Half-step \(v\)-update: \(v^{\prime} = v^{1/2} - \frac{\varepsilon}{2}\partial_{x}S(x^{\prime})\)
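A minimal sketch of the leapfrog integrator built from these sub-steps (the function name and the harmonic-oscillator check are my own illustrative choices):

```python
import numpy as np

def leapfrog(x, v, grad_S, eps, n_steps):
    """Leapfrog integration of Hamilton's equations for H(x, v) = S(x) + v^T v / 2."""
    v = v - 0.5 * eps * grad_S(x)                 # initial half-step v-update
    for i in range(n_steps):
        x = x + eps * v                           # full-step x-update
        # Mid-trajectory, consecutive half-steps merge into one full v-step;
        # the final iteration ends with a half-step to stay symmetric.
        v = v - (eps if i < n_steps - 1 else 0.5 * eps) * grad_S(x)
    return x, v
```

The integrator is symplectic and exactly time-reversible (run it again with \(v \to -v\) to return to the start), which is what makes the Metropolis-Hastings correction tractable.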
Stuck!
where \((s_{v}^{k}, q^{k}_{v}, t^{k}_{v})\) and \((s_{x}^{k}, q^{k}_{x}, t^{k}_{x})\) are parameterized by neural networks
(\(m_{t}\odot x\)-independent)
masks:
Momentum (\(v_{k}\)) scaling
Gradient \(\partial_{x}S(x_{k})\) scaling
Translation
(\(v\)-independent)
by passing it through the \(k^{\mathrm{th}}\) leapfrog layer.
masks:
Stack of fully-connected layers
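As a rough illustration of a fully-connected stack producing \((s, q, t)\): the sketch below is a hypothetical NumPy stand-in (the actual l2hmc-qcd networks are TensorFlow models with more structure; all names and sizes here are invented):

```python
import numpy as np

def make_mlp(sizes, rng):
    """Hypothetical stack of fully-connected layers: list of (weights, biases)."""
    return [(rng.normal(scale=0.1, size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def sqt_net(params, z):
    """Map input features z to scale s, transformation q, and translation t."""
    h = z
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)                # hidden layers
    W, b = params[-1]
    out = h @ W + b                           # final layer emits 3n numbers
    s, q, t = np.split(out, 3)
    return np.tanh(s), np.tanh(q), t          # bounded scalings, free translation

rng = np.random.default_rng(1)
n = 4                                          # x, v in R^n
params = make_mlp([2 * n, 16, 16, 3 * n], rng)
s, q, t = sqt_net(params, np.concatenate([np.ones(n), np.zeros(n)]))
```

Bounding \(s\) and \(q\) (here with \(\tanh\)) keeps the exponential scalings \(e^{\varepsilon s}\), \(e^{\varepsilon q}\) numerically tame.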
\(A(\xi'|\xi)\) = acceptance probability
\(A(\xi'|\xi)\cdot\delta(\xi',\xi)\) = expected distance
Note:
\(\xi'\) = proposed state
\(\xi\) = initial state
HMC
L2HMC
construct trajectory
Compute loss + backprop
Metropolis-Hastings accept/reject
re-sample momentum + direction
(varied slowly)
e.g. \( \{0.1, 0.2, \ldots, 0.9, 1.0\}\)
(increasing)
continuous, differentiable
discrete, hard to work with
Leapfrog step
variation in the avg. plaquette
continuous topological charge
shifted energy
\(A(\xi^{\prime}|\xi) = \) "acceptance probability"
where:
leapfrog step
(MD trajectory)
Topological charge history
~ cost / step
continuum limit
Estimate of the integrated autocorrelation time of \(\mathcal{Q}_{\mathbb{R}}\)
Jacobian:
\(\left|\frac{\partial v''_{k}}{\partial v_{k}}\right|=\exp\left(\frac{1}{2}\varepsilon^{k}_{v} s_{v}^{k}(\zeta_{v_{k}})\right)\), \(\left|\frac{\partial x''_{k}}{\partial x_{k}}\right|=\exp\left(\varepsilon^{k}_{x} s_{x}^{k}(\zeta_{x_{k}})\right)\)
Update complementary indices determined by the masks \(m_{t}\) and \(\bar{m}_{t} = \mathbb{1} - m_{t}\)
Main idea
Jacobian:
\(\left|\frac{\partial v''_{k}}{\partial v_{k}}\right|=\exp\left(\frac{1}{2}\varepsilon^{k}_{v} s_{v}^{k}(\zeta_{v_{k}})\right)\), \(\left|\frac{\partial x''_{k}}{\partial x_{k}}\right|=\exp\left(\varepsilon^{k}_{x} s_{x}^{k}(\zeta_{x_{k}})\right)\)
\(A(\xi'|\xi)\) = acceptance probability
\(A(\xi'|\xi)\cdot\delta(\xi',\xi)\) = expected distance
Note:
Encourages large moves
Penalizes small moves
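A sketch of a loss with this behavior, following the \(\lambda^{2}/(\delta A) - \delta A/\lambda^{2}\) form of the original L2HMC paper (the function name, batching conventions, and the small stabilizing constant are my own):

```python
import numpy as np

def l2hmc_loss(x, x_prop, accept_prob, scale=1.0):
    """Sketch of the L2HMC loss: scale^2/(delta*A) - (delta*A)/scale^2, where
    delta = ||x - x'||_2^2 is the squared jump distance and A the acceptance
    probability. Large expected moves (delta * A) lower the loss; small expected
    moves blow up the 1/(delta*A) term, penalizing a sampler that cannot move."""
    delta = np.sum((x - x_prop) ** 2, axis=-1)
    exp_dist = delta * accept_prob + 1e-8     # "expected" distance, kept positive
    return np.mean(scale**2 / exp_dist - exp_dist / scale**2)
```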
HMC
L2HMC
where:
Generalized \(x\)-update:
Momentum (\(v_{k}\)) scaling
(\(v\)-independent)
Gradient \(\partial_{x}S(x_{k})\) scaling
Translation
(\(m_{t}\odot x\)-independent)
Neural Networks
Generalized \(v\)-update:
(masks)
(\(\alpha_{s}, \alpha_{q}\) are trainable parameters)
\(x\), \(v\) \(\in \mathbb{R}^{n}\)
\(s_{x}\),\(q_{x}\),\(t_{x}\) \(\in \mathbb{R}^{n}\)
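For reference, the generalized forward (\(d = +1\)) updates take the following form, as introduced in the original L2HMC paper (leapfrog-step indices suppressed here):

\[
v^{\prime} = v \odot \exp\left(\tfrac{\varepsilon}{2}\, s_{v}(\zeta_{v})\right) - \tfrac{\varepsilon}{2}\left[\partial_{x}S(x)\odot \exp\left(\varepsilon\, q_{v}(\zeta_{v})\right) + t_{v}(\zeta_{v})\right]
\]

\[
x^{\prime} = \bar{m}_{t}\odot x + m_{t}\odot\left[x\odot \exp\left(\varepsilon\, s_{x}(\zeta_{x})\right) + \varepsilon\left(v^{\prime}\odot \exp\left(\varepsilon\, q_{x}(\zeta_{x})\right) + t_{x}(\zeta_{x})\right)\right]
\]

Setting \(s, q, t \to 0\) recovers the standard leapfrog updates of HMC.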
Speedup vs. batch size (relative to batch size \(512\)):
\(512 \sim 1\times\), \(1024 \sim 1.04\times\), \(2048 \sim 1.29\times\), \(4096 \sim 1.73\times\), \(8192 \sim 2.19\times\)
Momentum scaling
Gradient scaling
Translation
momentum
position
(forward direction, \(d = +1\))
Momentum scaling
Gradient scaling
Translation
inputs
(trajectory length)
\(|\mathcal{J}| \neq 1\)
Unlike HMC,
Encourages typical moves to be large
Penalizes sampler if unable to move effectively
scale parameter
"distance" between \(\xi, \xi^{\prime}\): \(\delta(\xi, \xi^{\prime}) = \|x - x^{\prime}\|^{2}_{2}\)
\(\delta \times A = \) "expected" distance
where:
where:
Interested?
github.com/saforem2/l2hmc-qcd
02/10/2020
Build model,
initialize network
Run dynamics, Accept/Reject
Calculate loss
Backpropagate
Finished
training?
Save trained
model
Run inference
on saved model
Train step
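The train-step loop above (run dynamics → accept/reject → calculate loss → backpropagate) can be miniaturized into a runnable toy. Everything here is illustrative: the entire "model" is a single trainable step size \(\varepsilon\), the target is a standard normal, and backprop is replaced by a finite-difference gradient:

```python
import numpy as np

def loss_fn(eps, x0, v0, n_lf=10):
    """Toy training objective: negative expected jump distance,
    -delta(x, x') * A(x'|x), for one leapfrog trajectory on p(x) ∝ exp(-x^2/2)."""
    x, v = x0.copy(), v0.copy()
    v = v - 0.5 * eps * x                        # grad S(x) = x for this target
    for i in range(n_lf):
        x = x + eps * v
        v = v - (eps if i < n_lf - 1 else 0.5 * eps) * x
    dH = 0.5 * (x @ x + v @ v - x0 @ x0 - v0 @ v0)
    A = min(1.0, float(np.exp(-dH)))             # Metropolis-Hastings accept prob
    return -float(np.sum((x - x0) ** 2)) * A

def train(eps=0.05, lr=0.005, n_iters=100):
    """Run dynamics -> calculate loss -> 'backpropagate' -> update, repeated.
    The finite-difference gradient stands in for automatic differentiation."""
    x0, v0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    for _ in range(n_iters):
        g = (loss_fn(eps + 1e-4, x0, v0) - loss_fn(eps - 1e-4, x0, v0)) / 2e-4
        eps -= lr * g                            # gradient-descent parameter update
    return eps
```

Training drives \(\varepsilon\) toward the value whose trajectory length maximizes the expected jump distance, the same pressure the real loss applies to the \((s, q, t)\) networks.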
Integrate \(H(x, v)\):
\(t \longrightarrow t + \varepsilon\)
Project onto target parameter space \(p(x, v) \longrightarrow p(x)\)
\(v \sim p(v)\)
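Putting the three steps together, one HMC update might look like this (illustrative sketch; the function name and defaults are my own):

```python
import numpy as np

def hmc_step(x, log_prob, grad_log_prob, eps=0.1, n_lf=10, rng=None):
    """One HMC step: resample v ~ p(v), integrate H(x, v), accept/reject,
    then project back onto the target space by discarding v."""
    rng = rng or np.random.default_rng()
    v = rng.normal(size=x.shape)                         # 1. v ~ p(v) = N(0, 1)
    x_new, v_new = x.copy(), v.copy()
    # 2. leapfrog-integrate Hamilton's equations: t -> t + eps, n_lf times
    v_new += 0.5 * eps * grad_log_prob(x_new)
    for i in range(n_lf):
        x_new += eps * v_new
        v_new += (eps if i < n_lf - 1 else 0.5 * eps) * grad_log_prob(x_new)
    # 3. Metropolis-Hastings: A = min(1, exp(H(x, v) - H(x', v')))
    H_old = -log_prob(x) + 0.5 * v @ v
    H_new = -log_prob(x_new) + 0.5 * v_new @ v_new
    if np.log(rng.uniform()) < H_old - H_new:
        return x_new, True       # keep x', drop v': p(x, v) -> p(x)
    return x, False
```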
Markov Chain Monte Carlo (MCMC)
if \(q(x^{\prime}|x) = q(x|x^{\prime})\)
correlated!
burn-in
Well-suited to lattice QCD, where critical slowing down is the main bottleneck!
Determinant of the Jacobian, \(|\mathcal{J}|\)
(for HMC)
Momentum scaling
Gradient scaling
Translation
momentum
position