Sam Foreman
2021-03-11
Metropolis-Hastings: Accept/Reject
import numpy as np

def metropolis_hastings(p, steps=1000):
    x = 0.                                        # initialize config
    samples = np.zeros(steps)
    for i in range(steps):
        x_prime = x + np.random.randn()           # proposed config
        if np.random.rand() < p(x_prime) / p(x):  # compute A(x'|x)
            x = x_prime                           # accept proposed config
        samples[i] = x                            # accumulate configs
    return samples
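For example, sampling from a standard normal target (a minimal usage sketch; the Gaussian target here is an illustration, not from the talk):

samples = metropolis_hastings(lambda x: np.exp(-0.5 * x**2), steps=10000)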
As proposals are rejected, configurations are dropped \(\longrightarrow\) inefficient!
Target distribution:
\(p(x)\propto e^{-S(x)}\)
Introduce fictitious momentum: \(v\sim\mathcal{N}(0, 1)\)
Joint target distribution, \(p(x, v)\):
\(p(x, v) = p(x)\cdot p(v) = e^{-S(x)}\cdot e^{-\frac{1}{2}v^{T}v} = e^{-\mathcal{H}(x, v)}\)
where \(S(x)\) is the action (potential energy).
The joint \((x, v)\) system obeys Hamilton's Equations:
\(\dot{x} = \frac{\partial\mathcal{H}}{\partial v}, \quad \dot{v} = -\frac{\partial\mathcal{H}}{\partial x}\)
Integrate Hamilton's Equations with the leapfrog integrator (step size \(\varepsilon\)):
1. Half-step momentum update: \(v^{1/2} = v - \frac{\varepsilon}{2}\partial_{x}S(x)\)
2. Full-step position update: \(x^{\prime} = x + \varepsilon\, v^{1/2}\)
3. Half-step momentum update: \(v^{\prime} = v^{1/2} - \frac{\varepsilon}{2}\partial_{x}S(x^{\prime})\)
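A minimal numpy sketch of the leapfrog step and the HMC accept/reject (the quadratic action at the bottom is an assumption for illustration only):

import numpy as np

def leapfrog(x, v, grad_S, eps):
    v = v - 0.5 * eps * grad_S(x)    # 1. half-step momentum update
    x = x + eps * v                  # 2. full-step position update
    v = v - 0.5 * eps * grad_S(x)    # 3. half-step momentum update
    return x, v

def hmc_step(x, S, grad_S, eps=0.1, n_lf=10):
    v = np.random.randn(*np.shape(x))            # resample momentum: v ~ N(0, 1)
    x_new, v_new = x, v
    for _ in range(n_lf):                        # MD trajectory of n_lf leapfrog steps
        x_new, v_new = leapfrog(x_new, v_new, grad_S, eps)
    dH = (S(x_new) + 0.5 * v_new @ v_new) - (S(x) + 0.5 * v @ v)
    if np.random.rand() < np.exp(-dH):           # A(x'|x) = min(1, e^{-dH})
        return x_new                             # accept proposed config
    return x                                     # reject, keep current config

# e.g. with a toy quadratic action S(x) = x^T x / 2:
x = hmc_step(np.zeros(10), S=lambda x: 0.5 * x @ x, grad_S=lambda x: x)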
What do we want in a good sampler? HMC falls short:
Energy levels selected randomly \(\longrightarrow\) slow mixing!
Cannot easily traverse low-density zones (especially for Lattice QCD).
L2HMC generalizes the leapfrog update with three learned functions:
- \(s\): momentum (\(v_{k}\)) scaling
- \(q\): gradient \(\partial_{x}S(x_{k})\) scaling
- \(t\): translation
where \(x, v \in \mathbb{R}^{n}\) and \(s_{x}, q_{x}, t_{x} \in \mathbb{R}^{n}\) (\(\alpha_{s}, \alpha_{q}\) are trainable parameters).
Note: the position update is split via the binary mask \(m^{t}\); a sketch of the generalized update follows below.
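As a sketch (following the L2HMC construction used in l2hmc-qcd; exact conventions may differ from this talk), the generalized updates in the forward direction take the form:

\(v^{\prime} = v \odot e^{\frac{\varepsilon}{2} s_{v}(\zeta)} - \frac{\varepsilon}{2}\left[\partial_{x}S(x) \odot e^{\varepsilon q_{v}(\zeta)} + t_{v}(\zeta)\right]\)

\(x^{\prime} = x_{\bar{m}^{t}} + m^{t} \odot \left[x \odot e^{\varepsilon s_{x}(\zeta)} + \varepsilon\left(v^{\prime} \odot e^{\varepsilon q_{x}(\zeta)} + t_{x}(\zeta)\right)\right]\)

where \(\zeta\) denotes the network inputs and \(\odot\) is element-wise multiplication.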
Annealing schedule (increasing, varied slowly): \(\{0.1, 0.2, \ldots, 0.9, 1.0\}\)
Topological charge: \(\mathcal{Q}_{\mathbb{R}}\) (real-valued) and \(\mathcal{Q}_{\mathbb{Z}}\) (integer-valued).
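For the 2D \(U(1)\) model targeted by l2hmc-qcd, these are conventionally defined in terms of the plaquette \(x_{P}\) (a hedged sketch of the standard conventions, not quoted from this talk):

\(\mathcal{Q}_{\mathbb{R}} = \frac{1}{2\pi}\sum_{P}\sin x_{P}, \qquad \mathcal{Q}_{\mathbb{Z}} = \frac{1}{2\pi}\sum_{P}\lfloor x_{P}\rceil\)

where \(\lfloor x_{P}\rceil\) denotes \(x_{P}\) wrapped into \((-\pi, \pi]\).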
\(A(\xi^{\prime}|\xi) = \min\left\{1, \frac{p(\xi^{\prime})}{p(\xi)}\,|\mathcal{J}|\right\}\) is the "acceptance probability", where \(\xi \rightarrow \xi^{\prime}\) is generated by a sequence of leapfrog steps (an MD trajectory).
[Figure: Topological charge history]
[Figure: Estimate of the integrated autocorrelation time of \(\mathcal{Q}_{\mathbb{R}}\); annotations: ~ cost / step, continuum limit]
[Figure: relative speedup by legend entry (512, 1024, 2048, 4096, 8192): \(512 \sim 1\times\), \(1024 \sim 1.04\times\), \(2048 \sim 1.29\times\), \(4096 \sim 1.73\times\), \(8192 \sim 2.19\times\)]
[Figure: network architecture (forward direction, \(d = +1\)) — inputs: momentum, position, trajectory length; outputs: momentum scaling, gradient scaling, translation]
Unlike HMC, \(|\mathcal{J}| \neq 1\).
Loss function (following L2HMC), with scale parameter \(\lambda\):
\(\ell_{\lambda}(\xi, \xi^{\prime}) = \frac{\lambda^{2}}{\delta(\xi, \xi^{\prime})\,A(\xi^{\prime}|\xi)} - \frac{\delta(\xi, \xi^{\prime})\,A(\xi^{\prime}|\xi)}{\lambda^{2}}\)
where:
- "distance" between \(\xi, \xi^{\prime}\): \(\delta(\xi, \xi^{\prime}) = \|x - x^{\prime}\|^{2}_{2}\)
- \(\delta \times A = \) "expected" distance
This encourages typical moves to be large, and penalizes the sampler if it is unable to move effectively (see the sketch below).
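A minimal numpy sketch of this loss over a batch of configurations (names here are illustrative, not the l2hmc-qcd API):

import numpy as np

def expected_distance_loss(x, x_prime, accept_prob, scale=1.0):
    """delta * A = 'expected' distance; reward large, accepted moves."""
    delta = np.sum((x - x_prime) ** 2, axis=-1)        # ||x - x'||_2^2 per config
    exp_dist = np.maximum(delta * accept_prob, 1e-8)   # delta * A, guarded
    return np.mean(scale**2 / exp_dist - exp_dist / scale**2)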
Interested?
github.com/saforem2/l2hmc-qcd
02/10/2020
Training loop:
1. Build model, initialize network
2. Train step:
   - Run dynamics, Accept/Reject
   - Calculate loss
   - Backpropagate
3. Finished training? If not, repeat the train step.
4. Save trained model
5. Run inference on saved model
(A toy sketch of this loop follows below.)
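A self-contained toy version of this loop in pure numpy (a single tunable step size stands in for the network, and a finite-difference update stands in for backpropagation; all names are illustrative, not the l2hmc-qcd API):

import numpy as np

def toy_train(num_steps=500, eps=0.1, lr=1e-5, scale=1.0, n=1024, h=1e-4):
    """Toy train loop: one step size `eps` stands in for the network weights."""
    for step in range(num_steps):
        def loss(e, seed=step):                    # common random numbers per step
            rng = np.random.default_rng(seed)
            x = rng.normal(size=n)                 # current configs
            x_prime = x + e * rng.normal(size=n)   # run dynamics (proposal)
            A = np.minimum(1.0, np.exp(0.5 * x**2 - 0.5 * x_prime**2))  # accept prob
            exp_dist = np.mean((x - x_prime)**2 * A)   # delta * A
            return scale**2 / exp_dist - exp_dist / scale**2  # calculate loss
        grad = (loss(eps + h) - loss(eps - h)) / (2 * h)  # stands in for backprop
        eps -= lr * grad                           # update the trainable parameter
    return eps                                     # "save" the trained parameter

tuned_eps = toy_train()                            # then: run inference with tuned_eps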
HMC update:
1. Resample the momentum: \(v \sim p(v)\)
2. Integrate \(H(x, v)\): \(t \longrightarrow t + \varepsilon\)
3. Project onto target parameter space: \(p(x, v) \longrightarrow p(x)\)
Markov Chain Monte Carlo (MCMC)
- The acceptance probability simplifies if the proposal is symmetric: \(q(x^{\prime}|x) = q(x|x^{\prime})\)
- Successive samples are correlated!
- Early samples are discarded as burn-in.
- Lattice QCD is an ideal target for improved samplers due to critical slowing down!
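A quick illustration of burn-in and correlation using the metropolis_hastings sampler defined at the top of these slides (standard-normal target and burn-in length are arbitrary choices for the example):

import numpy as np

chain = metropolis_hastings(lambda x: np.exp(-0.5 * x**2), steps=5000)
samples = chain[500:]                              # discard burn-in
# lag-1 autocorrelation: successive samples are visibly correlated
rho1 = np.corrcoef(samples[:-1], samples[1:])[0, 1]
print(f"lag-1 autocorrelation: {rho1:.2f}")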
Determinant of the Jacobian, \(|\mathcal{J}|\) (\(= 1\) for HMC)