University of Oslo — 7 October 2022
Jeriek Van den Abeele
The Standard Model is incomplete.
[CERN/C. David]
Global fits address the need for a consistent comparison of BSM theories to all relevant experimental data
Challenge:
Scanning increasingly high-dimensional parameter spaces with varying phenomenology
Exploration of a combined likelihood function:
\(\mathcal{L} = \mathcal{L}_\mathsf{collider} \times \mathcal{L}_\mathsf{Higgs} \times \mathcal{L}_\mathsf{DM} \times \mathcal{L}_\mathsf{EWPO} \times \mathcal{L}_\mathsf{flavour} \times \ldots\)
Supersymmetry is turning out to be hard to find ...
High-dimensional search spaces are non-intuitive
Most of the volume lies in the extremities!
Adaptive sampling techniques are essential for search space exploration: differential evolution, nested sampling, genetic algorithms, ...
Imagine scanning the central 90% of each parameter range really well, at great cost.
In a 7-dimensional parameter space, you have still only covered \(0.9^7 \approx 47.8\%\) of the volume.
In a 19-dimensional parameter space, only \(0.9^{19} \approx 13.5\%\). And even a grid with just 2 points per dimension takes \(2^{19} = 524{,}288\) evaluations.
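A quick numerical check of these coverage numbers (a minimal Python sketch, not part of the original slides):

# Volume fraction covered when scanning the central 90% of each of d
# parameter ranges, and the cost of a 2-point-per-dimension grid.
for d in (7, 19):
    coverage = 0.9 ** d       # 0.478 for d = 7, 0.135 for d = 19
    grid_points = 2 ** d      # 524,288 evaluations for d = 19
    print(f"d = {d:2d}: coverage = {coverage:.1%}, 2^d grid points = {grid_points}")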
based on work with A. Buckley, A. Kvellestad, A. Raklev, P. Scott, J. V. Sparre, and I. A. Vazquez-Holm
Global fits need quick, but sufficiently accurate theory predictions
BSM scans today easily require \(\sim 10^7\) samples or more.
Higher-order BSM production cross-sections and theoretical uncertainties make a significant difference!
[GAMBIT, 1705.07919]
CMSSM
[hep-ph/9610490]
Existing higher-order evaluation tools are insufficient for large MSSM scans
$$ pp\to\tilde g \tilde g,\ \tilde g \tilde q_i,\ \tilde q_i \tilde q_j,\ \tilde q_i \tilde q_j^{*},\ \tilde b_i \tilde b_i^{*},\ \tilde t_i \tilde t_i^{*} $$
at \(\mathsf{\sqrt{s}=13} \) TeV
xsec 1.0 performs Gaussian process regression for all strong SUSY cross-sections in the MSSM-24 for the LHC
A. Buckley, A. Kvellestad, A. Raklev, P. Scott, J. V. Sparre, JVDA, I. A. Vazquez-Holm
[Figure: GP regression illustration — prior distribution over all functions with the estimated smoothness; posterior distribution over functions with updated \(m(\vec x)\) after conditioning on the data; target function shown for comparison]
[Liu+, 1806.00720]
For standard GP regression, training scales as \(\mathcal{O}(n^3)\), prediction as \(\mathcal{O}(n^2)\).
Divide-and-conquer approach for dealing with large datasets:
The exact weighting of the individual expert predictions is important, to ensure consistent, well-calibrated combined predictions and uncertainties:
"Generalized Robust Bayesian Committee Machine"
[Mohammadi+, 1602.00853]
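As an illustration of how such expert predictions can be weighted, here is a minimal sketch of the robust Bayesian-committee-machine aggregation rule that the generalized version builds on (the function name and toy numbers are illustrative, not taken from the xsec code):

import numpy as np

def rbcm_combine(means, variances, prior_var):
    """Combine per-expert GP predictions (mu_i, sigma_i^2) at one test point
    with the robust BCM rule: beta_i weights each expert by its entropy
    reduction relative to the prior, so uninformative experts drop out."""
    means, variances = np.asarray(means), np.asarray(variances)
    beta = 0.5 * (np.log(prior_var) - np.log(variances))
    # Combined precision includes a prior term so the aggregate falls back
    # to the prior when no expert has seen nearby training data.
    precision = np.sum(beta / variances) + (1.0 - np.sum(beta)) / prior_var
    variance = 1.0 / precision
    mean = variance * np.sum(beta * means / variances)
    return mean, variance

# Toy example: two informative experts and one that barely updates the prior
print(rbcm_combine(means=[1.2, 1.1, 0.3], variances=[0.05, 0.08, 0.9], prior_var=1.0))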
Numerical errors may arise in the inversion of the covariance matrix, leading to negative predictive variances.
Goal
Fast estimate of SUSY (strong) production cross-sections at NLO, and uncertainties from scale variation, PDF and \(\alpha_s\) variation, and the regression itself
Processes
$$ pp\to\tilde g \tilde g,\ \tilde g \tilde q_i,\ \tilde q_i \tilde q_j,\ \tilde q_i \tilde q_j^{*},\ \tilde b_i \tilde b_i^{*},\ \tilde t_i \tilde t_i^{*} $$
at \(\mathsf{\sqrt{s}=13}\) TeV
Method
Pre-trained, distributed Gaussian processes
Interface
Python tool with command-line interface
based on work with J. Heisig, J. Kersten and I. Strümke
Now, for an MSSM NLSP, thermal freeze-out (not \(T_R\)!) determines the abundance
\(\Omega^\mathsf{th}_\mathsf{NLSP}\) is controlled by MSSM parameters; if it is low, the BBN impact is minimal!
Given R-parity, the LSP is stable and a dark matter candidate.
In a gravitino LSP scenario:
Parameter region where a neutralino NLSP dominantly annihilates via resonant heavy Higgs bosons: \[2m_{\tilde \chi_{1}^{0}} \approx m_{H^0/A^0}\]
Construct a likelihood \(\mathcal{L}_\mathsf{scan} = \mathcal{L}_\mathsf{relic\ density} \times \mathcal{L}_\mathsf{BBN} \times \mathcal{L}_\mathsf{collider} \times \mathcal{L}^\mathsf{fake}_\mathsf{T_R}\).
Nested sampling with MultiNest to examine region with highest \(\mathcal{L}_\mathsf{scan}\):
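The structure of \(\mathcal{L}_\mathsf{scan}\) is easy to sketch in code (all component functions below are hypothetical placeholders; only the relic-density term uses the measured value quoted on the dark-matter slides later on):

import numpy as np

OMEGA_DM, OMEGA_DM_ERR = 0.1199, 0.0022

def log_L_relic_density(omega_h2):
    """Gaussian likelihood for the predicted relic density."""
    return -0.5 * ((omega_h2 - OMEGA_DM) / OMEGA_DM_ERR) ** 2

def log_L_scan(omega_h2, log_L_bbn, log_L_collider, log_L_fake_TR):
    """Product of likelihood components = sum of log-likelihoods."""
    return log_L_relic_density(omega_h2) + log_L_bbn + log_L_collider + log_L_fake_TR

# Toy evaluation with placeholder values for the other components
print(log_L_scan(0.12, log_L_bbn=0.0, log_L_collider=-1.3, log_L_fake_TR=0.0))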
based on work with the GAMBIT Collaboration
EWMSSM: MSSM with only electroweakinos (\(\tilde{\chi}_i^0, \tilde{\chi}_i^\pm\)) not decoupled
GEWMSSM: EWMSSM + nearly massless gravitino LSP
Profile likelihood ratio shows preference for higgsinos near 200 GeV
(tiny excesses in searches for MET+leptons/jets)
Profile likelihood ratio, with likelihood capped at SM expectation (no signal)
based on work with J. Verhellen
Developing new drugs is an expensive process, typically taking 10–15 years.
Simple rules of thumb provide only limited guidance in small-molecule drug design.
AI techniques, leveraging increased computational power and data availability, promise to speed it up.
Graph-based Elite Patch Illumination (GB-EPI) is a new illumination algorithm, based on MAP-ELITES from soft robot design.
Problem: frequent stagnation!
Solution: explicitly enforcing diversity in a chosen feature space!
GB-EPI illuminates search spaces: it reveals how interesting features affect performance, and finds optima in each region
Benchmarks show that the quality-diversity approach boosts speed and success rate
Thank you!
Sometimes, a curious problem arises: negative predictive variances!
It is due to numerical errors when computing the inverse of the covariance matrix \(K\). When \(K\) contains many training points, there is a good chance that some of them are similar:
Nearly equal columns make \(K\) ill-conditioned. One or more eigenvalues \(\lambda_i\) are close to zero and \(K\) can no longer be inverted reliably. The number of significant digits lost is roughly the \(\log_{10}\) of the condition number \(\kappa = \lambda_\mathsf{max}/\lambda_\mathsf{min}\).
This becomes problematic when \(\kappa \gtrsim 10^8 \). In the worst-case scenario, \(\kappa\) grows with both the signal-to-noise ratio and the number of (near-duplicate) training points.
[Figure: condition number as a function of the signal-to-noise ratio and the number of points, for the Squared Exponential and Matérn-3/2 kernels]
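A small numerical illustration of this behaviour (a sketch with an assumed toy kernel, not from the slides): near-duplicate inputs and a large signal-to-noise ratio quickly push the condition number of the covariance matrix into the problematic regime.

import numpy as np

def squared_exponential(X, sigma_f=1.0, ell=1.0):
    """Squared Exponential kernel matrix for 1D inputs X."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=100)
X_dup = np.concatenate([X, X + 1e-8])     # add near-duplicate training points

for inputs, label in [(X, "well separated"), (X_dup, "near-duplicates")]:
    for sigma_n in (1e-2, 1e-6):          # smaller noise -> larger signal-to-noise
        K = squared_exponential(inputs) + sigma_n**2 * np.eye(len(inputs))
        print(f"{label:15s} sigma_n = {sigma_n:.0e}: cond(K) = {np.linalg.cond(K):.1e}")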
GPs allow us to use probabilistic inference to learn a function from data, in an interpretable, analytical, yet non-parametric Bayesian framework.
A GP model is fully specified once the mean function, the kernel and its hyperparameters are chosen.
The probabilistic interpretation only holds under the assumption that the chosen kernel accurately describes the true correlation structure.
The choice of kernel allows for great flexibility. But once chosen, it fixes the type of functions likely under the GP prior and determines the kind of structure captured by the model, e.g., periodicity and differentiability.
For our multi-dimensional case of cross-section regression, we get good results by multiplying Matérn (\(\nu = 3/2\)) kernels over the different mass dimensions:
The different lengthscale parameters \(l_d\) lead to automatic relevance determination for each feature: short-range correlations for important features over which the target function varies strongly.
This is an anisotropic, stationary kernel. It allows for functions that are less smooth than with the standard squared-exponential kernel.
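A minimal sketch of such a kernel (the function name and example lengthscales are illustrative, not taken from the xsec code): a product of one-dimensional Matérn-3/2 factors, each with its own lengthscale \(l_d\).

import numpy as np

def matern32_product(X1, X2, lengthscales, sigma_f=1.0):
    """Anisotropic kernel: product of 1D Matern-3/2 kernels, one lengthscale
    per feature. X1: (n, D), X2: (m, D), lengthscales: length-D sequence."""
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    K = np.full((X1.shape[0], X2.shape[0]), sigma_f**2)
    for d, ell in enumerate(lengthscales):
        r = np.abs(X1[:, d, None] - X2[None, :, d]) / ell
        K *= (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)
    return K

# Example: three mass-like features; the second one is most 'relevant'
# (shortest lengthscale), so the kernel decays fastest along it.
X = np.array([[1000.0, 500.0, 2000.0],
              [1100.0, 800.0, 2000.0]])
print(matern32_product(X, X, lengthscales=[500.0, 100.0, 5000.0]))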
Short lengthscale, small noise: overfitting of fluctuations, can lead to large uncertainties!
Long lengthscale, large noise: underfitting, almost linear
Typically, kernel hyperparameters are estimated by maximising the (log) marginal likelihood \(p( \vec y\ |\ \vec X, \vec \theta) \), aka the empirical Bayes method.
Alternative: MCMC integration over a range of \(\vec \theta\).
Gradient-based optimisation can get stuck in local optima and plateaus. Multiple initialisations can help, or global optimisation methods like differential evolution.
Global optimum: somewhere in between
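For instance, with scikit-learn (a generic illustration, not the training setup used for xsec), restarting the marginal-likelihood optimisation from several random initialisations is a single keyword argument:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

# Signal kernel (constant x Matern-3/2) plus a white-noise kernel; the
# hyperparameters are fitted by maximising the log marginal likelihood,
# restarted from 10 random initialisations to avoid local optima/plateaus.
kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=1.5) + WhiteKernel(1e-2)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp.fit(X, y)

print(gp.kernel_)                         # optimised hyperparameters
print(gp.log_marginal_likelihood_value_)  # log marginal likelihood at the optimum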
The standard approach systematically underestimates prediction errors.
After accounting for the additional uncertainty from learning the hyper-parameters, the prediction error increases when far from training points.
[Wågberg+, 1606.03865]
Other tricks to improve the numerical stability of training:
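One widely used trick of this kind is adding a small 'jitter' (nugget) to the diagonal and using a Cholesky factorisation instead of an explicit inverse; a minimal sketch (illustrative only, not necessarily the list that followed on the slide):

import numpy as np
from scipy.linalg import LinAlgError, cho_factor, cho_solve

def stable_gp_weights(K, y, jitter=1e-10, max_jitter=1e-2):
    """Solve (K + jitter*I) alpha = y via Cholesky factorisation, retrying
    with a larger diagonal jitter if K is numerically not positive definite."""
    n = K.shape[0]
    while jitter <= max_jitter:
        try:
            L = cho_factor(K + jitter * np.eye(n), lower=True)
            return cho_solve(L, y)
        except LinAlgError:
            jitter *= 10.0    # increase the nugget and try again
    raise RuntimeError("Covariance matrix too ill-conditioned even with jitter")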
[Workflow figure: Generating data (random sampling → SUSY spectrum → cross-sections) → Training GPs (compute covariances between training points, optimise kernel hyperparameters) → GP predictions with XSEC (input parameters → linear algebra → cross-section estimates)]
Sample coverage: need to cover a large parameter space → random sampling with different priors, directly in mass space
Evaluation speed: training scales as \(\mathcal{O}(n^3)\), prediction as \(\mathcal{O}(n^2)\) → distributed Gaussian processes
pip install xsec
xsec-download-gprocs --process_type gg
import xsec

# Set directory and cache choices
xsec.init(data_dir="gprocs")

# Set center-of-mass energy (in GeV)
xsec.set_energy(13000)

# Load GP models for the specified process(es)
processes = [(1000021, 1000021)]
xsec.load_processes(processes)

# Enter dictionary with parameter values
xsec.set_parameters(
    {
        "m1000021": 1000,
        "m1000001": 500,
        "m1000002": 500,
        "m1000003": 500,
        "m1000004": 500,
        "m1000005": 500,
        "m1000006": 500,
        "m2000001": 500,
        "m2000002": 500,
        "m2000003": 500,
        "m2000004": 500,
        "m2000005": 500,
        "m2000006": 500,
        "sbotmix11": 0,
        "stopmix11": 0,
        "mean": 500,
    }
)

# Evaluate the cross-section with the given input parameters
xsec.eval_xsection()

# Finalise the evaluation procedure
xsec.finalise()
Regression problem, with 'measurement' noise:
\(y=f(\vec x) + \varepsilon, \ \varepsilon\sim \mathcal{N}(0,\sigma_n^2) \quad \rightarrow \quad \) infer \(f\), given data \(\mathcal{D} = \{\vec X, \vec y\}\)
Assume the covariance structure is expressed by a kernel function: a signal kernel, e.g. \(k(\vec x, \vec x') = \sigma_f^2 \exp\!\big(-|\vec x - \vec x'|^2/2\ell^2\big)\), plus a white-noise kernel \(\sigma_n^2\,\delta_{ij}\).
Consider the data, inputs \(\vec X = [\vec x_1, \vec x_2, \ldots]\) and targets \(\vec y = [y_1, y_2, \ldots]\), as a sample from a multivariate Gaussian distribution, \(\vec y \sim \mathcal{N}\!\big(\vec m(\vec X),\ K(\vec X, \vec X) + \sigma_n^2\,\mathbb{1}\big)\).
Bayesian approach to estimate \( y_* = f(\vec x_*) \): consider the data as a sample from a multivariate Gaussian distribution, with a prior over functions whose covariance matrix (given by the kernel) controls smoothness, and a posterior over functions after conditioning on the data.
Training: optimise kernel hyperparameters by maximising the marginal likelihood
Posterior predictive distribution at a new point \(\vec x_*\) :
\[ \mu_* = \vec k_*^{\,T}\big(K + \sigma_n^2\,\mathbb{1}\big)^{-1}\vec y, \qquad \sigma_*^2 = k(\vec x_*, \vec x_*) - \vec k_*^{\,T}\big(K + \sigma_n^2\,\mathbb{1}\big)^{-1}\vec k_*, \]
with \(K = k(\vec X, \vec X)\) and \(\vec k_* = k(\vec X, \vec x_*)\) giving the mean and covariance of the predictive Gaussian. This implicitly integrates over function values at points not in \(\vec X\).
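A compact numpy sketch of these two formulas (illustrative only; the kernel and data are toy choices, not the xsec setup):

import numpy as np

def gp_posterior(X, y, X_star, kernel, sigma_n):
    """Posterior predictive mean and variance of a zero-mean GP at X_star,
    given training data (X, y) and observation noise sigma_n."""
    K = kernel(X, X) + sigma_n**2 * np.eye(len(X))
    K_star = kernel(X, X_star)                  # train-test covariances k_*
    K_ss = kernel(X_star, X_star)
    alpha = np.linalg.solve(K, y)
    mean = K_star.T @ alpha
    cov = K_ss - K_star.T @ np.linalg.solve(K, K_star)
    return mean, np.diag(cov)

# Toy 1D example with a squared-exponential kernel
kernel = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)
X = np.array([1.0, 2.5, 4.0])
y = np.sin(X)
X_star = np.linspace(0.0, 5.0, 6)
mu, var = gp_posterior(X, y, X_star, kernel, sigma_n=0.1)
print(mu)
print(var)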
[Beenakker+, hep-ph/9610490]
To succeed, Big Bang nucleosynthesis requires \(\frac{n_B-n_{\bar B}}{n_\gamma}\sim 10^{-9}\).
The Sakharov conditions for generating a baryon asymmetry dynamically:
Not satisfied in the Standard Model.
Baryogenesis via thermal leptogenesis provides a minimal realisation, only requiring heavy right-handed neutrinos \(N_i\) (\(\rightarrow m_\nu\) via see-saw mechanism):
Out-of-equilibrium, CP-violating \(N\) decays at \(T\sim m_N\) cause a lepton asymmetry, which SM sphalerons convert into a baryon asymmetry.
Given R-parity, the LSP is stable and a dark matter candidate.
In a neutralino LSP scenario with \(m_{\tilde G}\sim m_\mathsf{SUSY}\):
Due to \(M_\mathsf{Pl}\)-suppressed couplings, the gravitino easily becomes long-lived: \[\tau_{\tilde G} \sim 10^7~\mathsf{s} \left(\frac{100~\mathsf{GeV}}{m_{\tilde G}} \right)^3\]
Overabundant, delayed gravitino decays disrupt BBN, excluding \(T_R \gtrsim 10^5\) GeV!
So ... why not try a gravitino LSP (with neutralino NLSP)?
The gravitino relic density should match the observed \[\Omega_\mathsf{DM} h^2 = 0.1199\pm 0.0022\]
No thermal equilibrium for gravitinos in the early universe, due to superweak couplings (unless very light, but then no longer DM candidate due to Lyman-\(\alpha\))
So no standard mechanism to lower the gravitino abundance: instead, gradual build-up!
from processes like \((g+g\rightarrow) g \rightarrow \tilde g + \tilde G \)
Relevant decay channels:
\(\tilde{\chi}_1^0 \to \tilde{G} + \gamma \)
\(\tilde{\chi}_1^0 \to \tilde{G} + Z \)
\(\tilde{\chi}_1^0 \to \tilde{G} + (\gamma/Z)^{*} \to \tilde{G} + f \bar{f} \qquad \mathsf{for}\ f = u,d,s,c,b,t, e, \mu, \tau\)
\(\tilde{\chi}_1^0 \to \tilde{G} + h^0 \to \tilde{G} + XY \qquad \mathsf{for}\ XY = \mu^+\mu^-,\tau^+\tau^-, c\bar c, b \bar b, g g, \gamma \gamma, Z \gamma, ZZ, W^+ W^-\)
Crucially, the NLSP lifetime behaves as \(\tau_{\tilde \chi} \propto M_P^2 m_{\tilde G}^2/m_{\tilde \chi}^5 \) !
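A one-line scaling check of this proportionality (illustrative only; no absolute normalisation is implied):

# Ratio of NLSP lifetimes under tau ~ M_P^2 m_gravitino^2 / m_neutralino^5,
# for given ratios of the gravitino and neutralino masses.
def lifetime_ratio(m_gravitino_ratio, m_neutralino_ratio):
    return m_gravitino_ratio**2 / m_neutralino_ratio**5

# Doubling the neutralino mass shortens the NLSP lifetime by a factor 32:
print(lifetime_ratio(1.0, 2.0))   # 0.03125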
Lifetime/abundance limits for a generic particle decaying into \[u\bar u, b\bar b, t\bar t, gg, e^+e^-, \tau^+\tau^-, \gamma\gamma, W^+W^-\] and thus injecting energy into the primordial plasma
[Figure: lifetime/abundance limits from arXiv:1709.01211, showing the regions constrained by p/n conversion, hadrodissociation and photodissociation]
The last step is, due to computational expense, split into 5 components: