Simulation-based inference
Boomers Quantified Uncertainty. We Simulate It
[Video Credit: N-body simulation Francisco Villaescusa-Navarro]
IAIFI Fellow
Carolina Cuesta-Lazaro

DESC
Why should I care?
Decision making
Decision making in science
Carolina Cuesta-Lazaro IAIFI/MIT - Simulation-Based Inference
Is the current Standard Model ruled out by data?
Mass density
Vacuum Energy Density
CMB
Supernovae

Observation
Ground truth
Prediction
Uncertainty
Is it safe to drive there?

Better data needs better models
Interpretable Simulators

Uncertainties are everywhere

Noise in features
+ correlations
Noise in finite data realization
Uncertain parameters
Limited model architecture
Imperfect optimization

Ensembling / Bayesian NNs
Forward Model
Observable
Dark matter
Dark energy
Inflation
Predict
Infer

Parameters
Inverse mapping

Fault line stress
Plate velocity

Posterior = Likelihood × Prior / Evidence
$p(\theta \mid x) = \dfrac{p(x \mid \theta)\, p(\theta)}{p(x)}$
Markov Chain Monte Carlo MCMC
Hamiltonian Monte Carlo HMC
Variational Inference VI
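The evidence cancels in the Metropolis-Hastings acceptance ratio, which is why MCMC only needs the posterior up to normalization. A minimal random-walk Metropolis-Hastings sketch, assuming a toy model (Gaussian likelihood with known noise SIGMA, standard normal prior) chosen so the exact posterior is available for comparison; all names are illustrative:

```python
import numpy as np

SIGMA = 0.5  # assumed known observation noise of the toy model


def log_unnorm_posterior(theta, x_obs):
    # log p(x | theta) + log p(theta), up to constants; p(x) is never needed
    log_like = -0.5 * np.sum((x_obs - theta) ** 2) / SIGMA**2
    log_prior = -0.5 * theta**2  # standard normal prior
    return log_like + log_prior


def metropolis_hastings(log_target, theta0, n_steps=20000, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    theta, logp = theta0, log_target(theta0)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.normal()  # symmetric random-walk proposal
        logp_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta))
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
        chain[i] = theta
    return chain


rng = np.random.default_rng(1)
x_obs = rng.normal(1.0, SIGMA, size=10)  # 10 noisy observations of theta = 1
chain = metropolis_hastings(lambda t: log_unnorm_posterior(t, x_obs), theta0=0.0)
samples = chain[2000:]  # discard burn-in

# Exact conjugate posterior for comparison: precision = n / sigma^2 + 1
post_var = 1.0 / (len(x_obs) / SIGMA**2 + 1.0)
post_mean = post_var * np.sum(x_obs) / SIGMA**2
print(samples.mean(), post_mean, samples.std(), np.sqrt(post_var))
```

HMC replaces the random-walk proposal with gradient-informed moves, and VI replaces sampling altogether with optimization of a parametric q.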
If we can evaluate the posterior (up to normalization) but cannot sample from it directly
Intractable
Unknown likelihoods
Amortized inference
Scaling to high dimensions
Marginalizing over nuisance parameters

["Polychord: nested sampling for cosmology" Handley et al]
["Fluctuation without dissipation: Microcanonical Langevin Monte Carlo" Robnik and Seljak]

The price of sampling
Higher Effective Sample Size (ESS) = less correlated samples
Number of Simulator Calls
Known likelihood
Differentiable simulators
The simulator samples the likelihood
z: latent variables (all possible trajectories), so $p(x \mid \theta) = \int p(x, z \mid \theta)\, dz$ is intractable
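Running the simulator at fixed θ draws x ~ p(x | θ) cheaply, while evaluating p(x | θ) would require integrating over every latent trajectory z. A sketch with a hypothetical toy random-walk simulator (the drift model, noise levels, and uniform prior are assumptions for illustration), including how SBI training pairs are generated:

```python
import numpy as np


def simulator(theta, rng, n_steps=50):
    """Hypothetical toy simulator: a random walk with drift theta.

    The latent trajectory z is generated and then discarded; only a noisy
    summary x is returned. Sampling x ~ p(x | theta) is cheap, but the
    density p(x | theta) is only defined implicitly through z.
    """
    z = np.cumsum(theta + rng.normal(size=n_steps))  # latent trajectory
    return z[-1] / n_steps + 0.1 * rng.normal()  # noisy observed drift


def simulate_dataset(n_sims, seed=0):
    # SBI training pairs: theta ~ prior, then x ~ simulator(theta)
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(-1.0, 1.0, size=n_sims)  # assumed uniform prior
    xs = np.array([simulator(t, rng) for t in thetas])
    return thetas, xs


thetas, xs = simulate_dataset(1000)
```

Every SBI method that follows consumes exactly this kind of (θ, x) dataset; none of them ever evaluates p(x | θ).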

Maximize the likelihood of the training samples
Model
Training Samples
Neural Likelihood Estimation NLE
NLE
No implicit prior
Not amortized
Goodness-of-fit
Scaling with dimensionality of x
Implicit marginalization
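NLE fits a conditional density q(x | θ) to simulated pairs by maximum likelihood and then plugs it into a standard sampler as a surrogate likelihood, which is why it is not amortized. A minimal sketch, assuming a toy linear simulator and a Gaussian q in place of the normalizing flow used in practice; the grid stands in for MCMC:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Simulations: theta ~ U(-1, 1), x ~ p(x | theta)
#    (assumed toy simulator: x = 2 theta + 0.3 * noise)
thetas = rng.uniform(-1.0, 1.0, size=5000)
xs = 2.0 * thetas + 0.3 * rng.normal(size=thetas.size)

# 2) Conditional density estimator q(x | theta) = N(x; a + b theta, s^2).
#    For this Gaussian family, maximum likelihood reduces to least squares;
#    a normalizing flow would take its place for realistic simulators.
A = np.column_stack([np.ones_like(thetas), thetas])
(a, b), *_ = np.linalg.lstsq(A, xs, rcond=None)
s2 = np.mean((xs - (a + b * thetas)) ** 2)


def log_q(x, theta):
    return -0.5 * (x - a - b * theta) ** 2 / s2 - 0.5 * np.log(2 * np.pi * s2)


# 3) Not amortized: each new observation needs its own inference run
#    (a 1D grid here, MCMC in general) on surrogate likelihood x prior.
x_obs = 0.8
grid = np.linspace(-1.0, 1.0, 2001)
dx = grid[1] - grid[0]
log_post = log_q(x_obs, grid)  # the flat prior on [-1, 1] only adds a constant
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx
post_mean = np.sum(grid * post) * dx
```

Because the prior enters only at step 3, it can be changed without retraining q, matching "No implicit prior" above.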
Neural Posterior Estimation NPE
Loss: fit an approximate (variational) posterior, q, to the true posterior, p

Image Credit: "Bayesian inference; How we are able to chase the Posterior" Ritchie Vink
KL Divergence
Need samples from true posterior
Run simulator
Minimize KL
Amortized Inference!
Run simulator
Neural Posterior Estimation NPE
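The expected KL over observations, E_x[KL(p(θ|x) || q(θ|x))], equals a constant minus E_{(θ,x)}[log q(θ|x)], and simulated pairs (θ, x) ~ p(θ) p(x|θ) are exactly joint samples, so running the simulator supplies the posterior samples the loss needs. A minimal sketch, assuming a linear-Gaussian toy model and a Gaussian q with a linear mean in place of a flow, so the result can be checked against the exact posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5  # assumed simulator noise

# Joint simulations: theta ~ N(0, 1) prior, x ~ N(theta, sigma^2)
thetas = rng.normal(size=20000)
xs = thetas + sigma * rng.normal(size=thetas.size)

# Maximizing log q(theta | x) on simulated pairs minimizes the expected KL.
# Here q(theta | x) = N(c + d x, v), fitted by maximum likelihood
# (equivalent to least squares for this Gaussian family).
B = np.column_stack([np.ones_like(xs), xs])
(c, d), *_ = np.linalg.lstsq(B, thetas, rcond=None)
v = np.mean((thetas - (c + d * xs)) ** 2)


def q_posterior(x_obs):
    """Amortized: a new observation costs one function call, no retraining."""
    return c + d * x_obs, np.sqrt(v)


# Exact conjugate posterior of this toy model, for comparison
def exact_mean(x_obs):
    return x_obs / (1 + sigma**2)


exact_std = np.sqrt(sigma**2 / (1 + sigma**2))
```

The prior is baked into the simulated θ, which is the "Fixed prior" trade-off listed for NPE.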
Neural Compression


High-Dimensional
Low-Dimensional
s(x) is sufficient iff $p(\theta \mid s(x)) = p(\theta \mid x)$
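A quick numerical check of the sufficiency definition, assuming i.i.d. Gaussian data with known noise (a textbook case where the sample mean is sufficient): the posterior computed from the 1D summary matches the posterior from the full data vector on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 1.0, 20  # assumed known noise and sample size
x = rng.normal(0.7, sigma, size=n)  # full data vector: n numbers
grid = np.linspace(-2.0, 3.0, 1001)
log_prior = -0.5 * grid**2  # N(0, 1) prior


def normalize(log_p):
    p = np.exp(log_p - log_p.max())
    return p / p.sum()


# Posterior from the full data vector
log_like_full = -0.5 * ((x[None, :] - grid[:, None]) ** 2).sum(axis=1) / sigma**2
post_full = normalize(log_like_full + log_prior)

# Posterior from the summary s(x) = mean(x), with s | theta ~ N(theta, sigma^2 / n)
s = x.mean()
log_like_s = -0.5 * n * (s - grid) ** 2 / sigma**2
post_s = normalize(log_like_s + log_prior)

print(np.max(np.abs(post_full - post_s)))  # ~0 up to floating-point error
```

The two agree because the full-data log-likelihood differs from the summary one only by a θ-independent term; a learned compressor aims for the same property when no such closed form exists.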
Neural Compression: MI
Maximise
Mutual Information
Need true posterior!
NLE
No implicit prior
Not amortized
Goodness-of-fit
Scaling with dimensionality of x
NPE
Amortized
Scales well to high dimensional x
Goodness-of-fit
Fixed prior
Implicit marginalization
Implicit marginalization
Do we actually need Density Estimation?
Just use binary classifiers!

Binary cross-entropy
Sample from simulator
Mix-up
Likelihood-to-evidence ratio: $r(x, \theta) = \dfrac{p(x \mid \theta)}{p(x)} = \dfrac{p(\theta \mid x)}{p(\theta)}$
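A binary classifier trained with cross-entropy to separate dependent pairs (θ, x) ~ p(θ, x) from shuffled pairs ~ p(θ) p(x) has, with balanced classes, an optimal logit equal to log r(x, θ). A sketch assuming a linear-Gaussian toy model, with quadratic-feature logistic regression (fitted by Newton's method) standing in for the neural network classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n = 0.25, 20000  # assumed toy model: theta ~ N(0,1), x ~ N(theta, sigma2)

# Class 1: dependent pairs from the joint p(theta, x), one simulator call each
theta = rng.normal(size=n)
x = theta + np.sqrt(sigma2) * rng.normal(size=n)
# Class 0: shuffling x breaks the pairing, giving samples from p(theta) p(x)
x_shuffled = rng.permutation(x)


def features(x, theta):
    # Quadratic features suffice to represent the exact log-ratio of this model
    return np.column_stack([np.ones_like(x), x, theta, x * theta, x**2, theta**2])


X = np.vstack([features(x, theta), features(x_shuffled, theta)])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Logistic regression trained on binary cross-entropy with Newton's method
w = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    w += np.linalg.solve(hess, grad)


def log_ratio(x, theta):
    """With balanced classes, the optimal logit is log p(x|theta) - log p(x)."""
    return features(np.atleast_1d(x), np.atleast_1d(theta)) @ w


def true_log_ratio(x, theta):
    log_like = -0.5 * (x - theta) ** 2 / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
    log_evidence = -0.5 * x**2 / (1 + sigma2) - 0.5 * np.log(2 * np.pi * (1 + sigma2))
    return log_like - log_evidence
```

Since r = p(θ|x) / p(θ), multiplying exp(log_ratio) by the prior gives a posterior estimate that is only approximately normalized, because the classifier itself is only approximate.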
NLE
No implicit prior
Not amortized
Goodness-of-fit
Scaling with dimensionality of x
NPE
NRE
Amortized
Scales well to high dimensional x
Goodness-of-fit
Fixed prior
Implicit marginalization
No need for a variational distribution
No implicit prior
Implicit marginalization
Approximately normalised
Not amortized
Implicit marginalization
Density Estimation 101
Maximize the likelihood of the training samples
Model


Training Samples
Trained Model

Evaluate probabilities


Low Probability
High Probability

Generate Novel Samples


Simulator
Normalizing flows
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Bijective
Sample
Evaluate probabilities
Probability mass conserved locally
Image Credit: "Understanding Deep Learning" Simon J.D. Prince
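The change-of-variables rule behind normalizing flows, log p_x(x) = log p_z(f⁻¹(x)) - log|df/dz|, can be checked on a single 1D bijection. A sketch assuming a fixed sinh-shaped map with hypothetical parameters MU and SCALE in place of a trained network; it shows both sampling and density evaluation:

```python
import numpy as np

MU, SCALE = 1.0, 0.5  # hypothetical "learned" parameters of a 1D flow


def forward(z):
    # Bijection from base space z to data space x (monotone, so invertible)
    return MU + SCALE * np.sinh(z)


def inverse(x):
    return np.arcsinh((x - MU) / SCALE)


def log_prob(x):
    # Change of variables: log p_x(x) = log p_z(f^{-1}(x)) - log|df/dz|
    z = inverse(x)
    log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_det = np.log(SCALE * np.cosh(z))  # df/dz = SCALE * cosh(z) > 0
    return log_pz - log_det


def sample(n, seed=0):
    # Sample the base Gaussian, then push the draws through the bijection
    z = np.random.default_rng(seed).normal(size=n)
    return forward(z)


# Probability mass is conserved: the transformed density still integrates to 1
grid = np.linspace(-20.0, 22.0, 200001)
total_mass = np.sum(np.exp(log_prob(grid))) * (grid[1] - grid[0])
```

Real flows stack many such invertible layers with learnable parameters and train them by maximizing log_prob on the training samples.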
Masked Autoregressive Flows
Neural Network


Sample
Evaluate probabilities
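In a masked autoregressive flow each dimension is an affine transform of base noise whose shift and log-scale depend only on earlier dimensions. The Jacobian is then triangular, so density evaluation is a single pass over observed x while sampling must proceed one dimension at a time. A 2D sketch in which a fixed, hypothetical `conditioner` function stands in for the masked neural network:

```python
import numpy as np

MU1, ALPHA1 = 0.5, -0.2  # assumed shift / log-scale of the first dimension


def conditioner(x1):
    """Hypothetical stand-in for the masked network: maps the previous
    dimension to (shift, log_scale) of the next one."""
    return np.sin(x1), 0.3 * np.tanh(x1)


def sample(n, seed=0):
    # Sampling is sequential: x2's parameters need x1 first
    z = np.random.default_rng(seed).normal(size=(n, 2))
    x1 = MU1 + np.exp(ALPHA1) * z[:, 0]
    mu2, alpha2 = conditioner(x1)
    x2 = mu2 + np.exp(alpha2) * z[:, 1]
    return np.column_stack([x1, x2]), z


def log_prob(x):
    # Density evaluation: all conditioner inputs are observed, one pass
    x1, x2 = x[:, 0], x[:, 1]
    mu2, alpha2 = conditioner(x1)
    z1 = (x1 - MU1) * np.exp(-ALPHA1)
    z2 = (x2 - mu2) * np.exp(-alpha2)
    log_base = -0.5 * (z1**2 + z2**2) - np.log(2 * np.pi)
    # Triangular Jacobian: log|det| is the sum of the log-scales
    return log_base - ALPHA1 - alpha2


x, z = sample(5)
print(log_prob(x))
```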
How good is your posterior?
Test log likelihood
["Benchmarking simulation-based inference"
Lueckmann et al
arXiv:2101.04653]
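Averaging log q(θᵢ | xᵢ) over held-out simulations ranks posterior estimators without access to the true posterior (higher is better), though the number is only meaningful when comparing models. A sketch on an assumed linear-Gaussian toy model, comparing the exact posterior with a deliberately overconfident one:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5  # assumed toy model: theta ~ N(0, 1), x ~ N(theta, sigma^2)
theta_test = rng.normal(size=5000)  # held-out simulations
x_test = theta_test + sigma * rng.normal(size=theta_test.size)


def gaussian_logpdf(y, mean, std):
    return -0.5 * ((y - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def heldout_log_likelihood(q_mean_fn, q_std):
    # Average log q(theta_i | x_i) over held-out pairs: higher is better
    return np.mean(gaussian_logpdf(theta_test, q_mean_fn(x_test), q_std))


def posterior_mean(x):
    return x / (1 + sigma**2)


exact_std = np.sqrt(sigma**2 / (1 + sigma**2))
tll_exact = heldout_log_likelihood(posterior_mean, exact_std)
tll_overconfident = heldout_log_likelihood(posterior_mean, 0.5 * exact_std)
```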
Posterior predictive checks

Observed
Re-simulated posterior samples
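A posterior predictive check re-runs the simulator on posterior samples and asks whether the observation looks typical among the re-simulated data. A sketch assuming a 1D toy simulator with an exact posterior and using the tail probability P(x_rep ≥ x_obs) as the check statistic (one simple choice among many):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5  # assumed toy model: theta ~ N(0, 1), x ~ N(theta, sigma^2)


def simulator(theta, rng):
    return theta + sigma * rng.normal(size=np.shape(theta))


def posterior_samples(x_obs, n, rng):
    # Exact posterior of the toy model (an SBI posterior would go here)
    var = sigma**2 / (1 + sigma**2)
    return x_obs / (1 + sigma**2) + np.sqrt(var) * rng.normal(size=n)


def ppc_tail_probability(x_obs, n=10000):
    # Re-simulate data from posterior samples and locate x_obs among them
    x_rep = simulator(posterior_samples(x_obs, n, rng), rng)
    return np.mean(x_rep >= x_obs)


print(ppc_tail_probability(0.3), ppc_tail_probability(10.0))
```

A tail probability near 0 or 1 (here for the extreme observation 10.0) flags data the model cannot reproduce.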
Classifier 2 Sample Test (C2ST)
Real or Fake?
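C2ST trains a classifier to tell approximate-posterior samples from reference samples and reports held-out accuracy: about 0.5 means indistinguishable, 1.0 means completely separable. A 1D sketch assuming quadratic-feature logistic regression as the classifier (benchmarks typically use a small neural network) and synthetic Gaussian samples in place of real posteriors:

```python
import numpy as np


def c2st_accuracy(samples_p, samples_q, seed=0, n_newton=20):
    """Held-out accuracy of a classifier separating samples_p from samples_q."""
    rng = np.random.default_rng(seed)
    data = np.concatenate([samples_p, samples_q])
    labels = np.concatenate([np.zeros(len(samples_p)), np.ones(len(samples_q))])
    idx = rng.permutation(len(data))
    data, labels = data[idx], labels[idx]
    feats = np.column_stack([np.ones_like(data), data, data**2])
    half = len(data) // 2
    X_tr, y_tr = feats[:half], labels[:half]
    X_te, y_te = feats[half:], labels[half:]
    # Logistic regression via Newton's method, small ridge term for stability
    w = np.zeros(feats.shape[1])
    for _ in range(n_newton):
        p = 1.0 / (1.0 + np.exp(-np.clip(X_tr @ w, -30, 30)))
        grad = X_tr.T @ (y_tr - p) - 1e-3 * w
        hess = X_tr.T @ (X_tr * (p * (1 - p))[:, None]) + 1e-3 * np.eye(len(w))
        w += np.linalg.solve(hess, grad)
    pred = (X_te @ w) > 0
    return np.mean(pred == y_te)


rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=4000)        # stand-in reference posterior
good_q = rng.normal(0.0, 1.0, size=4000)           # matching approximation
overconfident_q = rng.normal(0.0, 0.5, size=4000)  # too narrow approximation
print(c2st_accuracy(reference, good_q), c2st_accuracy(reference, overconfident_q))
```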
Benchmarking SBI
["Benchmarking simulation-based inference"
Lueckmann et al
arXiv:2101.04653]

Classifier 2 Sample Test (C2ST)
["A Trust Crisis In Simulation-Based Inference? Your Posterior Approximations Can Be Unfaithful" Hermans et al
arXiv:2110.06581]

Underconfident (conservative) is much better than overconfident!
Coverage: assessing uncertainties
["A Trust Crisis In Simulation-Based Inference? Your Posterior Approximations Can Be Unfaithful" Hermans et al
arXiv:2110.06581]
Credible region (CR)
Not unique
High Posterior Density region (HPD)
Smallest "volume"

True value in CR with probability $1 - \alpha$

Empirical Coverage Probability (ECP)
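Empirical coverage can be estimated from held-out simulations: for each pair (θ*, x), build the 1 - α credible region of q(θ | x) and record whether θ* falls inside; a calibrated posterior covers θ* a fraction 1 - α of the time. A sketch on an assumed linear-Gaussian toy model, where for a 1D Gaussian q the HPD region is the central interval. The last case uses the prior itself as q: it is calibrated but uninformative.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
sigma = 0.5  # assumed toy model: theta ~ N(0, 1), x ~ N(theta, sigma^2)
theta_true = rng.normal(size=20000)                        # theta* ~ prior
x = theta_true + sigma * rng.normal(size=theta_true.size)  # x ~ simulator

post_mean = x / (1 + sigma**2)  # exact posterior of the toy model
post_std = np.sqrt(sigma**2 / (1 + sigma**2))


def empirical_coverage(q_mean, q_std, alpha):
    # For a 1D Gaussian q, the (1 - alpha) HPD region is the central interval
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * q_std
    return np.mean(np.abs(theta_true - q_mean) <= half_width)


ecp_calibrated = empirical_coverage(post_mean, post_std, 0.05)
ecp_overconfident = empirical_coverage(post_mean, 0.5 * post_std, 0.05)
ecp_underconfident = empirical_coverage(post_mean, 2.0 * post_std, 0.05)
# The prior itself is calibrated too, but carries no information about theta
ecp_prior = empirical_coverage(0.0, 1.0, 0.05)
```

Repeating this over a range of α values gives the ECP-versus-credibility curve: points below the diagonal signal overconfidence, points above it underconfidence.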

["Investigating the Impact of Model Misspecification in Neural Simulation-based Inference" Cannon et al arXiv:2209.01845 ]
Underconfident
Overconfident
Calibrated doesn't mean informative!
Always look at information gain too
["A Trust Crisis In Simulation-Based Inference? Your Posterior Approximations Can Be Unfaithful" Hermans et al
arXiv:2110.06581]

["Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability" Falkiewicz et al
arXiv:2310.13402]
Model misspecification

["Investigating the Impact of Model Misspecification in Neural Simulation-based Inference" Cannon et al arXiv:2209.01845]
More misspecified
SBI Resources
"The frontier of simulation-based inference" Kyle Cranmer, Johann Brehmer, and Gilles Louppe

Github repos
Review

cuestalz@mit.edu

Book
"Probabilistic Machine Learning: Advanced Topics" Kevin P. Murphy

DESC-SBI
By carol cuesta