Accounting for replication noise in model selection
Alexandre René
rene@netsci.rwth-aachen.de
SSC annual meeting • Session on Noise in Neural Systems
1 Jun 2026 • Hamilton
1,000 neurons of mouse brain © Allen Institute
Fully mechanistic model
aka “bottom-up”
Effective model
Physics-inspired neural network (PINN)
Neural-network w/ interpretable dimensions
Black-box neural network
Interpretability
Flexibility of construction
Data requirements
Effective model
Neural-network w/ interpretable dimensions
(∞-pops)
(finite-pops)
pop activity
input
internal state
model
(René et al., Neural Computation 2020)
\(i\): sample index
more robust inference, better generalization
Highly nonlinear model ⇒ multitude of solutions
(Bouss et al., PRX Life 2026)
then we can use PCA to learn the data distribution
log likelihood given by INN
Reconstruction accuracy keeping only \(l\) dimensions
→ lower dimensions show up in more terms
→ encourages low dimensions to be efficient
(Bouss et al., PRX Life 2026)
States
Latent component strongly correlated w/ state
Again, INN is a highly nonlinear model
⇒ multitude of solutions
Anecdotal observations
Conceive theory
Conceive experiment
Accumulate data
Compare
Make prediction
New experiment?
Assumptions
Symmetry
Conservation
Exchangeability
Validate/falsify
Scientific wisdom: Select the model which is “best” on replications.
Machine learning wisdom: “best” → lowest (empirical) risk
Prinz et al., Nat Neurosci (2004)
René, Pyloric simulator, PyPI (2025)
Fitting parameters
→ Distinct local solutions
Different equations
Different parameters
Same equations
Different parameters
Rayleigh-Jeans
Planck
Standard statistical criteria
EMD criterion
Selection criterion
We will use risk: \(\mathbb{E}_{\mathcal{M}_{\mathrm{true}}}[Q]\) to rank models
Some loss
\(θ_A\) subsumed into \(\mathcal{M}_A\)
Not all comparisons should be conclusive
Result is not consistent across replications
Intuition: More predictive accuracy ⇒ More reliable comparison
Why? If we know a source of variability, we can:
EMD assumption: Model discrepancies are due to unknown variability
Unknown sources of variability may change across experiments
\(R\): Risk
(lower is better)
Are these differences in risk all meaningful?
We assume we have:
Data: \((x_i, y_i) \sim \mathcal{D}_{\mathrm{true}}\), \(i = 1, \dots, L\)
Pointwise loss: \(Q(x_i, y_i \mid \mathcal{M}_A) \in \mathbb{R}\)
(Empirical) risk: \(\mathbb{E}\bigl[Q(x_i, y_i \mid \mathcal{M}_A)\bigr] \approx \frac{1}{L} \sum_{i=1}^{L} Q(x_i, y_i \mid \mathcal{M}_A)\)
NB: \(θ\) subsumed into \(\mathcal{M}_A\)
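A minimal sketch of how data, pointwise loss and empirical risk fit together. Everything concrete here is an illustrative assumption rather than something from the talk: the linear `model_A`, the Gaussian negative log likelihood used as \(Q\), and the synthetic stand-in for \(\mathcal{D}_{\mathrm{true}}\).

```python
import math
import random

def model_A(x):
    """Hypothetical candidate model: fixed linear prediction (parameters subsumed)."""
    return 2.0 * x

def Q(x, y, model, sigma=1.0):
    """Pointwise loss: Gaussian negative log likelihood of y given the prediction."""
    resid = y - model(x)
    return 0.5 * (resid / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))

def empirical_risk(data, model):
    """Average of the pointwise loss over the L data points."""
    return sum(Q(x, y, model) for x, y in data) / len(data)

# Stand-in for samples from D_true (assumed: y = 2x + unit Gaussian noise).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
data = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

R_A = empirical_risk(data, model_A)
print(f"Empirical risk of model A: {R_A:.3f}")
```

Lower values of `R_A` are better; whether a difference between two such numbers is meaningful is exactly the question posed above.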
Discrepancy
For purposes of calculating risk, we can reduce any model to \(q(Φ)\) without loss of information
tldr: Use Fubini’s theorem to rewrite risk integral
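A small numerical check of this reduction, assuming \(q(Φ)\) denotes the quantile function (PPF) of the loss, so that the risk is \(\int_0^1 q(Φ)\,dΦ\). The exponential losses and the helper names `empirical_ppf` and `risk_from_ppf` are placeholders for illustration.

```python
import math
import random

def empirical_ppf(losses):
    """Return the empirical quantile function q(phi) of a sample of losses."""
    s = sorted(losses)
    L = len(s)
    def q(phi):
        # Smallest sample value whose empirical CDF reaches phi.
        k = min(L - 1, max(0, math.ceil(phi * L) - 1))
        return s[k]
    return q

def risk_from_ppf(q, n_grid=10_000):
    """Midpoint rule for the integral of q(phi) over [0, 1]."""
    return sum(q((j + 0.5) / n_grid) for j in range(n_grid)) / n_grid

rng = random.Random(0)
losses = [rng.expovariate(1.0) for _ in range(500)]  # placeholder loss values

R_mean = sum(losses) / len(losses)           # risk as an average loss
R_ppf = risk_from_ppf(empirical_ppf(losses))  # risk as an integral over quantiles
print(R_mean, R_ppf)
```

Both numbers agree up to floating-point error, which is why the one-dimensional function \(q\) is enough for computing risk.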
EMD assumption (reframed): Candidate models represent that part of the experiment which we understand and control across replications
We can estimate \(R_A\) in two different ways:
Mixed \(q_A^*\)
Synth \(\tilde{q}_A\)
Repeat for each \(\mathcal{M}\)
Any process \(\mathcal{Q}\) should be
There is no way to coax a Wiener process to yield what we need
Variance must not depend on \(Φ\), only on \(δ^{\mathrm{EMD}}(Φ)\)
Instead of accumulating increments left-to-right, we successively refine the interval
We draw increment pairs, under the constraint
\(Δq_{ΔΦ}(Φ) \stackrel{!}{=} Δq_{ΔΦ/2}(Φ) + Δq_{ΔΦ/2}(Φ+ΔΦ/2)\)
We need a compositional distribution
Mateu-Figueras et al., Distributions on the Simplex Revisited, 2021
The simplest 2-D compositional distribution is the beta distribution
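The refinement idea can be sketched as follows: each interval's increment is split into two half-interval increments by a beta-distributed fraction, so the constraint holds by construction. This is only an illustration; the fixed `alpha` and `beta` values are placeholders, whereas in the talk they are determined separately, and the function name `refine` is hypothetical.

```python
import random

def refine(phis, qs, alpha, beta, rng):
    """One refinement step: insert a midpoint into every interval.

    The increment dq over [phi_k, phi_{k+1}] is split as x*dq + (1-x)*dq
    with x ~ Beta(alpha, beta), so the two halves always sum to dq.
    """
    new_phis, new_qs = [phis[0]], [qs[0]]
    for k in range(len(phis) - 1):
        dq = qs[k + 1] - qs[k]
        x = rng.betavariate(alpha, beta)  # fraction assigned to the left half
        new_phis += [(phis[k] + phis[k + 1]) / 2, phis[k + 1]]
        new_qs += [qs[k] + x * dq, qs[k + 1]]
    return new_phis, new_qs

rng = random.Random(0)
phis, qs = [0.0, 1.0], [0.0, 2.0]  # end points are fixed first
for _ in range(4):                 # 4 refinements -> 16 intervals
    phis, qs = refine(phis, qs, alpha=2.0, beta=2.0, rng=rng)

print(len(phis), qs[0], qs[-1])
```

Because every fraction lies in (0, 1), the sampled curve stays monotone and the end points never move, which a left-to-right accumulation of increments would not guarantee.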
Beta
Compositional form
By construction
Determine \(α\) and \(β\)
Because of the constraint, mean and variance are not natural statistics for compositional distributions
Mateu-Figueras et al., Distributions on the Simplex Revisited, 2021
Instead it is better to use the center and metric variance
Two equations ⇒ Solve for \(α\) and \(β\)
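For orientation, the two equations can be written out using standard properties of the beta distribution. This sketch assumes the usual Aitchison-geometry definitions of the center \((c_1, c_2) \propto (e^{\mathbb{E}[\ln x_1]}, e^{\mathbb{E}[\ln x_2]})\) and of the metric variance; the target values \(r\) and \(v\) are hypothetical symbols introduced here, and \(ψ\), \(ψ_1\) are the digamma and trigamma functions.

```latex
% x_1 \sim \mathrm{Beta}(\alpha, \beta), \qquad x_2 = 1 - x_1
\begin{align}
\mathbb{E}[\ln x_1] &= \psi(\alpha) - \psi(\alpha+\beta), \qquad
\mathbb{E}[\ln x_2] = \psi(\beta) - \psi(\alpha+\beta), \\
\ln\frac{c_1}{c_2} &= \psi(\alpha) - \psi(\beta) \stackrel{!}{=} \ln r, \\
\mathrm{Mvar} &= \tfrac{1}{2}\,\mathrm{Var}\!\left[\ln\frac{x_1}{x_2}\right]
  = \tfrac{1}{2}\bigl[\psi_1(\alpha) + \psi_1(\beta)\bigr] \stackrel{!}{=} v .
\end{align}
```

The last two lines are two equations in the two unknowns \(α\) and \(β\); they have no closed-form solution in general and would be solved numerically.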
Better noise produces better models
→ There is a cost to over-simplifying noise, e.g. with least squares
Repeat for each \(\mathcal{M}\)
Calibration
All of this can be automated
emdcmp on PyPI
from emdcmp import Bemd, make_empirical_risk, draw_R_samples
# Synthetic PPFs: each loss evaluated on Lsynth samples generated by its own model
synth_ppfA = make_empirical_risk(lossA(modelA.generate(Lsynth)))
synth_ppfB = make_empirical_risk(lossB(modelB.generate(Lsynth)))
# Mixed PPFs: each loss evaluated on the observed data
mixed_ppfA = make_empirical_risk(lossA(data))
mixed_ppfB = make_empirical_risk(lossB(data))
# Compare models A and B
Bemd(mixed_ppfA, mixed_ppfB, synth_ppfA, synth_ppfB, c=c)
Chair of Computational Network Science
(Prof. Michael Schaub)
netsci.rwth-aachen.de
Alexandre René
rene@netsci.rwth-aachen.de
www.arene.ca
Cited papers
Learning effective models
René, Longtin, Macke, Inference of a Mesoscopic Population Model from Population Spike Trains, Neural Computation (2020)
Learning invertible neural network
Bouss et al., Characterizing Neural Manifolds' Properties and Curvatures using Normalizing Flows, PRX Life (2026)
Epistemically-robust model selection
René, Longtin, Selecting fitted models under epistemic uncertainty using a stochastic process on quantile functions, Nature Communications (2025)
Multiple LP candidates with similar responses
Prinz et al., Nat Neurosci (2004)
René, pyloric simulator, PyPI (2025)
8D Parameter sweep
…
René et al., Neural Comp (2020)
Back-propagation through time
Dataset size
“Strength” of evidence
vol(posterior)
Use the fact that \(B^\mathrm{EMD}_{AB}\) are true probabilities:
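One way to read this, assuming \(B^\mathrm{EMD}_{AB}\) is interpreted as the probability that \(A\) is the better model: among comparisons reporting a value near \(p\), \(A\) should turn out better in roughly a fraction \(p\) of them. The sketch below shows such a check on placeholder data that are calibrated by construction; `calibration_table` is a hypothetical helper, not part of any library.

```python
import random

def calibration_table(probs, outcomes, n_bins=5):
    """Group (probability, outcome) pairs into equal-width probability bins.

    Returns one tuple per non-empty bin:
    (mean stated probability, observed frequency of outcome == 1, count).
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)
        bins[k].append((p, o))
    table = []
    for members in bins:
        if members:
            n = len(members)
            table.append((sum(p for p, _ in members) / n,
                          sum(o for _, o in members) / n,
                          n))
    return table

# Placeholder experiments: stated probability p, outcome 1 with probability p.
rng = random.Random(0)
probs = [rng.random() for _ in range(20_000)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

table = calibration_table(probs, outcomes)
for mean_p, freq, n in table:
    print(f"stated {mean_p:.2f}  observed {freq:.2f}  (n={n})")
```

Stated and observed values that track each other indicate calibrated probabilities; systematic gaps would indicate over- or under-confidence.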
(white region)
(true)
(theory)
At a large scale, what kinds of variations do we want to account for?
High-level
How do we define/quantify these variations and the selection objective?
Specific
Higher-level assessment.
These follow from the choice of paradigm.
Functional
Bayesian information criterion
aka model evidence
minimum description length
Akaike information criterion
expected log pointwise predictive density
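For reference, two of the criteria listed above have simple closed forms in terms of the maximized log likelihood, the number of parameters \(k\) and the dataset size \(n\). The sketch below uses made-up numbers for two hypothetical models; it only illustrates the formulas, not the comparison made in the talk.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln(L_max). Lower is better."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L_max). Lower is better."""
    return k * math.log(n) - 2 * log_lik

# Made-up fits: model B fits slightly better but uses more parameters.
n = 50
log_lik_A, k_A = -120.0, 3
log_lik_B, k_B = -118.5, 6

print("AIC:", aic(log_lik_A, k_A), aic(log_lik_B, k_B))
print("BIC:", bic(log_lik_A, k_A, n), bic(log_lik_B, k_B, n))
```

Both criteria are computed from the single dataset at hand, which is the sense in which such criteria describe the data we have rather than future replications.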
Ignoring
model
vs
discrete params
vs
continuous params
See esp. “Holes in Bayesian Statistics”, Gelman, Yao, J. Phys. G (2020)
prior over models
posterior over params
vol(posterior)
Dataset size
“Strength” of evidence
Would you confidently select the Planck model based on these data?
Why not?
And yet…
Statistical criteria are descriptive
They consider only the data we have today, not those we will collect tomorrow