Claudia Merger, Alexandre Rene, Kirsten Fischer, Peter Bouss, Sandra Nestler, David Dahmen, Carsten Honerkamp, Moritz Helias and Sebastian Goldt
13.03.2026
Example use cases:
Task: Given some data \( \mathcal{D} \) from an unknown distribution \( p \)
Generate \( x \sim p \)
Task is solved by learning \( \, p_{\theta} \approx p\)
e.g. by minimizing the negative log-likelihood \( \mathcal{L}\left(\mathcal{D}\right) =-\sum_{x \in \mathcal{D}} \ln p_{\theta}(x) \)
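A minimal sketch of this objective, assuming (purely for illustration) that \( p_{\theta} \) is a Gaussian with \( \theta = (\text{mean}, \text{covariance}) \). The function name `gaussian_nll` and the toy data are hypothetical and not taken from the talk.

```python
import numpy as np

def gaussian_nll(data, mean, cov):
    """L(D) = -sum_{x in D} ln p_theta(x) for a Gaussian p_theta (illustrative model choice)."""
    d = data.shape[1]
    diff = data - mean
    # Solve linear systems instead of forming the inverse covariance explicitly.
    sol = np.linalg.solve(cov, diff.T).T
    mahalanobis = np.einsum("ni,ni->n", diff, sol)
    _, logdet = np.linalg.slogdet(cov)
    log_p = -0.5 * (mahalanobis + logdet + d * np.log(2.0 * np.pi))
    return -log_p.sum()

# Toy data set D drawn from a distribution p that the model does not know.
rng = np.random.default_rng(0)
true_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

# Maximum-likelihood parameters: sample mean and (biased) sample covariance.
mle_mean = data.mean(axis=0)
mle_cov = np.cov(data, rowvar=False, bias=True)

nll_mle = gaussian_nll(data, mle_mean, mle_cov)
nll_other = gaussian_nll(data, np.zeros(2), np.eye(2))
print(nll_mle, nll_other)
```

Within the Gaussian family, the maximum-likelihood parameters give the smallest value of \( \mathcal{L}(\mathcal{D}) \), so `nll_mle` cannot exceed `nll_other`.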
Two questions:
Span model space with interactions.
Write interacting theory using polynomial action \( S_{\theta} (x) = \ln p_{\theta} (x)\)
\( S_{\theta} (x)= A^{(0)} + A^{(1)}_{i} x_i + A^{(2)}_{ij} x_i x_j +A^{(3)}_{ijk} x_i x_j x_k + \dots \)
Merger, C., et al. ‘Learning Interacting Theories from Data’. PRX, 2023
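The polynomial action above can be evaluated by contracting each coefficient tensor \( A^{(k)} \) with \( k \) copies of \( x \). The following is a sketch under stated assumptions: the helper `polynomial_action`, the list layout `coeffs`, and the numerical values are hypothetical, and the series is truncated at third order.

```python
import numpy as np

def polynomial_action(x, coeffs):
    """Evaluate S(x) = A0 + A1_i x_i + A2_ij x_i x_j + A3_ijk x_i x_j x_k + ...

    coeffs[k] holds the order-k tensor A^(k) with shape (d,) * k;
    coeffs[0] is a scalar. Repeated indices are summed over.
    """
    total = 0.0
    for a_k in coeffs:
        term = np.asarray(a_k, dtype=float)
        # Contract the last remaining index with x until a scalar is left.
        while term.ndim > 0:
            term = np.tensordot(term, x, axes=([-1], [0]))
        total += float(term)
    return total

d = 2
a0 = 0.5
a1 = np.array([1.0, -1.0])
a2 = -0.5 * np.eye(d)            # Gaussian-like quadratic term
a3 = np.zeros((d, d, d))
a3[0, 0, 1] = 0.1                # a single three-point interaction

x = np.array([1.0, 2.0])
s_value = polynomial_action(x, [a0, a1, a2, a3])
print(s_value)
```

For this choice the terms are \( 0.5 \), \( -1 \), \( -2.5 \) and \( 0.2 \), so \( S_{\theta}(x) = -2.8 \).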
Why Interactions?
Observation: neural networks learn "easy" statistics first, then more complex statistics
\( \rightarrow \) see also: Ingrosso & Goldt, 2022; Refinetti et al., 2023; Belrose et al., 2024, ...
\( \rightarrow \) principled approach to studying learning of statistics from data, from easy to hard
Merger, Goldt, 2025, arXiv:2505.24769.
Good performance requires at least \( N \asymp d \) training examples
Stronger decay in spectra \(\rightarrow \) better performance at fixed \(N\)
\( A^{(2)} \propto \left(\Sigma_{\text{emp.}} +\gamma\, \text{Id}\right)^{-1} \neq \Sigma_{\text{true}}^{-1} \), where \( \Sigma_{\text{emp.}} +\gamma\, \text{Id} \) is the estimate of \(\Sigma_{\text{true}}\)
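A small numerical sketch of this mismatch, assuming Gaussian data with a diagonal, power-law covariance spectrum. The dimension `d`, sample count `n_samples`, regularizer `gamma`, and the spectrum itself are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples, gamma = 20, 40, 0.1    # hypothetical settings

# Hypothetical ground truth: eigenvalues of Sigma_true decay as 1/k.
true_cov = np.diag(1.0 / np.arange(1, d + 1))
data = rng.multivariate_normal(np.zeros(d), true_cov, size=n_samples)

# Empirical covariance and the regularized inverse, (Sigma_emp + gamma * Id)^(-1).
emp_cov = np.cov(data, rowvar=False, bias=True)
est_precision = np.linalg.inv(emp_cov + gamma * np.eye(d))
true_precision = np.linalg.inv(true_cov)

# Relative Frobenius-norm gap between the estimate and the true inverse covariance.
rel_error = (np.linalg.norm(est_precision - true_precision)
             / np.linalg.norm(true_precision))

# The regularizer caps the eigenvalues of the estimate at 1 / gamma.
max_eig_est = np.linalg.eigvalsh(0.5 * (est_precision + est_precision.T)).max()
print(rel_error, max_eig_est, 1.0 / gamma)
```

Because \( \Sigma_{\text{emp.}} \) is positive semi-definite, the estimate cannot have eigenvalues above \( 1/\gamma \), whereas \( \Sigma_{\text{true}}^{-1} \) here has eigenvalues up to \( d = 20 \); the two therefore differ regardless of the sampled data.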
Bardone, Merger, Goldt, 2026
on arXiv soon!
Plant one direction with higher-order statistics:
diffusion models need \( \mathcal{O} \left(d^{k^*-1} \right) \) samples to find it
Using interactions, we can
Lorenzo Bardone
Alexandre Rene
Kirsten Fischer
Peter Bouss
Sandra Nestler
David Dahmen
Carsten Honerkamp
Moritz Helias
Sebastian Goldt