Understanding Generative Models via Interactions

Claudia Merger, Alexandre Rene, Kirsten Fischer, Peter Bouss, Sandra Nestler, David Dahmen, Carsten Honerkamp, Moritz Helias and Sebastian Goldt

13.03.2026

Generative models learn data statistics

Example use cases:

image/video/audio/text generators
physical observables (replacing costly scientific simulations)
foundation models for drug discovery

Task: Given some data \( \mathcal{D} \) from an unknown distribution \( p \)

Generate \( x \sim p \)

Task is solved by learning \( \, p_{\theta} \approx p\)

e.g. by minimizing the negative log-likelihood \( \mathcal{L}\left(\mathcal{D}\right) =-\sum_{x \in \mathcal{D}} \ln p_{\theta}(x) \)
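As a minimal sketch of this objective, assume \( p_{\theta} \) is a multivariate Gaussian with \( \theta = (\text{mean}, \text{covariance}) \). This model family and the names `gaussian_nll`, `mean_hat` and `cov_hat` are illustrative assumptions, not part of the talk. For this family the minimizer of \( \mathcal{L}(\mathcal{D}) \) is known in closed form: the sample mean and the biased sample covariance.

```python
import numpy as np

def gaussian_nll(data, mean, cov):
    """L(D) = -sum_{x in D} ln p_theta(x) for a Gaussian p_theta (illustrative choice)."""
    d = data.shape[1]
    diff = data - np.asarray(mean)
    # Mahalanobis terms via a linear solve instead of an explicit inverse.
    maha = np.einsum("ni,ni->n", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * np.sum(maha + logdet + d * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
true_mean = np.array([1.0, -1.0])
true_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
data = rng.multivariate_normal(mean=true_mean, cov=true_cov, size=2000)

# Closed-form maximum-likelihood parameters for the Gaussian family.
mean_hat = data.mean(axis=0)
cov_hat = np.cov(data, rowvar=False, bias=True)

nll_fit = gaussian_nll(data, mean_hat, cov_hat)        # fitted p_theta
nll_ref = gaussian_nll(data, np.zeros(2), np.eye(2))   # untrained reference
print(f"NLL fitted: {nll_fit:.1f}, NLL reference: {nll_ref:.1f}")
```

The fitted parameters give a lower \( \mathcal{L}(\mathcal{D}) \) than any other Gaussian, including the one that generated the data, which is a first hint that \( p_{\theta} \) adapts to the finite sample rather than to \( p \) itself.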

Understanding Generative models

Two questions:

What can we learn from \( p_{\theta} \) about the data?
How close are \( p \) and \( p_{\theta} \)?

Span model space with interactions.

Write an interacting theory using a polynomial action \( S_{\theta} (x) = \ln p_{\theta} (x)\)

\( S_{\theta} (x)= A^{(0)} + A^{(1)}_{i} x_i + A^{(2)}_{ij} x_i x_j +A^{(3)}_{ijk} x_i x_j x_k + \dots \)
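The expansion above can be evaluated directly, with each coefficient tensor \( A^{(k)} \) contracted against \( k \) copies of \( x \). The following is a minimal sketch truncated at third order; the function name `action` and the example coefficients are hypothetical and chosen only for illustration.

```python
import numpy as np

def action(x, A0, A1, A2, A3):
    """S(x) = A0 + A1_i x_i + A2_ij x_i x_j + A3_ijk x_i x_j x_k.

    Repeated indices are summed over; the series is truncated at third order.
    """
    return (
        A0
        + np.einsum("i,i->", A1, x)
        + np.einsum("ij,i,j->", A2, x, x)
        + np.einsum("ijk,i,j,k->", A3, x, x, x)
    )

d = 3
rng = np.random.default_rng(0)
x = rng.normal(size=d)

# Illustrative coefficients: A2 = -Id/2 and A3 = 0 give a Gaussian
# (non-interacting) theory, up to the normalization constant A0.
A0 = 0.0
A1 = rng.normal(size=d)
A2 = -0.5 * np.eye(d)
A3 = np.zeros((d, d, d))
S_gauss = action(x, A0, A1, A2, A3)

# Switching on a single three-point interaction adds the term x_0 x_1 x_2.
A3_int = A3.copy()
A3_int[0, 1, 2] = 1.0
S_int = action(x, A0, A1, A2, A3_int)
```

In this picture the terms beyond second order are what make the theory interacting: with \( A^{(k)} = 0 \) for \( k \geq 3 \) the action describes a Gaussian.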

Interactions are effective descriptions of complex systems

Merger, C., Rene, A., et al. ‘Learning Interacting Theories from Data’. PRX, 2023

Why Interactions?

Why use interactions to study deep learning?

Observation: neural networks learn "easy" statistics first, then more complex statistics

\( \rightarrow \) see also: Ingrosso & Goldt, 2022; Refinetti et al., 2023; Belrose et al., 2024, ...

\( \rightarrow \) principled approach to studying learning of statistics from data, from easy to hard

Predict performance of diffusion models as a function of \( \# \text{training examples} \)

Merger & Goldt, 2025, arXiv:2505.24769

Good performance requires at least \( N \asymp d \) training examples, where \( N = \# \text{training examples} \) and \( d \) is the data dimension

Stronger decay in spectra \(\rightarrow \) better performance at fixed \(N\)

\( A^{(2)} \propto \left( \Sigma_{\text{emp.}} + \gamma\, \text{Id} \right)^{-1} \neq \Sigma_{\text{true}}^{-1} \), where \( \Sigma_{\text{emp.}} + \gamma\, \text{Id} \) is the estimate of \( \Sigma_{\text{true}} \)
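The gap between the regularized empirical estimate and the true inverse covariance can be probed numerically. The sketch below is an illustration under stated assumptions, not the analysis of the cited paper: the data are Gaussian, \( \Sigma_{\text{true}} \) is diagonal with a power-law spectrum, and the values of `d`, `gamma` and the sample sizes are arbitrary choices. The helper `regularized_precision` is a hypothetical name, and the constant prefactor relating it to \( A^{(2)} \) is ignored.

```python
import numpy as np

def regularized_precision(samples, gamma):
    """(Sigma_emp + gamma * Id)^{-1}, with Sigma_emp the empirical covariance."""
    d = samples.shape[1]
    sigma_emp = np.cov(samples, rowvar=False, bias=True)
    return np.linalg.inv(sigma_emp + gamma * np.eye(d))

rng = np.random.default_rng(0)
d = 50
gamma = 1e-2  # assumed regularization strength

# Assumed ground truth: diagonal covariance with eigenvalues k^{-1}, k = 1..d.
eigvals = np.arange(1, d + 1, dtype=float) ** -1.0
precision_true = np.diag(1.0 / eigvals)

errors = {}
for n in (25, 50, 500, 5000):
    # Gaussian samples with covariance diag(eigvals).
    samples = rng.normal(size=(n, d)) * np.sqrt(eigvals)
    estimate = regularized_precision(samples, gamma)
    # Relative Frobenius error with respect to the true inverse covariance.
    errors[n] = np.linalg.norm(estimate - precision_true) / np.linalg.norm(precision_true)

for n, err in errors.items():
    print(f"N = {n:5d}: relative error = {err:.3f}")
```

Because \( \gamma > 0 \), the estimate stays biased away from \( \Sigma_{\text{true}}^{-1} \) even for large \( N \); for \( N < d \) the empirical covariance is additionally rank-deficient, so the regularizer alone sets the estimate along the unsampled directions.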

Bardone, Merger & Goldt, 2026 (on arXiv soon!)

Plant one direction with higher-order statistics: diffusion models need \( \mathcal{O} \left(d^{k^*-1} \right) \) samples to find it

Understanding Generative models via Interactions

Using interactions, we can

map the inferred statistics to an interpretable form central to physics
predict the performance of generative models

Thanks to Lorenzo Bardone, Alexandre Rene, Kirsten Fischer, Peter Bouss, Sandra Nestler, David Dahmen, Carsten Honerkamp, Moritz Helias and Sebastian Goldt