Ambient Diffusion Update
Aug 8, 2025
Adam Wei
Agenda
- Effect of \(\sigma_{min}\)
- Effect of buffer
- Frequency Analysis
- Iterative Ambient Diffusion
Algorithm Overview
Repeat:
- Sample (O, A, \(\sigma_{min}\)) ~ \(\mathcal{D}\)
- Choose noise level \(\sigma > \sigma_{min}\)
- Optimize the denoising loss (clean data) or the ambient loss (corrupt data) — sketched below
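In code, one iteration of this loop might look like the following sketch (PyTorch-style; the dataset fields, the `denoiser` signature, and the `ambient_loss` hook are illustrative assumptions, not the released implementation — a possible `ambient_loss` is sketched after the loss table below):

```python
import torch

def train_step(batch, denoiser, ambient_loss, opt):
    """One iteration of the loop above (PyTorch-style sketch)."""
    # Each datapoint is observed at its own noise level sigma_min:
    # sigma_min = 0 for clean data, sigma_min > 0 for corrupt data.
    x_min, s_min = batch["actions"], batch["sigma_min"]   # (B, D), (B,)

    # Choose a noise level sigma_t > sigma_min per datapoint.
    s_t = s_min + (1 - s_min) * torch.rand_like(s_min)

    # Noise the observation from level sigma_min up to sigma_t, following the
    # VP convention x_t = sqrt(1 - sigma_t^2) * x_0 + sigma_t * eps.
    a = torch.sqrt((1 - s_t**2) / (1 - s_min**2))[:, None]
    b = torch.sqrt((s_t**2 - s_min**2) / (1 - s_min**2))[:, None]
    x_t = a * x_min + b * torch.randn_like(x_min)

    pred = denoiser(x_t, s_t)  # x0-prediction head h_theta(x_t, t)

    # Denoising loss on clean data (x_min IS x_0 there), ambient loss otherwise.
    clean = s_min == 0
    mse = ((pred - x_min) ** 2).mean(dim=1)
    loss = mse[clean].sum() / clean.sum().clamp(min=1)
    loss = loss + ambient_loss(pred[~clean], x_t[~clean], x_min[~clean],
                               s_t[~clean], s_min[~clean])

    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)
```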
[Diagram: noise scale from \(\sigma=0\) to \(\sigma=1\). Clean data (\(\sigma_{min}=0\)) supports training at every noise level; corrupt data is only usable from \(\sigma=\sigma_{min}\geq0\) upward.]
Loss Function (for \(x_0\sim q_0\))
Denoising Loss (assumes access to \(x_0\)):
- \(x_0\)-prediction: \(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
- \(\epsilon\)-prediction: \(\mathbb E[\lVert h_\theta(x_t, t) - \epsilon \rVert_2^2]\)
Ambient Loss (assumes access to \(x_{t_{min}}\)):
- \(x_0\)-prediction: \(\mathbb E[\lVert h_\theta(x_t, t) + \frac{\sigma_{min}^2\sqrt{1-\sigma_{t}^2}}{\sigma_t^2-\sigma_{min}^2}x_{t} - \frac{\sigma_{t}^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2} x_{t_{min}} \rVert_2^2]\)
- \(\epsilon\)-prediction: \(\mathbb E[\lVert h_\theta(x_t, t) - \frac{\sigma_t^2 (1-\sigma_{min}^2)}{(\sigma_t^2 - \sigma_{min}^2)\sqrt{1-\sigma_t^2}}x_t + \frac{\sigma_t \sqrt{1-\sigma_t^2}\sqrt{1-\sigma_{min}^2}}{\sigma_t^2 - \sigma_{min}^2}x_{t_{min}}\rVert_2^2]\)
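As code, the \(x_0\)-prediction ambient loss could be written as follows (a sketch, compatible with the `ambient_loss` hook in the earlier training-loop sketch; the VP noising convention is an assumption consistent with the coefficients above):

```python
import torch

def ambient_loss_x0(pred_x0, x_t, x_tmin, sigma_t, sigma_min):
    """x0-prediction ambient loss from the table above.

    pred_x0: network output h_theta(x_t, t), shape (B, D)
    x_t:     sample at noise level sigma_t
    x_tmin:  the observed (corrupt) sample at noise level sigma_min
    """
    if len(pred_x0) == 0:           # no corrupt samples in this batch
        return pred_x0.sum()
    s_t, s_m = sigma_t[:, None], sigma_min[:, None]
    denom = s_t**2 - s_m**2         # -> 0 as sigma_t -> sigma_min (needs buffer)
    target = (s_t**2 * torch.sqrt(1 - s_m**2) * x_tmin
              - s_m**2 * torch.sqrt(1 - s_t**2) * x_t) / denom
    return ((pred_x0 - target) ** 2).mean()
```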
How to choose \(\sigma_{min}\)
Granularity
- Per dataset
- Per datapoint
Choosing \(\sigma_{min}\)
- Sweep \(\sigma_{min} \in [0,1]\)
- Train a classifier
Previous results: per dataset and sweep \(\sigma_{min}\)
Datasets and Task
- "Clean" data: \(|\mathcal{D}_T|=50\)
- "Corrupt" data: \(|\mathcal{D}_S|=2000\)
- Eval criteria: success rate for planar pushing across 200 randomized trials
Sweep \(\sigma_{min}\) per dataset
[Figures: success rate across the \(\sigma_{min}\) sweep for each dataset.]
Choosing \(\sigma_{min}\): Classifier
Train a classifier \(c_\theta(x_\sigma, \sigma)\) to distinguish noised clean samples (from \(p_\sigma\)) from noised corrupt samples (from \(q_\sigma\)), then take the smallest noise level at which it is fooled:
\(\sigma_{min}^i = \inf\{\sigma\in[0,1]: c_\theta (x_\sigma, \sigma) > 0.5-\epsilon\}\)
\(\implies \sigma_{min}^i = \inf\{\sigma\in[0,1]: d_\mathrm{TV}(p_\sigma, q_\sigma) < \epsilon\}\)*
* assuming \(c_\theta\) is perfectly trained
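A sketch of this per-datapoint selection in code (the `classifier`, assumed to return the probability that a noised sample is clean, and the grid `sigmas` are illustrative assumptions):

```python
import torch

def sigma_min_for_datapoint(x0, classifier, sigmas=torch.linspace(0, 1, 51),
                            eps=0.05):
    """Smallest noise level at which the clean-vs-corrupt classifier is
    fooled, i.e. c_theta(x_sigma, sigma) > 0.5 - eps (rule from the slide)."""
    for sigma in sigmas:
        # Noise the datapoint to level sigma (VP convention).
        x_sigma = torch.sqrt(1 - sigma**2) * x0 + sigma * torch.randn_like(x0)
        if classifier(x_sigma, sigma) > 0.5 - eps:
            return float(sigma)  # distributions indistinguishable from here up
    return 1.0                   # never fooled: datapoint only usable at sigma = 1
```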
Distribution of \(\sigma_{min}\) in \(\mathcal{D}_S\)

Results: Classifier \(\sigma_{min}\)

Sensitivity to \(\sigma_{min}\)
Different checkpoints will choose different \(\sigma_{min}\)
Epoch 400: average \(t_{min}^i\) (the timestep index for \(\sigma_{min}^i\)) is 18.2
Epoch 800: average \(t_{min}^i\) is 19.99
Stronger checkpoints \(\implies\) larger \(\sigma_{min}\) required to fool the classifier \(\implies\) corrupt data is used less
Performance is sensitive to classifier and \(\sigma_{min}\) choice!

Denoising Loss vs Ambient Loss

Clean data (\(x_0 \sim p_0\)): denoising loss, usable at any noise level \(\sigma\in[0,1]\):
\(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
Corrupt data (\(y_0 \sim q_0\)): ambient loss, usable only for \(\sigma > \sigma_{min}\):
\(\mathbb E[\lVert h_\theta(y_t, t) + \frac{\sigma_{min}^2\sqrt{1-\sigma_{t}^2}}{\sigma_t^2-\sigma_{min}^2}y_{t} - \frac{\sigma_{t}^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2} y_{t_{min}} \rVert_2^2]\)
Key insight: if \(\sigma \approx \sigma_{min}\), the ambient loss divides by \(\sigma_t^2 - \sigma_{min}^2 \approx 0\)!
Clean data (\(x_0 \sim p_0\)): denoising loss:
\(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
Corrupt data (\(y_0 \sim q_0\)): ambient loss, now applied only at noise levels separated from \(\sigma_{min}\) by a buffer \(\sigma_{buffer}(\sigma_{min}, \sigma_t)\):
\(\mathbb E[\lVert h_\theta(y_t, t) + \frac{\sigma_{min}^2\sqrt{1-\sigma_{t}^2}}{\sigma_t^2-\sigma_{min}^2}y_{t} - \frac{\sigma_{t}^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2} y_{t_{min}} \rVert_2^2]\)
Key insight: add a buffer to stabilize training (avoid the division by \(\approx 0\))
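One minimal reading of the buffer in code (the additive form and default value of `buffer` are my assumptions; the slides leave \(\sigma_{buffer}(\sigma_{min}, \sigma_t)\) abstract):

```python
import torch

def sample_sigma_t(sigma_min, buffer=0.05):
    """Sample training noise levels for corrupt datapoints while keeping a
    buffer above sigma_min, so sigma_t^2 - sigma_min^2 stays away from 0."""
    low = torch.clamp(sigma_min + buffer, max=1.0)
    return low + (1.0 - low) * torch.rand_like(sigma_min)
```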
Ambient Diffusion: Effect of Buffer

Ambient Diffusion: Scaling \(|\mathcal{D}_S|\)

Hypothesis: Ambient loss underperforms denoising loss for small \(\mathcal{D}_S\) but may scale better

[Figures: results at \(|\mathcal{D}_S|=500\) and \(|\mathcal{D}_S|=8000\).]
Denoising Loss (\(x_0\)-prediction; assumes access to \(x_0\)):
\(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
- Bias, since \(x_0\sim q_0\) and \(p_0\neq q_0\)
- Larger \(\mathcal{D}_S \implies\) more samples with bias
Ambient Loss (assumes access to \(x_{t_{min}}\)):
\(\mathbb E[\lVert h_\theta(x_t, t) + \frac{\sigma_{min}^2\sqrt{1-\sigma_{t}^2}}{\sigma_t^2-\sigma_{min}^2}x_{t} - \frac{\sigma_{t}^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2} x_{t_{min}} \rVert_2^2]\)
- Lower bias, since \(x_{\sigma_{min}}\sim q_{\sigma_{min}} \) and \(p_{\sigma_{min}} \approx q_{\sigma_{min}}\)
- Worse sample complexity due to noisy MSE targets
- But (potentially) better scaling due to reduced bias
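Why the lower-bias claim holds (a derivation of mine under the VP noising \(x_t=\sqrt{1-\sigma_t^2}\,x_0+\sigma_t\epsilon\) implied by the coefficients above, not taken from the slides): substituting the Gaussian conditional mean of \(x_{t_{min}}\) given \(x_t\) shows the ambient target's minimizer is exactly the clean-data regression target \(\mathbb{E}[x_0\mid x_t]\), without ever observing \(x_0\):
\[
\mathbb{E}[x_{t_{min}}\mid x_t] = \sqrt{1-\sigma_{min}^2}\,\hat{x}_0 + \frac{\sigma_{min}^2\sqrt{1-\sigma_t^2}}{\sigma_t^2\sqrt{1-\sigma_{min}^2}}\big(x_t-\sqrt{1-\sigma_t^2}\,\hat{x}_0\big), \qquad \hat{x}_0 := \mathbb{E}[x_0\mid x_t]
\]
\[
\Rightarrow\quad \frac{\sigma_t^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2}\,\mathbb{E}[x_{t_{min}}\mid x_t] - \frac{\sigma_{min}^2\sqrt{1-\sigma_t^2}}{\sigma_t^2-\sigma_{min}^2}\,x_t = \hat{x}_0
\]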
Frequency Analysis: Computer Vision

[Figures: frequency-domain power spectra of natural images.]
Frequency Analysis: Robotics
Questions:
- Does robotics data follow a similar power law in the frequency domain? (see the sketch below)
- What does the corruption look like in the frequency domain?*
* Ambient diffusion is most effective for high-frequency corruptions
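To probe the power-law question, one could average Welch PSDs over action trajectories and look for a straight line on a log-log plot (a sketch; `trajectories`, its shape, and the sampling rate are assumptions):

```python
import numpy as np
from scipy.signal import welch

def mean_action_psd(trajectories, fs=10.0, nperseg=64):
    """Average PSD over a list of (T, d_action) action trajectories.
    An approximately straight line on a log-log plot indicates a power law."""
    psds = []
    for traj in trajectories:
        if len(traj) < nperseg:
            continue  # skip sequences too short for the Welch window
        freqs, psd = welch(traj, fs=fs, axis=0, nperseg=nperseg)
        psds.append(psd.mean(axis=-1))  # average across action dimensions
    return freqs, np.mean(psds, axis=0)
```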
[Figures: frequency spectra for the four datasets — Maze: RRT, Maze: GCS, T-Pushing: GCS, and T-Pushing: Human.]
Frequency Analysis: Corruption
Goal: characterize the nature of the corruption
- Higher-noise corruption is better, but not needed
Procedure (sketched below), for every clean sequence:
- Identify the nearest state neighbors in the corrupt data
- Compute the PSD of the difference in actions
Finally, average the PSDs over the ~10,000 closest state neighbors
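A sketch of this procedure for one clean sequence (array names and shapes are assumptions; brute-force nearest neighbors for clarity):

```python
import numpy as np
from scipy.signal import welch

def corruption_psd(clean_states, clean_actions,
                   corrupt_states, corrupt_actions, fs=10.0):
    """PSD of the action difference between one clean sequence and its
    nearest state neighbors in the corrupt data.

    clean_states/clean_actions: (T, d_s), (T, d_a) for one sequence;
    corrupt_states/corrupt_actions: (N, d_s), (N, d_a) pooled over the set.
    """
    # Nearest corrupt state for each timestep of the clean sequence.
    dists = np.linalg.norm(clean_states[:, None] - corrupt_states[None], axis=-1)
    nn = dists.argmin(axis=1)                            # (T,)

    # The "corruption" as a time series: difference in actions.
    diff = clean_actions - corrupt_actions[nn]           # (T, d_a)

    # Welch PSD over time, averaged across action dimensions.
    freqs, psd = welch(diff, fs=fs, axis=0, nperseg=min(64, len(diff)))
    return freqs, psd.mean(axis=-1)

# Repeat over all clean sequences and average the PSDs
# (~10,000 closest state neighbors in total).
```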
[Figures: resulting averaged corruption PSDs.]
Next Directions
Giannis and I are thinking of some other exciting ideas in this direction...
... more on this once results are available! :D
Slides for my talk at the Amazon CoRo Symposium 2025. For more details, please see the paper: https://arxiv.org/abs/2503.22634