Aug 14, 2025
Adam Wei
Recap:
[Diagram: noise scale from \(\sigma=0\) (data) to \(\sigma=1\) (pure noise); clean data is available at \(\sigma_{min}=0\), corrupt data only from \(\sigma=\sigma_{min}>0\) upward]
Denoising loss (assumes access to \(x_0\)) vs. ambient loss (assumes access only to \(x_{t_{min}}\)):
\(x_0\)-prediction:
Denoising: \(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
Ambient: \(\mathbb E[\lVert h_\theta(x_t, t) + \frac{\sigma_{min}^2\sqrt{1-\sigma_{t}^2}}{\sigma_t^2-\sigma_{min}^2}x_{t} - \frac{\sigma_{t}^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2} x_{t_{min}} \rVert_2^2]\)
\(\epsilon\)-prediction:
Denoising: \(\mathbb E[\lVert h_\theta(x_t, t) - \epsilon \rVert_2^2]\)
Ambient: \(\mathbb E[\lVert h_\theta(x_t, t) - \frac{\sigma_t (1-\sigma_{min}^2)}{\sigma_t^2 - \sigma_{min}^2}x_t + \frac{\sigma_t \sqrt{1-\sigma_t^2}\sqrt{1-\sigma_{min}^2}}{\sigma_t^2 - \sigma_{min}^2}x_{t_{min}}\rVert_2^2]\)
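To make the \(x_0\)-prediction ambient loss concrete, here is a minimal PyTorch sketch under the VP convention \(x_t=\sqrt{1-\sigma_t^2}\,x_0+\sigma_t\epsilon\) implied by the formulas above; the names (`ambient_x0_loss`, `h_theta`) and the re-noising step are my assumptions, not project code:

```python
import math
import torch

def ambient_x0_loss(h_theta, x_tmin, sigma_t, sigma_min, t):
    """Ambient x0-prediction loss: supervises h_theta(x_t, t) given only
    x_tmin, a sample already corrupted at noise level sigma_min."""
    alpha_t = math.sqrt(1 - sigma_t**2)    # VP: x_t = alpha_t x_0 + sigma_t eps
    alpha_min = math.sqrt(1 - sigma_min**2)
    # Re-noise the corrupt sample from sigma_min up to sigma_t > sigma_min.
    extra_std = math.sqrt(sigma_t**2 - (alpha_t / alpha_min) ** 2 * sigma_min**2)
    x_t = (alpha_t / alpha_min) * x_tmin + extra_std * torch.randn_like(x_tmin)
    # The slide's loss, rearranged: the target is a stand-in for x_0 built
    # only from x_t and x_tmin.
    denom = sigma_t**2 - sigma_min**2      # -> 0 as sigma_t -> sigma_min
    target = (sigma_t**2 * alpha_min * x_tmin - sigma_min**2 * alpha_t * x_t) / denom
    return ((h_theta(x_t, t) - target) ** 2).mean()
```

The \(\epsilon\)-prediction variant follows by plugging the same stand-in for \(x_0\) into \(\epsilon = (x_t - \sqrt{1-\sigma_t^2}\,x_0)/\sigma_t\).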
Choosing \(\sigma_{min}\): at what granularity?
Previous results: one \(\sigma_{min}\) per dataset, chosen by sweep
Setup: "clean" data \(\mathcal{D}_T\) with \(|\mathcal{D}_T|=50\); "corrupt" data \(\mathcal{D}_S\) with \(|\mathcal{D}_S|=2000\)
Eval criterion: success rate for planar pushing across 200 randomized trials
Per-sample alternative: pick \(\sigma_{min}^i\) for each sample \(i\) using a clean-vs-corrupt classifier \(c_\theta\):
\(\sigma_{min}^i = \inf\{\sigma\in[0,1]: c_\theta (x_\sigma, \sigma) > 0.5-\epsilon\}\)
where \(p_0\) is the clean data distribution and \(q_0\) the corrupt one
\(\implies \sigma_{min}^i = \inf\{\sigma\in[0,1]: d_\mathrm{TV}(p_\sigma, q_\sigma) < \epsilon\}\)*
* assuming \(c_\theta\) is perfectly trained
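A sketch of this per-sample rule (assumptions: `classifier` approximates \(c_\theta\) and returns the probability that a noised sample is clean; the grid of 100 noise levels is arbitrary):

```python
import numpy as np

def sigma_min_per_sample(y0, classifier, eps=0.05, n_grid=100):
    """Smallest noise level at which the classifier can no longer distinguish
    this (corrupt) sample from clean data: c_theta(y_sigma, sigma) > 0.5 - eps."""
    noise = np.random.randn(*y0.shape)
    for sigma in np.linspace(0.0, 1.0, n_grid):
        y_sigma = np.sqrt(1 - sigma**2) * y0 + sigma * noise  # VP forward process
        if classifier(y_sigma, sigma) > 0.5 - eps:
            return sigma
    return 1.0  # distributions stay separable at every noise level on the grid
```

Averaging over several noise draws would reduce the variance of this estimate; with a perfectly trained \(c_\theta\), the threshold matches the total-variation criterion above.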
Different checkpoints will choose different \(\sigma_{min}\)
Epoch 400: average \(t_{min}^i\) is 18.2
Epoch 800: average \(t_{min}^i\) is 19.99
Stronger checkpoints \(\implies\) a larger \(\sigma_{min}\) is required to fool the classifier \(\implies\) corrupt data is used less
Performance is sensitive to classifier and \(\sigma_{min}\) choice!
Training scheme: clean data \(x_0 \sim p_0\) and corrupt data \(y_0 \sim q_0\), over the noise scale \(\sigma \in [0,1]\)
Clean data: denoising loss, \(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
Corrupt data: ambient loss for \(\sigma_t > \sigma_{min}\), \(\mathbb E[\lVert h_\theta(y_t, t) + \frac{\sigma_{min}^2\sqrt{1-\sigma_{t}^2}}{\sigma_t^2-\sigma_{min}^2}y_{t} - \frac{\sigma_{t}^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2} y_{t_{min}} \rVert_2^2]\)
Key insight: if \(\sigma_t \approx \sigma_{min}\), the ambient loss divides by zero (\(\sigma_t^2 - \sigma_{min}^2 \to 0\))!
Fix: add a buffer \(\sigma_{buffer}(\sigma_{min}, \sigma_t)\) above \(\sigma_{min}\) to stabilize training and avoid the division by zero: the ambient loss is applied only above the buffer, and noise levels inside it fall back to the denoising loss.
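A sketch of the buffered rule, reusing `ambient_x0_loss` from the earlier block; the constant `buffer` and the denoising fallback on \(y_{t_{min}}\) are assumptions about how \(\sigma_{buffer}\) might be realized:

```python
import torch

def buffered_loss(h_theta, y_tmin, sigma_t, sigma_min, t, buffer=0.05):
    """Avoid the sigma_t -> sigma_min blow-up by using the ambient loss only
    safely above sigma_min; inside the buffer, fall back to denoising."""
    if sigma_t > sigma_min + buffer:
        return ambient_x0_loss(h_theta, y_tmin, sigma_t, sigma_min, t)
    # Buffer region: denoising loss, treating y_tmin as if it were clean.
    eps = torch.randn_like(y_tmin)
    y_t = (1 - sigma_t**2) ** 0.5 * y_tmin + sigma_t * eps
    return ((h_theta(y_t, t) - y_tmin) ** 2).mean()
```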
Hypothesis: Ambient loss underperforms denoising loss for small \(\mathcal{D}_S\) but may scale better
(compare \(|\mathcal{D}_S|=500\) against \(|\mathcal{D}_S|=8000\))
Questions:
* Is ambient diffusion most effective for high-frequency corruptions?
[Figures: PSD comparisons for Maze: RRT, Maze: GCS, T-Pushing: GCS, and T-Pushing: Human]
Goal: Characterize the nature of the corruption
For every clean sequence: average the PSDs of its ~10,000 closest state neighbors
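A sketch of this analysis; the KD-tree lookup, sampling rate `fs`, `nperseg`, and the `state_to_seq` mapping from corrupt states to their parent sequences are all assumptions about the pipeline, not its actual code:

```python
import numpy as np
from scipy.signal import welch
from scipy.spatial import cKDTree

def neighbor_averaged_psd(clean_seq, corrupt_states, state_to_seq, corrupt_seqs,
                          k=10_000, fs=10.0, nperseg=64):
    """Average the PSDs of the corrupt sequences whose states are among the
    ~k nearest (in state space) to the states of one clean sequence."""
    tree = cKDTree(corrupt_states)                    # (N, state_dim) stack
    k_per_state = max(1, k // len(clean_seq))
    _, idx = tree.query(clean_seq, k=k_per_state)     # neighbors per clean state
    seq_ids = np.unique(state_to_seq[np.ravel(idx)])  # parent sequences
    # Welch PSD per neighboring sequence, per state dimension, then average
    # (assumes every sequence has at least nperseg steps).
    psds = [welch(corrupt_seqs[i], fs=fs, axis=0, nperseg=nperseg)[1]
            for i in seq_ids]
    return np.mean(psds, axis=0)                      # (freqs, state_dim)
```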
Giannis and I are thinking of some other exciting ideas in this direction...
... more on this once results are available! :D