July 1, 2025
Adam Wei
Part 1
Ambient-Omni
Corrupt Data (\(q_0\))
Clean Data (\(p_0\))
Computer Vision
Language
Poor writing, profanity, toxicity, grammar/spelling errors, etc.
(Spectrum of robot data from "corrupt" to "clean": simulation and Open-X at the corrupt end, robot teleop in between, expert robot teleop at the clean end.)
Goal: sample "high-quality" trajectories for your task
Train on the entire spectrum to learn "high-level" reasoning, semantics, etc.
Giannis Daras
Three key ideas
1. Corrupt data points should be contracted (masked with noise) and used at higher noise levels
2. Ambient loss function
3. OOD data with locality can be used at lower noise levels
Ambient Diffusion: 1 and 2
Ambient-o Diffusion: 1, 2, and 3
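To make the gating in ideas 1 and 3 concrete, here is a minimal sketch under my own naming (nothing below is from the paper): clean data supervises every noise level, corrupt data only high noise levels, and OOD-but-locally-valid data only low noise levels.

```python
import torch

def sample_sigma(n, lo=0.0, hi=1.0):
    # Uniform noise levels in [lo, hi]; real schedules (e.g., log-normal) differ.
    return lo + (hi - lo) * torch.rand(n)

def noise_levels(kind, n, sigma_min=0.3, sigma_low=0.1):
    if kind == "clean":
        return sample_sigma(n)                # all noise levels
    if kind == "corrupt":
        return sample_sigma(n, lo=sigma_min)  # idea 1: sigma > sigma_min
    if kind == "ood":
        return sample_sigma(n, hi=sigma_low)  # idea 3: sigma < sigma_low
    raise ValueError(kind)
```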
\(\exists \sigma_{min}\) s.t. \(d_\mathrm{TV}(p_{\sigma_{min}}, q_{\sigma_{min}}) < \epsilon\)
(in this example, \(\epsilon = 0.05\))
(Diagram: distributions \(p_0\) (clean) and \(q_0\) (corrupt) along the noise scale from \(\sigma=0\) to \(\sigma=1\), with a threshold at \(\sigma=\sigma_{min}\).)
Adding noise contracts \(p_\sigma\) and \(q_\sigma\) toward each other... but it also destroys useful signal: the higher the noise, the more information is lost.
\(q_\sigma \approx p_\sigma\) for \(\sigma > \sigma_{min}\), but not equal
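As a toy numerical illustration of this convergence (entirely my own construction: two shifted 1-D Gaussians standing in for \(p_0\) and \(q_0\), VE-style noising), the estimated TV distance shrinks as \(\sigma\) grows:

```python
import numpy as np

def tv_distance(a, b, bins=400, lo=-6.0, hi=6.0):
    # Histogram estimate of the total-variation distance between two sample sets.
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    return 0.5 * np.abs(pa - pb).sum() * (hi - lo) / bins

rng = np.random.default_rng(0)
p0 = rng.normal(0.0, 0.5, 200_000)  # toy "clean" distribution
q0 = rng.normal(0.4, 0.5, 200_000)  # toy "corrupt" distribution

for sigma in [0.0, 0.25, 0.5, 1.0]:
    p_s = p0 + sigma * rng.standard_normal(p0.size)
    q_s = q0 + sigma * rng.standard_normal(q0.size)
    print(f"sigma={sigma:.2f}: d_TV ~= {tv_distance(p_s, q_s):.3f}")
```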
Theoretical Analysis
Provided that \(q_0 \neq p_0\), training directly on \(q_0\) introduces bias! (the learned \(\hat p\) is pulled toward \(q_0\))
Then \(\exists\, \sigma_{min} < 1\) s.t. learning from clean and corrupt data outperforms learning from clean data only*
* as measured by the TV distance between the learned \(\hat p\) and \(p_0\)
(Diagram: losses along the noise scale \(\sigma=0\) to \(\sigma=1\), threshold at \(\sigma=\sigma_{min}\).)
Clean Data (\(x_0 \sim p_0\)), \(x_0\)- or \(\epsilon\)-prediction: \(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
Corrupt Data (\(y_0 \sim q_0\)), \(x_0\)- or \(\epsilon\)-prediction: \(\mathbb E[\lVert h_\theta(y_t, t) - y_0 \rVert_2^2]\)
(Diagram: clean-data loss below \(\sigma_{min}\); ambient loss for corrupt data above \(\sigma_{min}\).)
Clean Data (\(x_0 \sim p_0\)), \(x_0\)- or \(\epsilon\)-prediction: \(\mathbb E[\lVert h_\theta(x_t, t) - x_0 \rVert_2^2]\)
Corrupt Data (\(y_0 \sim q_0\)), ambient loss*: \(\mathbb E[\lVert h_\theta(y_t, t) + \frac{\sigma_{min}^2\sqrt{1-\sigma_{t}^2}}{\sigma_t^2-\sigma_{min}^2}y_{t} - \frac{\sigma_{t}^2\sqrt{1-\sigma_{min}^2}}{\sigma_t^2-\sigma_{min}^2} y_{t_{min}} \rVert_2^2]\)
* Giannis and I are working on the \(\epsilon\)-prediction versions
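A sketch of this ambient loss in code, transcribing the coefficients above. The forward-process convention \(x_\sigma = \sqrt{1-\sigma^2}\,x_0 + \sigma\epsilon\) and all function names are my assumptions, not the paper's implementation; note that the clean \(y_0\) never appears, only \(y_{t_{min}}\).

```python
import torch

def ambient_loss(h_theta, y_tmin, sigma_min, sigma_t, t):
    # y_tmin: corrupt sample noised to sigma_min (the cleanest version we trust).
    sigma_min = torch.as_tensor(sigma_min, dtype=y_tmin.dtype)
    sigma_t = torch.as_tensor(sigma_t, dtype=y_tmin.dtype)

    # Noise y_tmin further, from sigma_min up to sigma_t (requires sigma_t > sigma_min).
    scale = torch.sqrt((1 - sigma_t**2) / (1 - sigma_min**2))
    extra_std = torch.sqrt(sigma_t**2 - scale**2 * sigma_min**2)
    y_t = scale * y_tmin + extra_std * torch.randn_like(y_tmin)

    # Coefficients copied from the loss on the slide.
    denom = sigma_t**2 - sigma_min**2
    c1 = sigma_min**2 * torch.sqrt(1 - sigma_t**2) / denom
    c2 = sigma_t**2 * torch.sqrt(1 - sigma_min**2) / denom

    pred = h_theta(y_t, t)
    return ((pred + c1 * y_t - c2 * y_tmin) ** 2).mean()
```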
(Same diagram as above, now with a buffer region \(\sigma_{buffer}(\sigma_{min}, \sigma_t)\) just above \(\sigma_{min}\): the ambient loss is applied only at noise levels above the buffer.)
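One plausible reading of the buffer, sketched below (the slide does not define \(\sigma_{buffer}\), so the form here is my assumption): keep \(\sigma_t\) bounded away from \(\sigma_{min}\), since the ambient-loss coefficients above blow up as \(\sigma_t^2 - \sigma_{min}^2 \to 0\).

```python
import torch

def sample_sigma_with_buffer(n, sigma_min, buffer=0.05, sigma_max=1.0):
    # Corrupt data only supervises sigma_t > sigma_min + buffer, keeping the
    # ambient-loss denominator sigma_t^2 - sigma_min^2 away from zero.
    lo = sigma_min + buffer
    return lo + (sigma_max - lo) * torch.rand(n)
```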
Ambient-o also presents a way to use OOD data to train denoisers in the low-noise regime
... future direction to try
Part 2
Experiments
Three key ideas -- tried two so far...
1. Corrupt data points should be contracted (masked with noise) and used at higher noise levels
2. Ambient loss function
3. OOD data with locality can be used at lower noise levels
| | Denoising Loss | Idea 2: Ambient Loss |
|---|---|---|
| Use corrupt data \(\forall \sigma\) | \(\epsilon\)-prediction*, \(x_0\)-prediction* | N/A (reduces to baseline) |
| Idea 1: Use corrupt data \(\forall \sigma > \sigma_{min}\) | \(\epsilon\)-prediction, \(x_0\)-prediction | \(\epsilon\)-prediction**, \(x_0\)-prediction |

* This is equivalent to the baseline
** Giannis and I are working on the \(\epsilon\)-prediction ambient loss
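A sketch of how this grid might be dispatched per corrupt batch (denoising_loss is a hypothetical stand-in for the standard loss; ambient_loss as sketched in Part 1):

```python
def denoising_loss(h_theta, y_t, y_target, t):
    # Standard loss against the observed (corrupt) target.
    return ((h_theta(y_t, t) - y_target) ** 2).mean()

def corrupt_batch_loss(loss_type, pred_type, h_theta, y_t, y_target,
                       y_tmin, sigma_min, sigma_t, t):
    # Dispatch over the table: {x0, eps} x {denoising, ambient}.
    if loss_type == "denoising":
        return denoising_loss(h_theta, y_t, y_target, t)
    if pred_type == "eps":
        raise NotImplementedError("eps-prediction ambient loss is in progress")
    return ambient_loss(h_theta, y_tmin, sigma_min, sigma_t, t)
```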
In my experiments, I sweep \(\sigma_{min}\) at the dataset level
\(\sigma_{min}\in \{0.09, 0.16, 0.32, 0.48, 0.59, 0.81\}\)
"Clean" Data
"Corrupt" Data
\(|\mathcal{D}_T|=50\)
\(|\mathcal{D}_S|=2000, 4000, 8000\)
Eval criteria: Success rate for planar pushing across 200 randomized trials
Experiments
Choosing \(\sigma_{min}\): Swept several values on the dataset level
Loss function: Tried all 4 combinations of
{\(x_0\)-prediction, \(\epsilon\)-prediction} x {denoising, ambient}
Preliminary Observations
(will present best results on next slide)
\(|\mathcal{D}_T| = 50\), \(|\mathcal{D}_S| = 2000\), \(\epsilon\)-prediction with denoising loss
* \(\sigma_{min}=0\) and the red baseline should be approx. equal...
\(\sigma^*_{min}\) is small
This is an unfavorable setting for ambient diffusion
Giannis and I think we can get the policy to over 90%...
Part 3
Next Directions
Hypothesis:
haven't tested this yet... sorry Russ
Ambient-o provides a way to use corrupt data to learn in both of these regimes
Experiment 1: RRT vs GCS
Task: Cotrain on GCS and RRT data
Goal: Sample clean and smooth GCS plans
(Figure: example GCS and RRT trajectories.)
| | \(q_0\) Trajectory Quality | \(q_0\) Planning Quality | Corruption Regime |
|---|---|---|---|
| Experiment 1: RRT vs GCS | Low | Medium | High-frequencies |
| Experiment 2: Cross-embodiment (Lu) | Low | Medium | High-frequencies |
| Experiment 3: Bin Organization | Medium | Incorrect | Low-frequencies |

Experiment 3: Bin Organization
Task: Pick-and-place objects into specific bins
Clean Data: Demos with the correct logic
Corrupt Data: Incorrect logic, Open-X, pick-and-place datasets, datasets with bins, etc.
In image generation...
Cat images can be used to train a generative model for dogs!
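For intuition, a sketch of what the cats-for-dogs trick could look like in training code (my own illustration of idea 3, with an assumed denoiser signature): crops of OOD images supervise only low noise levels, where denoising depends mostly on local statistics that cats and dogs share.

```python
import torch

def random_crop(images, size):
    # One random square crop location, shared across the batch.
    _, _, H, W = images.shape
    top = int(torch.randint(0, H - size + 1, (1,)))
    left = int(torch.randint(0, W - size + 1, (1,)))
    return images[:, :, top:top + size, left:left + size]

def low_noise_ood_step(denoiser, ood_images, sigma_low=0.1, crop=64):
    # Cat crops share local statistics (fur, edges) with dog images, so they
    # can supervise the denoiser at low noise without biasing global semantics.
    x0 = random_crop(ood_images, crop)
    sigma = sigma_low * torch.rand(x0.shape[0], 1, 1, 1)
    x_sigma = torch.sqrt(1 - sigma**2) * x0 + sigma * torch.randn_like(x0)
    return ((denoiser(x_sigma, sigma) - x0) ** 2).mean()
```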