June 8, 2026

Adam Wei

Ambient Diffusion Policy

Imitation Learning From Suboptimal Data in Robotics

Agenda

  1. Problem Statement
  2. Ambient Diffusion Policy
  3. Why does this work?
  4. Results
  5. Limitations + Future Directions

Key takeaways!

Results are good...

Part 1

Problem Statement

"Suboptimal" Data Is Abundant

  • Sim2real gaps
  • Noisy or non-expert teleop
  • Task-level mismatch 
  • Changes in low-level controller
  • Embodiment gap
  • Camera models, poses, etc
  • Different environment, objects, etc

[Figure: in-distribution data vs. suboptimal / OOD data sources such as Open-X and simulation]

Utility of Suboptimal Data

But there is still value and utility in OOD data!

... we just aren't using it correctly


Problem Statement


Collaboration with Giannis Daras

What are principled algorithms for learning from suboptimal data sources?

\(p\): in-distribution ("high-quality") data; \(q\): suboptimal / OOD data

"Suboptimal" Data Is Abundant

Q. What do "suboptimal" actions mean?

A. You decide 😊

  • Low-quality
  • High-quality, but out-of-distribution
  • Hand tracking data, UMI, etc


Part 2

Ambient Diffusion Policy

Diffusion Training

[Diagram: diffusion time from \(t=0\) to \(t=T\); "high-quality" data is used at every \(t\).]

For all \(t \in [0,T]\): train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\)

Co-training

[Diagram: "high-quality" data (weight \(\alpha\)) and "suboptimal" data (weight \(1-\alpha\)) are both used at every \(t\) from \(t=0\) to \(t=T\).]

For all \(t \in [0,T]\): train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\)

\(p^{\text{train}} = \alpha\, p + (1-\alpha)\, q\)

\(p^{\text{train}}\) contains \(q\) \(\implies\) this is the wrong objective

\(\pi(A|O)\) learns both the good and the bad features of \(q\)
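A toy 1-D illustration of why fitting \(p^{\text{train}}\) is the wrong objective, assuming Gaussian stand-ins \(p = \mathcal{N}(0,1)\) and \(q = \mathcal{N}(3,1)\) (hypothetical choices, not the talk's data; `sample_cotrain` is a hypothetical name). Anything fit to the mixture inherits \(q\)'s statistics in proportion \(1-\alpha\).

```python
import numpy as np

def sample_cotrain(n, alpha, rng):
    """Draw n samples from p_train = alpha * p + (1 - alpha) * q.

    Toy stand-ins (assumption): p = N(0, 1) is the high-quality
    distribution and q = N(3, 1) is the suboptimal one.
    """
    from_p = rng.random(n) < alpha                 # Bernoulli(alpha) mixture assignment
    return np.where(from_p,
                    rng.normal(0.0, 1.0, n),       # draws from p
                    rng.normal(3.0, 1.0, n))       # draws from q

rng = np.random.default_rng(0)
x = sample_cotrain(200_000, alpha=0.5, rng=rng)
# The mixture mean is alpha * 0 + (1 - alpha) * 3 = 1.5, not p's mean of 0:
# a model fit to p_train is pulled toward q.
print(round(float(x.mean()), 2))
```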

Ambient Diffusion Policy

[Diagram: diffusion time from \(t=0\) to \(t=T\). "High-quality" data is used at every \(t\). "Suboptimal" data is used only for \(t > t_{\min}\), where \(p_t \approx q_t\); below \(t_{\min}\), \(p_t \not\approx q_t\).]

For all \(t \in [0,T]\): train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\)

Gaussian Noise As Contraction

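[Figure: distributions \(p_0\), \(q_0\) and their noised versions \(p_t\), \(q_t\)]

\(D(p_0, q_0) \geq D(p_t, q_t)\)

\(D(p_t, q_t) \to 0\) as \(t\to \infty\)

\(\implies \exists\, t_{\min}\ \mathrm{s.t.}\ D(p_t, q_t) < \epsilon\ \ \forall t \in (t_{\min}, T]\)

Noisy Channel

\(Y = X + \sigma_t Z\)

\(p_t\) and \(q_t\) are \(p_0\) and \(q_0\) passed through this same channel, so for any f-divergence \(D\) (e.g., total variation or KL), the data processing inequality gives \(D(p_t, q_t) \leq D(p_0, q_0)\).

[Figure: \(D(p_0, q_0)\) vs. \(D(p_t, q_t)\)]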
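A minimal numeric check of the contraction argument, assuming 1-D Gaussian stand-ins \(p_0 = \mathcal{N}(0, s_0^2)\), \(q_0 = \mathcal{N}(\delta, s_0^2)\) and total variation as \(D\) (both assumptions). For equal-variance Gaussians the TV distance has the closed form \(\mathrm{erf}\big(\delta / (2\sqrt{2}\, s)\big)\) with \(s^2 = s_0^2 + \sigma_t^2\), so on a grid one can read off the smallest noise level with \(D(p_t, q_t) < \epsilon\), i.e. \(\sigma_{t_{\min}}\). Function names are hypothetical.

```python
import math

def tv_gaussians(delta, s0, sigma_t):
    """TV distance between N(0, s^2) and N(delta, s^2), where s^2 = s0^2 + sigma_t^2.

    Models p_0 = N(0, s0^2) and q_0 = N(delta, s0^2) both passed through
    Y = X + sigma_t * Z. Closed form: 2 * Phi(|delta| / (2 s)) - 1 = erf(|delta| / (2 sqrt(2) s)).
    """
    s = math.sqrt(s0**2 + sigma_t**2)
    return math.erf(abs(delta) / (2.0 * math.sqrt(2.0) * s))

def smallest_sigma_below(delta, s0, eps, sigmas):
    """First noise level (in increasing order) at which TV(p_t, q_t) < eps; None if none."""
    for sigma_t in sorted(sigmas):
        if tv_gaussians(delta, s0, sigma_t) < eps:
            return sigma_t
    return None

grid = [0.05 * k for k in range(0, 401)]          # sigma_t from 0 to 20
tvs = [tv_gaussians(1.0, 0.1, s) for s in grid]   # non-increasing in sigma_t
print(smallest_sigma_below(1.0, 0.1, eps=0.1, sigmas=grid))
```

The closed form makes the monotone decay explicit; for real action distributions \(D\) has no closed form, which is why the talk turns to a classifier next.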


Loss Function

Loss Function (for \(x_0\sim q_0\))

Denoising Loss vs Ambient Loss
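The loss equations on these slides did not survive extraction. As a hedged sketch only: assume variance-exploding noising \(A_t = A_0 + \sigma_t Z\), an \(x_0\)-predicting network \(h\), and that each suboptimal sample is trusted only up to noise level \(\sigma_{\min}\) (the treatment used in the Ambient Diffusion line of work; whether the paper uses exactly this form is an assumption). Because \(A_t = A_{t_{\min}} + \sqrt{\sigma_t^2 - \sigma_{\min}^2}\, Z'\), Tweedie's formula gives \(\mathbb{E}[A_{t_{\min}} \mid A_t] = \tfrac{\sigma_{\min}^2}{\sigma_t^2} A_t + \big(1 - \tfrac{\sigma_{\min}^2}{\sigma_t^2}\big)\mathbb{E}[A_0 \mid A_t]\), so the network can be supervised with \(A_{t_{\min}}\) instead of the untrusted \(A_0\). Function names are hypothetical.

```python
import numpy as np

def denoising_loss(h, a0, obs, sigma_t, rng):
    """Standard x0-prediction loss for high-quality actions a0 ~ p_0.

    Assumes variance-exploding noising a_t = a0 + sigma_t * z and that
    h(a_t, obs, sigma_t) estimates E[A_0 | A_t, O].
    """
    a_t = a0 + sigma_t * rng.standard_normal(a0.shape)
    return np.mean((h(a_t, obs, sigma_t) - a0) ** 2)

def ambient_loss(h, a0_q, obs, sigma_t, sigma_min, rng):
    """Ambient-style loss for suboptimal actions a0_q ~ q_0, used only when sigma_t > sigma_min.

    The suboptimal sample is trusted only up to noise level sigma_min: noise it to
    a_tmin, add the remaining noise to reach sigma_t, and supervise against a_tmin via
    E[A_tmin | A_t] = r * A_t + (1 - r) * E[A_0 | A_t], with r = sigma_min^2 / sigma_t^2.
    """
    if sigma_t <= sigma_min:
        raise ValueError("ambient loss applies only at noise levels above sigma_min")
    a_tmin = a0_q + sigma_min * rng.standard_normal(a0_q.shape)
    extra = np.sqrt(sigma_t**2 - sigma_min**2)              # noise added on top of a_tmin
    a_t = a_tmin + extra * rng.standard_normal(a0_q.shape)  # total noise std is sigma_t
    r = sigma_min**2 / sigma_t**2
    pred_tmin = r * a_t + (1.0 - r) * h(a_t, obs, sigma_t)  # implied estimate of E[A_tmin | A_t, O]
    return np.mean((pred_tmin - a_tmin) ** 2)

# Toy setup (assumption): 1-D actions with prior N(0, 1), whose exact posterior
# mean is E[A_0 | A_t] = A_t / (1 + sigma_t^2). Observations are unused here.
def h_opt(a_t, obs, s):
    return a_t / (1.0 + s**2)

def h_bad(a_t, obs, s):
    return a_t                                   # ignores the prior entirely

a0 = np.random.default_rng(0).standard_normal((100_000, 1))
print(ambient_loss(h_opt, a0, None, 1.0, 0.5, np.random.default_rng(1)))
```

In this Gaussian toy, plugging the exact posterior mean in as `h` gives a lower ambient loss than a mis-specified denoiser, which is the consistency property the sketch relies on.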

Choosing \(\sigma_{min}\)

How to choose \(t_{min}\)?

At high noise (\(\sigma \geq \sigma_{t_{min}}\)), high-quality and low-quality actions are indistinguishable.

Sufficient condition: if a classifier cannot reliably discern \(p_t\) from \(q_t\), then the data is safe to use at that noise level.
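A minimal sketch of this sufficient condition, assuming a nearest-class-mean classifier as a stand-in (any classifier would do), Gaussian noising at each candidate \(\sigma\), and a held-out accuracy threshold of \(0.5 + \text{tol}\); all names and thresholds are hypothetical. Sweeping \(\sigma\) upward, the first level at which the classifier is near chance is taken as \(\sigma_{t_{\min}}\).

```python
import numpy as np

def nearest_mean_accuracy(xp, xq, sigma, rng):
    """Held-out accuracy of a nearest-class-mean classifier on noised p vs q samples."""
    zp = xp + sigma * rng.standard_normal(xp.shape)   # samples from p_t
    zq = xq + sigma * rng.standard_normal(xq.shape)   # samples from q_t
    hp, hq = len(zp) // 2, len(zq) // 2
    mu_p = zp[:hp].mean(axis=0)                       # "train" on the first halves
    mu_q = zq[:hq].mean(axis=0)

    def says_p(z):
        return np.linalg.norm(z - mu_p, axis=1) < np.linalg.norm(z - mu_q, axis=1)

    # Evaluate on the held-out second halves.
    correct = np.sum(says_p(zp[hp:])) + np.sum(~says_p(zq[hq:]))
    return correct / ((len(zp) - hp) + (len(zq) - hq))

def choose_sigma_min(xp, xq, sigmas, tol=0.05, seed=0):
    """Smallest sigma at which held-out accuracy <= 0.5 + tol, i.e. p_t and q_t look alike."""
    rng = np.random.default_rng(seed)
    for sigma in sorted(sigmas):
        if nearest_mean_accuracy(xp, xq, sigma, rng) <= 0.5 + tol:
            return sigma
    return None

# Toy data (assumption): 8-D "actions", q shifted by 0.5 in every coordinate.
rng = np.random.default_rng(123)
xp = 0.1 * rng.standard_normal((4000, 8))
xq = 0.5 + 0.1 * rng.standard_normal((4000, 8))
print(choose_sigma_min(xp, xq, [0.25, 1.0, 4.0, 16.0, 64.0]))
```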


How to choose \(t_{min}\)?

Options, in order of increasing granularity:

  • Assign \(t_{min}\) per dataset or per datapoint
  • Run the classifier per dataset or per datapoint

We will see examples across this spectrum...
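Given a trained classifier's outputs, the two ends of this spectrum can be sketched as follows. Assumptions (hypothetical): `prob_hq[i, k]` is the classifier's probability that datapoint \(i\), noised to `sigmas[k]`, came from \(p\); a datapoint gets the smallest noise level at which that probability clears a threshold, and the per-dataset variant thresholds the dataset-average probability instead.

```python
import numpy as np

def sigma_min_per_datapoint(prob_hq, sigmas, threshold=0.45):
    """Per-datapoint sigma_min: smallest sigma at which datapoint i "looks" high-quality.

    prob_hq: (N, K) array, classifier's P(from p) for datapoint i at sigmas[k].
    Datapoints that never clear the threshold get +inf (never admissible).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    passes = np.asarray(prob_hq) >= threshold      # (N, K) boolean
    first = np.argmax(passes, axis=1)              # first True per row (0 if none)
    out = sigmas[first]                            # fancy indexing returns a copy
    out[~passes.any(axis=1)] = np.inf
    return out

def sigma_min_per_dataset(prob_hq, sigmas, threshold=0.45):
    """Coarser variant: one sigma_min shared by the whole dataset (mean probability per sigma)."""
    mean_prob = np.asarray(prob_hq).mean(axis=0, keepdims=True)
    return float(sigma_min_per_datapoint(mean_prob, sigmas, threshold)[0])

probs = np.array([[0.2, 0.5, 0.5],
                  [0.5, 0.5, 0.5],
                  [0.0, 0.3, 0.46],
                  [0.0, 0.1, 0.2]])
print(sigma_min_per_datapoint(probs, [0.1, 1.0, 10.0]))
```

The per-datapoint version lets cleaner suboptimal samples contribute at lower noise, at the cost of a classifier query per datapoint.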

Ambient Diffusion Policy

[Diagram: diffusion time from \(t=0\) to \(t=T\). "High-quality" data is used at every \(t\); "suboptimal" data is used at high noise, \(t > t_{\min}\) (where \(p_t \approx q_t\)), and at low noise, \(t < t_{\max}\) ("Locality").]

For all \(t \in [0,T]\): train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\)

Both intervals \([0, t_{\max})\) and \((t_{\min}, T]\) have interpretations. More on this later...

Implementation: Very simple!!

  1. Sample diffusion time, \(t\)
  2. Sample an admissible datapoint from \(\mathcal{D}_p \cup \mathcal{D}_q\): high-quality datapoints are always admissible; a suboptimal datapoint is admissible only if \(t > t_{\min}\) or \(t < t_{\max}\)
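The two steps above can be sketched as a sampler. Assumptions: per-datapoint `t_min` / `t_max` arrays (per-dataset values are the constant-array special case), uniform sampling of \(t\), and resampling \(t\) when nothing is admissible; function names are hypothetical.

```python
import numpy as np

def admissible_indices(t, is_hq, t_min, t_max):
    """Indices of datapoints usable at diffusion time t.

    is_hq[i]: True for high-quality data (D_p), always admissible.
    Suboptimal datapoint i is admissible for t > t_min[i] (high noise)
    or t < t_max[i] (low noise, "locality"). Setting t_max[i] = 0 disables locality.
    """
    ok = is_hq | (t > t_min) | (t < t_max)
    return np.flatnonzero(ok)

def sample_training_pair(rng, T, is_hq, t_min, t_max):
    """1. Sample diffusion time t; 2. sample an admissible datapoint from D_p ∪ D_q."""
    while True:
        t = rng.uniform(0.0, T)                 # uniform t is an assumption
        idx = admissible_indices(t, is_hq, t_min, t_max)
        if idx.size > 0:                        # always true if D_p is non-empty
            return t, int(rng.choice(idx))

is_hq = np.array([True, False, False])
t_min = np.array([0.0, 0.6, 0.9])
t_max = np.array([0.0, 0.1, 0.0])
print(sample_training_pair(np.random.default_rng(0), 1.0, is_hq, t_min, t_max))
```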


Question break!

Part 3

Why Does This Work?

Answer

The structure of robot data


Ambient Diffusion Policy

[Diagram: "high-quality" data at every \(t\); "suboptimal" data for \(t > t_{\min}\) (where \(p_t \approx q_t\)) and for \(t < t_{\max}\).]

For all \(t \in [0,T]\): train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\)

Utility of \(q\) is highest when \(t_{min}\) is small, since \(q\) then contributes at more noise levels.

Let's start with \((t_{min}, T]\).


Power Spectral Density (PSD)

(figure by Sander Dieleman)

Image data has a spectral power law \(\implies\)

  1. Image diffusion is coarse-to-fine
  2. Noise masks high frequencies first
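A short worked version of "noise masks high frequencies first", assuming an idealized power-law spectrum \(S(f) = C f^{-\alpha}\) (\(C, \alpha > 0\)) and white Gaussian noise with flat spectrum \(\sigma^2\):

```latex
\mathrm{SNR}_\sigma(f) = \frac{S(f)}{\sigma^2} = \frac{C\, f^{-\alpha}}{\sigma^2}
\qquad\Longrightarrow\qquad
\mathrm{SNR}_\sigma(f) < 1 \iff f > f_c(\sigma) = \left(\frac{C}{\sigma^2}\right)^{1/\alpha}
```

Since \(f_c(\sigma)\) decreases as \(\sigma\) grows, the finest details fall below the noise floor first, and only coarse, low-frequency structure survives at high noise.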

Robot action data exhibits a spectral power law
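One way to check this claim on a set of action trajectories is to average periodograms and fit a line in log-log space. The sketch below uses synthetic random-walk trajectories as a stand-in for real action data (an assumption; a random walk's spectrum falls off roughly as \(f^{-2}\)), so it illustrates the measurement, not the talk's plots. Function names and the `f_max` cutoff are hypothetical.

```python
import numpy as np

def average_psd(trajs):
    """Periodogram averaged over trajectories; trajs is (num_trajs, horizon) for one action dim."""
    x = trajs - trajs.mean(axis=1, keepdims=True)          # remove per-trajectory DC offset
    spec = np.abs(np.fft.rfft(x, axis=1)) ** 2 / x.shape[1]
    freqs = np.fft.rfftfreq(x.shape[1], d=1.0)             # cycles per timestep
    return freqs[1:], spec.mean(axis=0)[1:]                # drop f = 0

def power_law_exponent(freqs, psd, f_max=0.1):
    """Least-squares slope of log PSD vs. log f over f <= f_max (PSD ~ f^slope)."""
    keep = freqs <= f_max
    slope, _ = np.polyfit(np.log(freqs[keep]), np.log(psd[keep]), deg=1)
    return slope

rng = np.random.default_rng(0)
trajs = np.cumsum(rng.standard_normal((256, 512)), axis=1)  # synthetic stand-in data
f, S = average_psd(trajs)
print(power_law_exponent(f, S))   # negative: power is concentrated at low frequencies
```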


Robot Data: PSD


Implications for Robotics



1. Implications for Diffusion Policy

Diffusion Policies learn different features at different noise levels

We should only use suboptimal data when it aligns with high-quality data


2. Implications for Ambient Diffusion Policy

Noise masks motion primitives first

\(\implies t_{min}\) is small when the suboptimality is motion-level


i.e., \(q\) contains the correct global plan but incorrect low-level motions. Examples:

  • Noisy or non-expert teleop
  • Sim2real gaps
  • Changes in low-level controller
  • Embodiment gap
  • Hand-tracking Data
  • Different environment, objects, etc




Task-level Corruptions

What if the action corruption is task-level?

Red \(\rightarrow\) Left

Blue \(\rightarrow\) Right

Red \(\rightarrow\) Right

Blue \(\rightarrow\) Left



Locality

Locality (of the optimal denoiser at low noise)

The output at each coordinate depends primarily on a small receptive field in the noisy input


Locality

[Figure: sensitivity of \(\hat a_0^{(8)}\) to \(a_\sigma^{(i)}\) at different noise levels]
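A toy version of this kind of sensitivity plot, assuming a zero-mean Gaussian prior over a 1-D action sequence with a squared-exponential covariance (a hypothetical smooth-trajectory prior, not learned from robot data). For such a prior the optimal denoiser is linear, \(\hat a_0 = \Sigma(\Sigma + \sigma^2 I)^{-1} a_\sigma\), so row 8 of that matrix is exactly the sensitivity of \(\hat a_0^{(8)}\) to each \(a_\sigma^{(i)}\). Function names are hypothetical.

```python
import numpy as np

def gaussian_prior_cov(horizon, lengthscale, jitter=1e-2):
    """Squared-exponential covariance over action timesteps (smooth-trajectory prior, assumption)."""
    idx = np.arange(horizon)
    d = idx[:, None] - idx[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(horizon)

def denoiser_jacobian(cov, sigma):
    """Jacobian of the optimal denoiser E[A_0 | A_sigma] = cov (cov + sigma^2 I)^{-1} a_sigma."""
    n = cov.shape[0]
    # solve gives (cov + sigma^2 I)^{-1} cov; transposing yields cov (cov + sigma^2 I)^{-1}.
    return np.linalg.solve(cov + sigma**2 * np.eye(n), cov).T

def local_fraction(row, center, radius=1):
    """Share of |sensitivity| mass within `radius` timesteps of `center`."""
    w = np.abs(row)
    lo, hi = max(0, center - radius), center + radius + 1
    return w[lo:hi].sum() / w.sum()

cov = gaussian_prior_cov(horizon=32, lengthscale=3.0)
for sigma in (0.01, 0.3, 3.0):
    J = denoiser_jacobian(cov, sigma)
    # Row 8: how a0_hat^(8) responds to each noisy input coordinate a_sigma^(i).
    print(sigma, round(float(local_fraction(J[8], center=8)), 3))
```

Here the sensitivity row is nearly one-hot at \(\sigma = 0.01\) and spreads over several timesteps at \(\sigma = 3\), matching the locality picture; learned denoisers are nonlinear, so this only illustrates the mechanism.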


For robotics, denoisers at low noise

  1. Learn motion primitives
  2. Ignore global task structure

\(\implies\) can learn to grasp from data for the wrong task!

Part 4a

Controlled Experiments

Question break!

Ambient Diffusion Policy

[Diagram: "high-quality" data at every \(t\); "suboptimal" data for \(t > t_{\min}\), where \(p_t \approx q_t\).]

For all \(t \in [0,T]\): train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\)

The following 3 experiments only use \(t_{min}\) (no \(t_{max}\)).


Motion Planning Experiments

Distribution shift: Low-quality, noisy trajectories

High Quality: 

100 GCS trajectories

Low Quality: 

5000 RRT trajectories


Task vs Motion Level

Distribution shift: Low-quality, noisy trajectories

[Figure: 5000 RRT trajectories and 100 GCS trajectories across noise levels from \(\sigma=0\) to \(\sigma=1\), with \(\sigma_{min}\) marked]

Task level: learn the maze structure

Motion level: learn smooth motions


Results

GCS

Success Rate

Avg. Acc^2

(Motion-level)

RRT

GCS+RRT

(Co-train)

GCS+RRT

(Ambient)

57.5%

Swept for best \(t_{min}\) per dataset

Policies evaluated over 1000 trials each

99.0%

141.65

74.8

99.4%

62.2

99.5%

30.9


Qualitative Results

Co-trained

Ambient


7-DoF Motion Planning

Clean data:

  • 100k trajopt trajectories

Corrupt data:

  • 1M RRT trajectories


7-DoF Motion Planning

Trajopt

Success Rate

Avg. Acc^2

(Motion-level)

RRT

Trajopt+RRT

(Co-train)

Trajopt+RRT

(Ambient)

46.0%

52.0%

3.9

54.9

59.9%

42.7

65.9%

31.4

Swept for best \(\sigma_{min}\) per dataset

Policies evaluated over 1000 trials each

Sim & Real Cotraining

Distribution shift: sim2real gap

In-Distribution: 

50 demos in "target" environment

Out-of-Distribution: 

2000 demos in sim environment


Results

Success rate (policies evaluated over 200 trials each):

  • "Real" only: 56.5%
  • Co-train: 84.5%
  • Ambient (single \(t_{min}\)): 87.0%
  • Ambient (\(t_{min}\) per datapoint): 93.5%

Goal: isolate the effect of locality in robotics

Locality Only

[Diagram: "high-quality" data at every \(t\); "suboptimal" data only at low noise, \(t < t_{\max}\) ("Locality").]

For all \(t \in [0,T]\): train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\)

Example: Bin Sorting

Distribution shift: task-level mismatch, motion-level correctness

In-Distribution: 

50 demos with correct sorting logic

Out-of-Distribution: 

200 demos with incorrect sorting


Metrics

Robot needs to learn two things:

  1. Motion planning: \(\frac{\#\ \text{blocks in any bin}}{\#\ \text{total blocks}}\)
  2. Logic: \(\frac{\#\ \text{blocks in correct bin}}{\#\ \text{blocks in any bin}}\)

Goal: learn motion planning from the bad data, but not the task planning

Success rate:

\(\frac{\#\ \text{blocks in correct bin}}{\#\ \text{total blocks}} = (\text{motion planning}) \times (\text{logic})\)
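A small arithmetic sketch of these metrics with hypothetical block counts; the identity success = motion × logic holds by construction, and the reported results below satisfy it up to rounding (e.g., 61.9% × 98.6% ≈ 61.0%). The function name is hypothetical.

```python
def bin_sorting_metrics(total_blocks, in_any_bin, in_correct_bin):
    """Return (motion, logic, success) as defined on the Metrics slide."""
    if total_blocks <= 0 or not (0 <= in_correct_bin <= in_any_bin <= total_blocks):
        raise ValueError("need 0 <= correct <= any <= total and total > 0")
    motion = in_any_bin / total_blocks                              # blocks in any bin / total
    logic = in_correct_bin / in_any_bin if in_any_bin else 0.0      # correct / any
    success = in_correct_bin / total_blocks                         # correct / total
    return motion, logic, success

m, l, s = bin_sorting_metrics(total_blocks=50, in_any_bin=49, in_correct_bin=47)
print(m, l, s)   # success equals motion * logic
```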

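Results

Success rate = motion metric × logic metric (logic reflects task planning; motion reflects motion planning):

  • Diffusion: success 61.0%, logic 61.9%, motion 98.6%
  • Cotrain: success 22.7%, logic 26.0%, motion 87.2%
  • Locality: success 93.3%, logic 95.0%, motion 98.2%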

Task Conditioning

  • Cotrain (with task condition): success 90.3%, logic 91.5%, motion 98.6%
  • Locality: success 93.3%, logic 95.0%, motion 98.2%
  • Locality (with task condition): success 92.8%, logic 94.2%, motion 98.5%

Part 4b

Scaling Experiments

Scaling to Real-World Datasets

[Figure: a diffusion policy \(\pi(a \mid o, l)\) trained on in-distribution data (\(p\)) and "suboptimal" / OOD data (\(q\)), e.g. from Open-X]

"Suboptimal" / OOD data:

  • cross-embodied
  • different teleoperators
  • sim data
  • mislabeled data
  • different tasks, environments, cameras

"Suboptimal" Data

Open-X

Magic Soup++: 27 Datasets

Custom OXE: 48 Datasets

  • 1.4M episodes
  • 55M "datagrams"

Table Cleaning

Tower Building

*both videos are autonomous rollouts from Ambient Diffusion Policies at 2x speed

Table Cleaning & Tower Building

Scaling to Real-World Datasets

84%

33%

More "suboptimal" data

Ablations

  1. Finetuning comparison
  2. Re-weighting + Ambient
  3. "Suboptimal" Observations
  4. Parameter sweeps
  5. Evidence of global-to-local hierarchy

Part 5

Limitations and Future Work

Future Directions

Q: What is "in-distribution" or "high-quality"?

A [in this paper]: expert teleoperator on your robot, your task, your environment

A [more generally]: data quality?

Q: Better methods to choose \(t_{min}\) and \(t_{max}\)?

Q: Soft Ambient / Rejection-based sampling

Thank You!

Ambient can be used to learn from any suboptimal / OOD data in robotics


Paper link: will be on arXiv soon! (likely Tuesday night)