For review. Please do not distribute.
Anonymous Submission CoRL 2026
Imitation Learning From Suboptimal Data in Robotics
[Figure: a policy \(\pi(a \mid o, l)\) trained with Diffusion Policy on in-distribution data \(p\) together with "suboptimal" / OOD data \(q\) from sources such as Open-X and simulation.]
What are principled algorithms for learning from suboptimal data sources?
[Figure: for all \(t \in [0, T]\), train \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\) on both "high-quality" data and "suboptimal" data. The policy then learns to sample from a mixture of \(p\) and \(q\).]
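To make the mixture claim concrete, here is a minimal sketch that assumes 1-D actions, ignores the observation \(O\), and uses the corruption \(A_t = A_0 + \sigma \varepsilon\) with \(A_0\) drawn uniformly from a finite pooled dataset. Under these assumptions the minimizer of the squared-error objective \(h_\theta(A_t, O, t) \approx \mathbb{E}[A_0 \mid A_t, O]\) has a closed form: a softmax-weighted average of the training actions. The function name `optimal_denoiser` and the toy data are hypothetical.

```python
import numpy as np

def optimal_denoiser(a_t, data, sigma):
    """Posterior mean E[A_0 | A_t = a_t] when A_0 is uniform over `data`
    and A_t = A_0 + sigma * eps with eps ~ N(0, 1)."""
    # Gaussian log-likelihood of a_t under each training action
    logits = -0.5 * ((a_t - data) / sigma) ** 2
    logits -= logits.max()          # shift for numerical stability
    w = np.exp(logits)
    w /= w.sum()                    # posterior weights over training actions
    return float(np.dot(w, data))

# Hypothetical toy actions: "high-quality" p near +1, "suboptimal" q near -1
p_actions = np.array([0.9, 1.0, 1.1])
q_actions = np.array([-1.1, -1.0, -0.9])
pooled = np.concatenate([p_actions, q_actions])  # naive training pools both

# Low noise: a noisy action near q is pulled onto the nearest q action.
low = optimal_denoiser(-0.98, pooled, sigma=0.01)
# High noise: weights are nearly uniform, so the output is near the pooled mean.
high = optimal_denoiser(-0.98, pooled, sigma=100.0)
print(low, high)
```

At low noise the denoiser snaps a noisy action onto the nearest pooled action, whether it came from \(p\) or \(q\), which is why sampling from a policy trained this way reproduces both sources.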
[Figure: at noise levels \(t > t_{\min}\), \(p_t \approx q_t\), so "suboptimal" data is used only for \(t > t_{\min}\), while "high-quality" data is used at every \(t \in [0, T]\); below \(t_{\min}\), \(p_t \not\approx q_t\). This is a consequence of the data processing inequality: adding noise to both distributions cannot increase the divergence between them.]
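The data processing inequality can be checked numerically in a toy case. The sketch below assumes \(p = \mathcal{N}(\mu_p, s^2)\) and \(q = \mathcal{N}(\mu_q, s^2)\) with equal variances, so after adding independent \(\mathcal{N}(0, \sigma^2)\) noise the divergence has the closed form \(\mathrm{KL}(p_t \,\|\, q_t) = (\mu_p - \mu_q)^2 / (2(s^2 + \sigma^2))\), which shrinks as \(\sigma\) grows. Here \(\sigma\) stands in for the noise level at time \(t\). The threshold rule `smallest_sigma_below` and the tolerance `eps` are hypothetical illustrations of picking a cutoff like \(t_{\min}\), not necessarily how the method chooses it.

```python
import numpy as np

def gaussian_kl_after_noise(mu_p, mu_q, s, sigma):
    """KL(p_t || q_t) for p = N(mu_p, s^2), q = N(mu_q, s^2) after adding
    independent N(0, sigma^2) noise to both (equal-variance closed form)."""
    var_t = s ** 2 + sigma ** 2
    return (mu_p - mu_q) ** 2 / (2.0 * var_t)

def smallest_sigma_below(eps, mu_p, mu_q, s, sigmas):
    """Hypothetical rule: first noise level on the grid where KL <= eps."""
    for sigma in sigmas:
        if gaussian_kl_after_noise(mu_p, mu_q, s, sigma) <= eps:
            return float(sigma)
    return None

sigmas = np.linspace(0.0, 5.0, 501)  # noise-level grid, step 0.01
kls = [gaussian_kl_after_noise(0.0, 1.0, 0.1, sg) for sg in sigmas]
sigma_min = smallest_sigma_below(0.05, 0.0, 1.0, 0.1, sigmas)
print(kls[0], sigma_min)  # KL starts at 50 and first drops below 0.05 at sigma = 3.17
```

The divergence decreases monotonically with the noise level, so any tolerance is eventually met: above that level the two noised distributions are close and "suboptimal" samples become safe to use.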
[Figure: at low noise levels \(t < t_{\max}\), action denoising depends only on local actions ("locality"), so \(p_t \approx q_t\) at the scale the denoiser sees. "Suboptimal" data is therefore used for \(t < t_{\max}\) as well as for \(t > t_{\min}\), while "high-quality" data is used at every \(t \in [0, T]\).]
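A minimal sketch of how the two regimes could be combined into one training loss, assuming noise levels normalized to \([0, 1]\) with \(t_{\max} < t_{\min}\): each (sample, \(t\)) pair contributes to the denoising loss if the sample is high-quality, or if \(t\) falls in the high-noise band (\(t > t_{\min}\)) or the low-noise band (\(t < t_{\max}\)). The names `use_sample_at_t` and `masked_denoising_loss` are hypothetical, and the arrays stand in for network predictions \(h_\theta(A_t, O, t)\) and clean targets \(A_0\).

```python
import numpy as np

def use_sample_at_t(is_high_quality, t, t_min, t_max):
    """Hypothetical rule: high-quality data trains at every t; suboptimal
    data only where p_t ~ q_t (t > t_min, or t < t_max via locality)."""
    return is_high_quality or t > t_min or t < t_max

def masked_denoising_loss(pred, target, is_high_quality, t, t_min, t_max):
    """Squared error of predictions against A_0, averaged only over the
    (sample, t) pairs that use_sample_at_t allows."""
    mask = np.array([use_sample_at_t(hq, ti, t_min, t_max)
                     for hq, ti in zip(is_high_quality, t)], dtype=float)
    per_sample = np.mean((pred - target) ** 2, axis=-1)  # MSE per sample
    return float(np.sum(mask * per_sample) / max(mask.sum(), 1.0))

# Toy batch: one high-quality sample, three suboptimal ones.
pred = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [0.0, 2.0]])
target = np.zeros((4, 2))
is_hq = [True, False, False, False]
t = np.array([0.5, 0.5, 0.9, 0.05])  # the second sample sits in the excluded mid band
loss = masked_denoising_loss(pred, target, is_hq, t, t_min=0.7, t_max=0.1)
print(loss)  # 4.0
```

Masking the loss, rather than dropping samples from the batch, keeps one code path for both data sources; only the suboptimal sample at \(t = 0.5\) is excluded here.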
[Figure panels: Noisy Trajectories (2D Maze, 7-DoF arm); Sim2real (Planar Pushing); Task Mismatch (Block Sorting).]
[Video panels: Table Cleaning and Tower Building; values shown: 84% and 33%.]
*Both videos are autonomous rollouts from Ambient Diffusion Policies at 2x speed.
[Figure: Ambient Diffusion Policy, a principled algorithm for learning from arbitrary suboptimal data, combining in-distribution data with suboptimal / OOD data from sources such as Open-X and simulation.]
Contributions
Spectral power law \(\implies\) global-to-local hierarchy in Diffusion Policy:
High noise: Diffusion Policy learns global planning.
Low noise: Diffusion Policy learns local motion primitives.
Locality at low noise: the optimal denoiser has a small receptive field, i.e., it ignores global features.
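One way to see why a spectral power law yields a global-to-local hierarchy, under simplifying assumptions: treat an action trajectory as a stationary Gaussian 1-D signal with power spectrum \(S(k) \propto |k|^{-\alpha}\), corrupted by white noise of variance \(\sigma^2\). The optimal denoiser is then the linear Wiener filter \(H(k) = S(k) / (S(k) + \sigma^2)\). At high noise only low frequencies rise above the noise floor, so the filter averages over long horizons (global); at low noise almost every frequency passes and the filter approaches the identity, with a tiny receptive field (local). The helper names, the 90% energy criterion for "receptive field", and \(\alpha = 2\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def wiener_impulse_response(n, alpha, sigma2):
    """Impulse response of the Wiener denoiser H(k) = S(k) / (S(k) + sigma2)
    for a signal with power spectrum S(k) ~ |k|^-alpha in white noise."""
    k = np.fft.fftfreq(n) * n                     # integer frequencies
    S = 1.0 / np.maximum(np.abs(k), 1.0) ** alpha  # power law, S(0) clamped
    H = S / (S + sigma2)
    return np.fft.ifft(H).real                     # H is real and symmetric

def receptive_field_width(h, energy_frac=0.9):
    """Smallest circular window centred on lag 0 holding energy_frac of
    the filter's total energy."""
    n = len(h)
    total = np.sum(h ** 2)
    for r in range(n // 2 + 1):
        idx = np.unique(np.arange(-r, r + 1) % n)
        if np.sum(h[idx] ** 2) >= energy_frac * total:
            return len(idx)
    return n

# Low, medium, and high noise variance for a length-256 trajectory.
widths = {s2: receptive_field_width(wiener_impulse_response(256, 2.0, s2))
          for s2 in (1e-6, 1e-2, 1.0)}
print(widths)
```

With these settings the width grows from a single step at \(\sigma^2 = 10^{-6}\) to a window many steps wide at \(\sigma^2 = 1\), mirroring the claim that low-noise denoising is local while high-noise denoising is global.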