Dec 22, 2025
Adam Wei
1. Quick update on experiments
2. How should we write the paper?
robot teleop
simulation
Open-X
Distribution shift: sim2real gap
In-Distribution:
50 demos in "target" environment
Out-of-Distribution:
2000 demos in sim environment
Perf (200 trials):
  Real Only                                   56.5%
  Sim + Real (reweighted)                     84.5%
  Ambient (\(\sigma_{min}\) per datapoint)    93.5%
  Ambient-Omni                                92.0%
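With 200 trials per policy, a binomial confidence interval gives a rough sense of how far apart two success rates need to be before the gap is meaningful. The sketch below uses the Wilson score interval; the function name and the choice of a 95% level (z = 1.96) are assumptions for illustration, not something the slides specify.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width

# 93.5% of 200 trials corresponds to 187 successes.
low, high = wilson_interval(187, 200)
print(f"93.5% over 200 trials: [{low:.3f}, {high:.3f}]")
```

The Wilson interval is used here instead of the simpler normal approximation because it stays inside [0, 1] and behaves better when rates are near 100%.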
In-Distribution:
50 demos with correct sorting logic
Out-of-Distribution:
200 demos with incorrect sorting
(2x speed)
Distribution shift: task level mismatch, motion level correctness
Contrived experiment... but it effectively illustrates the effect of \(\sigma_{max}\)
Repeat:
\(\sigma=0\)
\(\sigma=1\)
\(\sigma_{max}\)
\(\sigma>\sigma_{min}\)
\(\sigma_{max}\)
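The noise-level labels above can be read as a gating rule on which samples contribute to the denoising loss at each \(\sigma\). The sketch below is one interpretation and rests on assumptions: clean data is used at every noise level, out-of-distribution data is used at high noise (\(\sigma > \sigma_{min}\)) and, in the Ambient-Omni variant, also at low noise (\(\sigma \le \sigma_{max}\)). The function name and the inclusive/exclusive boundaries are hypothetical.

```python
from typing import Optional

def contributes_to_loss(
    sigma: float,
    is_clean: bool,
    sigma_min: Optional[float] = None,
    sigma_max: Optional[float] = None,
) -> bool:
    """Return True if a sample should be used for training at noise level sigma.

    Assumed rule (not confirmed by the slides):
      - clean samples are used for all sigma in [0, 1]
      - OOD samples are used when sigma > sigma_min (high-noise regime)
        or when sigma <= sigma_max (low-noise regime)
    Passing None disables the corresponding regime.
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma is assumed to be normalized to [0, 1]")
    if is_clean:
        return True
    in_high_noise = sigma_min is not None and sigma > sigma_min
    in_low_noise = sigma_max is not None and sigma <= sigma_max
    return in_high_noise or in_low_noise

# With only sigma_max set (as in the sorting experiment), OOD data is
# used at low noise levels and dropped at high ones.
print(contributes_to_loss(0.3, is_clean=False, sigma_max=0.46))
print(contributes_to_loss(0.8, is_clean=False, sigma_max=0.46))
```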
Good Data Only
Score
(Task + motion)
Correct logic
(Task level)
Cotrain
(\(\alpha^*=0.9\))
Completed
(Motion level)
Ambient-Omni
(\(\sigma_{max}=0.46\))
61.01%
61.9%
98.6%
Cotrain
(task-conditioned, \(\alpha^*=0.5\))
90.3%
91.5%
98.6%
22.68%
87.2%
26.0%
93.3%
95.0%
98.2%
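The twelve percentages above fall into four consecutive triples, and in each triple the first value is close to the product of the other two. That suggests (as an assumption; the slides do not state it) that Score is computed as the task-level rate times the motion-level rate. A quick check:

```python
# Triples copied from the slide in reading order: (score, second value, third value), in percent.
triples = [
    (61.01, 61.9, 98.6),
    (90.3, 91.5, 98.6),
    (22.68, 87.2, 26.0),
    (93.3, 95.0, 98.2),
]

def product_gap(score: float, a: float, b: float) -> float:
    """Absolute gap, in percentage points, between score and a*b (all in percent)."""
    return abs(score - a * b / 100.0)

gaps = [product_gap(*t) for t in triples]
for t, g in zip(triples, gaps):
    print(f"score={t[0]:6.2f}  product={t[1] * t[2] / 100.0:6.2f}  gap={g:.3f}")
```

The small residual gaps are consistent with the reported values having been rounded from raw trial counts.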
Motion Planning*
Task
Planning*
Distribution shift: Low-quality, noisy trajectories
High Quality:
100 GCS trajectories
Low Quality:
5000 RRT trajectories
Method                Success Rate (Task-level)    Avg. Jerk^2 (Motion-level)
GCS                   50%                          7.5k
RRT                   100%                         17k
GCS+RRT (Co-train)    91%                          14.5k
GCS+RRT (Ambient)     98%                          5.5k
Policies evaluated over 100 trials each
Generate good (expensive) and bad (cheap) motion planning data in 20,000 environments
Evaluate in new scenes. The goal is to generate good trajectories.
Open-X
Variant of Ambient Omni
Cool Task!!
Cool Demo!!
Open-X
Magic Soup++: 27 Datasets
Custom OXE: 48 Datasets
Ambient policy on OOD objects. (2x speed)
Task completion =
0.1 x [opened drawer]
+ 0.8 x [# obj. cleaned / # obj.]
+ 0.1 x [closed drawer]
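The task completion metric above can be written as a small function. This is a minimal sketch; the function and argument names are hypothetical, and the weights are the ones given on the slide.

```python
def task_completion(opened: bool, n_cleaned: int, n_objects: int, closed: bool) -> float:
    """Task completion = 0.1*[opened drawer] + 0.8*(# obj. cleaned / # obj.) + 0.1*[closed drawer]."""
    if n_objects <= 0 or not 0 <= n_cleaned <= n_objects:
        raise ValueError("need 0 <= n_cleaned <= n_objects and n_objects > 0")
    return 0.1 * float(opened) + 0.8 * (n_cleaned / n_objects) + 0.1 * float(closed)

# Hypothetical trial: drawer opened, 2 of 3 objects cleaned, drawer left open.
print(round(task_completion(True, 2, 3, False), 3))
```

Because the score is linear in each term, averaging per-trial scores gives the same number as plugging aggregate rates into the formula, as long as every trial has the same number of objects.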
Question: How to compute error bars?
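One possible answer, offered as a sketch and not as the method used here: since task completion is a per-trial score in [0, 1] and not a binary outcome, a percentile bootstrap over trials is a simple way to get error bars. The per-trial scores below are made-up placeholder values, and `bootstrap_ci` is a hypothetical helper.

```python
import random

def bootstrap_ci(trial_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-trial scores.

    Returns (mean, lower, upper) for a (1 - alpha) interval.
    """
    if not trial_scores:
        raise ValueError("need at least one trial")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(trial_scores)
    means = []
    for _ in range(n_resamples):
        # Resample trials with replacement and record the resampled mean.
        resample = [trial_scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(trial_scores) / n, lower, upper

# Placeholder per-trial task completion scores for 20 trials (not real data).
scores = [1.0, 0.73, 0.47, 0.2, 1.0, 0.63, 0.73, 0.47, 0.9, 0.37,
          0.73, 1.0, 0.47, 0.63, 0.2, 0.73, 0.9, 0.47, 0.73, 0.63]
mean, lo, hi = bootstrap_ci(scores)
print(f"mean={mean:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Resampling whole trials keeps the drawer and cleaning outcomes of each trial together, which matters if they are correlated within a trial.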
Task completion, reported as Reweighted (sample clean 50%) / Unweighted (sample clean 0.06%)
Clean only (50 demos): 63.3%
                Magic Soup++       Custom OXE
Cotrain         68.8% / bad!       74.2% / bad!
Ambient         80.3% / 68.7%      72.8%* / 75.8%
Ambient-Omni    ?                  ?
* Need to retrain
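The reweighted and unweighted settings differ only in how often a clean demo is drawn during training. A minimal sketch of that sampling step follows, assuming `alpha` is the probability of drawing from the clean set (the slides do not define it precisely) and that both datasets are plain lists; all names are hypothetical.

```python
import random

def unweighted_alpha(n_clean: int, n_ood: int) -> float:
    """Clean-sampling probability when all samples are pooled and drawn uniformly."""
    return n_clean / (n_clean + n_ood)

def sample_batch(clean, ood, alpha, batch_size, rng):
    """Draw a batch where each item comes from `clean` with probability alpha, else from `ood`."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    batch = []
    for _ in range(batch_size):
        source = clean if rng.random() < alpha else ood
        batch.append(rng.choice(source))
    return batch

rng = random.Random(0)
clean = [("clean", i) for i in range(50)]     # stand-in for the clean demos
ood = [("ood", i) for i in range(5000)]       # stand-in for the large OOD pool
batch = sample_batch(clean, ood, alpha=0.5, batch_size=10_000, rng=rng)
clean_fraction = sum(1 for tag, _ in batch if tag == "clean") / len(batch)
print(f"clean fraction at alpha=0.5: {clean_fraction:.3f}")
print(f"unweighted alpha for this toy pool: {unweighted_alpha(len(clean), len(ood)):.4f}")
```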
                   Clean Only    Magic Soup Cotrain    Custom OXE Cotrain
Opening            20/20         20/20                 20/20
Closing            16/20         19/20                 19/20
Cleaning           34/60         37/60                 41/60
Task Completion    0.633         0.688                 0.742
Even the best policy right now is mediocre...
Option 1:
Option 2: