Aug 15, 2024
Adam Wei
Behavior cloning (Diffusion Policy) can solve many robot tasks
...when data is available
Goal:
Diffusion Policy:
Training loss:
Condition on observations O by passing them as an input to the denoiser.
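For reference, a standard form of the conditional denoising loss that Diffusion Policy trains with (the notation below is assumed, not taken from the slides: A^0 is a demonstrated action sequence, O the observations, \bar\alpha_k the noise schedule):

$$\mathcal{L} = \mathbb{E}_{(O, A^0)\sim\mathcal{D},\; k,\; \epsilon\sim\mathcal{N}(0, I)} \left\| \epsilon - \epsilon_\theta\!\left(O,\; \sqrt{\bar\alpha_k}\, A^0 + \sqrt{1-\bar\alpha_k}\,\epsilon,\; k\right) \right\|^2$$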
Sampling Procedure
Train with DDPM, sample with DDIM (faster inference)
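A minimal sketch of DDIM sampling (eta = 0) from a denoiser trained with the DDPM objective; the eps_model interface, tensor shapes, and schedule variable are assumptions for illustration, not the actual implementation:

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, obs, action_shape, alpha_bar, num_steps=10):
    """Deterministic DDIM sampling (eta = 0) from a DDPM-trained denoiser.

    eps_model(obs, noisy_action, k) -> predicted noise  (assumed interface)
    alpha_bar: (T,) cumulative products of alphas from the DDPM noise schedule.
    """
    T = alpha_bar.shape[0]
    timesteps = torch.linspace(T - 1, 0, num_steps).long()  # coarse, descending schedule

    a = torch.randn(action_shape)                    # start from pure noise
    for i, k in enumerate(timesteps):
        eps = eps_model(obs, a, k)                   # predicted noise at step k
        a0 = (a - (1 - alpha_bar[k]).sqrt() * eps) / alpha_bar[k].sqrt()
        if i + 1 < len(timesteps):
            k_prev = timesteps[i + 1]                # jump directly to the next kept step
            a = alpha_bar[k_prev].sqrt() * a0 + (1 - alpha_bar[k_prev]).sqrt() * eps
        else:
            a = a0                                   # final denoised action sequence
    return a
```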
[Figure: Diffusion Policy block diagram; observations O and robot poses in, actions A out]
Real data. Ex: from a human demonstrator
Sim data. Ex: from GCS
Cotraining: Use both datasets to train a model that maximizes some test objective
In this work: use both datasets to train a Diffusion Policy that maximizes empirical success rate.
Dataset mixture: a mixing ratio α sets the probability of sampling each training example from one dataset vs. the other.
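A toy sketch of drawing a mixed training batch; defining alpha as the probability of sampling a real example is an assumption made here for illustration:

```python
import random

def sample_cotraining_batch(real_data, sim_data, batch_size, alpha):
    """Draw each example from the real dataset with probability alpha
    (the mixing ratio, as defined for this sketch) and from sim otherwise."""
    batch = []
    for _ in range(batch_size):
        source = real_data if random.random() < alpha else sim_data
        batch.append(random.choice(source))
    return batch

# Example usage (hypothetical datasets): 25% real, 75% sim per batch.
# batch = sample_cotraining_batch(real_demos, sim_demos, batch_size=64, alpha=0.25)
```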
[Figure: success rates at different mixing ratios: SR = 10/20, 19/20, 14/20]
Mixing ratio is important for good performance!
[Figure: success rates at different mixing ratios at another data scale: SR = 2/20, 3/20, 14/20]
The data scale affects the optimal mixing ratio!
1. How do different data scales and mixing ratios affect policy performance (success rate)?
2. How do different distribution shifts in the simulated data affect the optimal mixing ratio and policy performance?
Task: Push a T-object from any initial planar pose within the robot's workspace to a pre-specified target pose.
Planar pushing is the simplest task that captures the broader challenges in manipulation.
Real-world data: teleoperation
Sim data: from GCS
1. How do different data scales and mixing ratios affect the success rate of co-trained policies?
[Plot legend: red = 2000 sim demos, blue = 500 sim demos, yellow = real-only baseline]
Mixing ratio takeaways:
[Plots: success rates at Levels 1, 2, and 3; red = 2000 sim demos, blue = 500 sim demos, yellow = real-only baseline]
Mixing ratio takeaways:
Scaling up sim data:
Cotrained policies exhibit similar characteristics to the real-world expert regardless of the mixing ratio.
How does the simulated data help the final policy?
[Figure: real data vs. sim data]
1. Cotrained policies rely on real data for high-level decisions. Sim data helps fill in missing gaps.
2. The conditional action distributions p(A | O) for real and sim are different (which the model successfully learns).
3. Sim data contains more information about high-probability actions, p(A)
4. Sim data prevents overfitting to real.
1. Cotrained policies rely on real data for high-level decisions. Sim data helps fill in missing gaps.
[Plots: kNN on actions; kNN on observation embeddings]
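A rough sketch of the kNN comparison these plots suggest: for each rollout step, check whether the policy's action chunk (or observation embedding) lies closer to the real or the sim training data. All names below are hypothetical:

```python
import numpy as np

def nearer_dataset(query, real_feats, sim_feats, k=5):
    """Compare the mean distance from `query` (an action chunk or observation
    embedding from a rollout) to its k nearest neighbors in the real vs. sim
    training features, and report which dataset it is closer to."""
    def knn_dist(feats):
        d = np.linalg.norm(feats - query, axis=-1)  # distance to every training example
        return np.sort(d)[:k].mean()                # mean over the k nearest neighbors
    d_real, d_sim = knn_dist(real_feats), knn_dist(sim_feats)
    return ("real" if d_real < d_sim else "sim"), d_real, d_sim
```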
Note to self:
We can see that the policy is closer to real data when the T is far from the goal. During eval, the T is also placed far from the goal. This could explain why the policy appears to use real data for high-level logical decisions: the initial conditions are outside the support of the sim data, so the policy must rely on real-world data to make the initial logical decisions.
1. Cotrained policies rely on real data for high-level decisions. Sim data helps fill in missing gaps.
[Plots: kNN on actions and kNN on observation embeddings, three examples]
2. The conditional action distributions p(A | O) for real and sim are different, which the model successfully learns.
Experiment: roll out the cotrained policy in sim and observe the behavior.
To-do: need a way to visualize or compare how similar a trajectory is to sim vs. real data.
3. Sim data contains information about high-probability actions (i.e., co-training can help the model learn p(A)).
Future experiment: Cotrain with classifier-free guidance.
[Equation: guided score = weighted combination of the conditional and unconditional score estimates]
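For reference, the standard classifier-free guidance combination with guidance weight w (the notation and the use of a null condition ∅ are assumptions, not from the slides):

$$\hat{\epsilon} \;=\; \underbrace{\epsilon_\theta(A^k, \varnothing, k)}_{\text{unconditional score estimate}} \;+\; w\left(\underbrace{\epsilon_\theta(A^k, O, k)}_{\text{conditional score estimate}} - \epsilon_\theta(A^k, \varnothing, k)\right)$$

Training with occasional conditioning dropout would let the same network provide both estimates, so sim data could improve the unconditional term (roughly p(A)) even if its conditionals differ from real.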
Immediate experiment: Replace the sim images with all zeros and cotrain.
4. Sim data helps prevent overfitting (acts like a regularizer)