Oct 29 / Nov 18, 2024
Adam Wei
[Figure: data sources (Ego-Exo, robot teleop, Open-X, simulation) placed on a spectrum from big data with a big transfer gap to small data with no transfer gap.]
How can we obtain data for imitation learning?
Cotrain from different data sources (e.g., sim & real)
Sim Infrastructure
Sim Data Generation
Octo
DROID
Similar comments in OpenX, OpenVLA, etc...
... will make more concrete in a few slides
[Figure: observation–action data \((O, A)\) for the real and sim robot datasets.]
Cotraining: Use both datasets to train a model that maximizes some test objective
\(R, S\): real and sim
\(|\mathcal D|\) = # demos in \(\mathcal D\)
\(\mathcal{L}=\mathbb{E}_{p_{O,A},k,\epsilon^k}[\lVert \epsilon^k - \epsilon_\theta(O_t, A^0_t + \epsilon^k, k) \rVert^2] \)
\(\mathcal{L}_{\mathcal{D}}=\mathbb{E}_{\mathcal{D},k,\epsilon^k}[\lVert \epsilon^k - \epsilon_\theta(O_t, A^0_t + \epsilon^k, k) \rVert^2] \approx \mathcal L \)
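As a concrete sketch of this objective (a simplified illustration: `policy(obs, noisy_actions, k)` is a hypothetical stand-in for the denoiser \(\epsilon_\theta\), and the noise-schedule scaling is omitted):

```python
import numpy as np

def denoiser_loss(policy, obs, actions, rng, num_steps=100):
    """Simplified denoising loss: sample a diffusion step k and noise eps^k,
    corrupt the clean actions, and regress the policy's noise prediction."""
    k = int(rng.integers(0, num_steps))          # random diffusion step k
    eps = rng.standard_normal(actions.shape)     # noise eps^k
    noisy = actions + eps                        # A^0_t + eps^k (no schedule scaling)
    eps_hat = policy(obs, noisy, k)              # eps_theta(O_t, A^0_t + eps^k, k)
    return float(np.mean((eps - eps_hat) ** 2))  # ||eps^k - eps_hat||^2
```

Averaging this over samples from a dataset \(\mathcal D\) is the Monte Carlo estimate \(\mathcal L_{\mathcal D} \approx \mathcal L\).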
\(\mathcal D^\alpha\): dataset mixture, where \(\alpha\) is the fraction of real data sampled during training
Diffusion Policy:
Goal: Manipulate object to target pose
Limitations
Informs the qualities we want in sim
(Tower property of expectations)
Vanilla Cotraining:
\(\mathcal D^\alpha\) Dataset mixture
For vanilla cotraining: how do \(|\mathcal D_R|\), \(|\mathcal D_S|\), and \(\alpha\) affect the policy's success rate?
Sweep the following parameters:
\(|\mathcal D_R|\) = 10, 50, 150 demos
\(|\mathcal D_S|\) = 500, 2000 demos
\(\alpha = 0,\ \frac{|\mathcal{D}_{R}|}{|\mathcal{D}_{R}|+|\mathcal{D}_{S}|},\ 0.25,\ 0.5,\ 0.75,\ 1\)
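For intuition, drawing a batch from the mixture \(\mathcal D^\alpha\) can be sketched as follows (a minimal illustration, not the actual training loop), taking \(\alpha\) as the probability of drawing a real demo:

```python
import random

def sample_mixture_batch(real_demos, sim_demos, alpha, batch_size, rng=random):
    """Each batch element comes from the real dataset with probability
    alpha, and from the sim dataset otherwise."""
    batch = []
    for _ in range(batch_size):
        source = real_demos if rng.random() < alpha else sim_demos
        batch.append(rng.choice(source))
    return batch
```

With \(\alpha = \frac{|\mathcal D_R|}{|\mathcal D_R|+|\mathcal D_S|}\), this reduces to uniform sampling over the pooled dataset.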
For vanilla cotraining: how do \(|\mathcal D_R|\), \(|\mathcal D_S|\), and \(\alpha\) affect the policy's success rate?
[Figure: success rate vs. \(\alpha\) for \(|\mathcal D_R| = 10, 50, 150\); cyan: \(|\mathcal D_S| = 500\), orange: \(|\mathcal D_S| = 2000\), red: real only. Rollouts shown for 10 real + 2000 sim demos and 50 real + 500 sim demos.]
Initial experiments suggest that scaling up sim is a good idea!
... to be verified at larger scales with sim-sim experiments
[Plot: success rate as a function of \(|\mathcal D_R|\) and \(|\mathcal D_S|\)]
Experiments match intuition and theoretical bounds
\(|\mathcal D_R|\) = 10, 50 demos
\(|\mathcal D_S|\) = 500, 2000 demos
Finetune a pretrained sim policy
Finetune a cotrained policy
\(\implies\) 8 models, 20 trials each
Informs the qualities we want in sim
Setting up a simulator for data generation is non-trivial
+ Actions
What qualities matter in the simulated data?
Visual Shifts
Physics Shifts
Task Shifts
Original
Color Shift
Goal Shift
Object Shift
Experimental Setup
** Except object shift
Research question: Can we develop cotraining algorithms that mitigate the effects of distribution shifts?
Domain Randomization
Color Shift
Center of Mass Shift
Goal Shift
Object Shift
Sim Demos
Real Rollout
Cotrained policies exhibit similar characteristics to the real-world expert regardless of the mixing ratio
Real Data
Sim Data
How does cotraining improve performance?
Probably some combination of the above.
Cotrained policies can identify the sim vs real: sim data improves representation learning and fills in gaps in real data
| Policy | Observation embeddings acc. | Final activation acc. |
|---|---|---|
| 50/500, \(\alpha\) = 0.75 | 100% | 74.2% |
| 50/2000, \(\alpha\) = 0.75 | 100% | 89% |
| 10/2000, \(\alpha\) = 0.75 | 100% | 93% |
| 10/2000, \(\alpha\) = 5e-3 | 100% | 94% |
| 10/500, \(\alpha\) = 0.02 | 100% | 88% |
Only a small amount of real data is needed to separate the embeddings and activations
Real Rollout: mostly red with blue interleaved
Sim Rollout: mostly blue
Assumption: if kNN are real/sim, this behavior was learned from real/sim
Similar results for kNN on embeddings
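The attribution step can be sketched as follows (a simplified illustration with hypothetical embedding arrays): for each rollout embedding, look up its k nearest training embeddings and record what fraction came from sim.

```python
import numpy as np

def knn_sim_fraction(query_emb, train_embs, is_sim, k=5):
    """Return the fraction of the k nearest training embeddings that are sim.
    is_sim is a 0/1 array aligned with the rows of train_embs."""
    dists = np.linalg.norm(train_embs - query_emb, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                         # indices of k nearest
    return float(is_sim[nearest].mean())                    # 1.0 = all sim, 0.0 = all real
```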
Sim data prevents overfitting and acts as a regularizer
When \(|\mathcal D_R|\) is small, \(\mathcal L_{\mathcal D_R}\not\approx\mathcal L\).
\(\mathcal D_S\) helps regularize and prevent overfitting
Sim data provides more information about \(p_A(a)\)
Can we test/leverage this hypothesis with classifier-free guidance?
... more on this later if time allows
Research Questions
Example ideas:
Thought experiment: what if sim and real were nearly indistinguishable?
Fact: cotrained models can distinguish sim & real
Thought experiment: what if sim and real were nearly indistinguishable?
\(\mathrm{dist}(\mathcal D_R,\mathcal D_S)\) small \(\implies p^{real}_{(O,A)} \approx p^{sim}_{(O,A)} \implies p^{real}_O \approx p^{sim}_O\)
'visual sim2real gap'
Current approaches to sim2real: make sim and real visually indistinguishable
\(p^{R}_O \approx p^{S}_O\)
Do we really need this?
\(p^{R}_O \approx p^{S}_O\)
\(a^k\)
\(\hat \epsilon^k\)
\(o \sim p_O\)
\(o^{emb} = f_\psi(o)\)
\(p^{R}_O \approx p^{S}_O\)
\(p^{R}_{emb} \approx p^{S}_{emb}\)
\(\implies\)
Weaker requirement
[Diagram: observation \(o\) is encoded by \(f_\psi\); the denoiser \(\epsilon_\theta\) takes the embedding and the noisy action \(a^k\) and predicts \(\epsilon^k\) (Denoiser Loss). A discriminator \(d_\phi\) outputs \(\hat{\mathbb P}(f_\psi(o)\ \mathrm{is\ sim})\); its negative BCE loss serves as an adversarial term for the encoder \(\iff\) variational characterization of f-divergences.]
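A hypothetical sketch of the combined objective (not the exact implementation): the discriminator \(d_\phi\) minimizes a BCE loss on sim-vs-real, while the encoder/denoiser minimize the denoiser loss plus the negated BCE, pushing \(f_\psi\) toward embeddings the discriminator cannot separate.

```python
import numpy as np

def bce(preds, sim_labels):
    """Binary cross-entropy between discriminator outputs and sim labels."""
    p = np.clip(preds, 1e-7, 1 - 1e-7)
    return -float(np.mean(sim_labels * np.log(p) + (1 - sim_labels) * np.log(1 - p)))

def cotraining_losses(denoiser_loss, disc_preds, sim_labels, lam=1.0):
    """disc_loss is minimized w.r.t. d_phi; encoder_loss is minimized w.r.t.
    f_psi and eps_theta (the -lam * BCE term is the adversarial part)."""
    disc_loss = bce(disc_preds, sim_labels)
    encoder_loss = denoiser_loss - lam * disc_loss
    return disc_loss, encoder_loss
```

At the adversarial optimum the discriminator is at chance, so its BCE sits near \(\log 2 \approx 0.69\).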
Common features (sim & real)
Distinguishable features*
Relevant for control...
* also known as protected variables in AI fairness literature
Hypothesis:
\(|\mathcal D_R| = 50\), \(|\mathcal D_S| = 500\), \(\lambda = 1\), \(\alpha = 0.5\)
Performance: 9/10 trials
Discriminator BCE loss converges to \(\approx\log(2)\) and its sim/real accuracy to \(\approx 50\%\), i.e., chance level.
Potential solution?
1. Minimize sim2real gap
2. Embrace the sim2real gap
Sim data provides more information about \(p_A(a)\)
We can explicitly leverage this prior using classifier free guidance
Classifier-free guidance: \(\hat\epsilon = \epsilon_\theta(a^k, o, k) + w\,(\epsilon_\theta(a^k, o, k) - \epsilon_\theta(a^k, k))\)
The conditional score estimate is shifted by the difference between the conditional and unconditional score estimates, which helps guide the diffusion process.
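At sampling time the guided noise estimate can be sketched as follows (a minimal illustration; the function name is hypothetical and `w` is the guidance weight):

```python
def guided_noise_estimate(eps_cond, eps_uncond, w):
    """Classifier-free guidance: shift the conditional score estimate along
    the difference between the conditional and unconditional estimates."""
    return eps_cond + w * (eps_cond - eps_uncond)
```

Setting w = 0 recovers the conditional estimate; larger w leans harder on the conditioning signal.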
Immediate next steps:
Guiding Questions: