Oct 25, 2024
Adam Wei
🎉
🎉
PhD
❌
Start Up
✅
Screw the PhD...
Mr. Suh sounded better anyway...
Sim Infrastructure
Sim Data Generation
Octo
DROID
Similar comments in OpenX, OpenVLA, etc...
... will make more concrete in a few slides
[Diagram: the policy maps observations \(O\) to robot actions \(A\). Example action sources: a human demonstrator, GCS.]
Cotraining: Use both datasets to train a model that maximizes some test objective
Notation: subscripts \(R\), \(S\) denote real and sim; \(|\mathcal D|\) = # demos in \(\mathcal D\)
\(\mathcal{L}=\mathbb{E}_{p_{O,A},k,\epsilon^k}[\lVert \epsilon^k - \epsilon_\theta(O_t, A^0_t + \epsilon^k, k) \rVert^2] \)
\(\mathcal{L}_{\mathcal{D}}=\mathbb{E}_{\mathcal{D},k,\epsilon^k}[\lVert \epsilon^k - \epsilon_\theta(O_t, A^0_t + \epsilon^k, k) \rVert^2] \approx \mathcal L \)
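The objective above can be sketched in code. A minimal numpy sketch, assuming a standard noise-prediction parameterization; `eps_theta` is a placeholder for the actual denoising network, and the batch format is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_theta(obs, noisy_action, k):
    # Placeholder denoiser: in practice a neural network conditioned on
    # the observation O_t and the diffusion step k predicts the noise.
    return np.zeros_like(noisy_action)

def denoising_loss(batch, num_steps=100):
    """Monte-Carlo estimate of E[||eps^k - eps_theta(O_t, A_t^0 + eps^k, k)||^2]."""
    total = 0.0
    for obs, action in batch:
        k = int(rng.integers(1, num_steps + 1))   # sample diffusion step k
        eps = rng.standard_normal(action.shape)   # sample Gaussian noise eps^k
        pred = eps_theta(obs, action + eps, k)    # predict the added noise
        total += float(np.sum((eps - pred) ** 2))
    return total / len(batch)

batch = [(rng.standard_normal(8), rng.standard_normal(2)) for _ in range(4)]
loss = denoising_loss(batch)  # with the zero denoiser this estimates E||eps^k||^2
```

Averaging this loss over a finite dataset \(\mathcal D\) instead of \(p_{O,A}\) gives \(\mathcal L_{\mathcal D}\), which approximates \(\mathcal L\) when \(\mathcal D\) is large enough.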
\(\mathcal D^\alpha\): dataset mixture with mixing ratio \(\alpha\)
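One common reading of the mixture \(\mathcal D^\alpha\) is to draw each training sample from the real dataset with probability \(\alpha\) and from sim otherwise. A minimal sketch under that assumption (the convention that \(\alpha\) is the real fraction, and the helper `sample_mixture`, are assumptions, not the talk's exact definition):

```python
import random

def sample_mixture(d_real, d_sim, alpha, n, seed=0):
    """Draw n training samples: real with probability alpha, sim otherwise.

    Assumption: alpha is the real fraction of the mixture, so alpha = 0
    is sim-only and alpha = 1 is real-only.
    """
    rng = random.Random(seed)
    return [rng.choice(d_real) if rng.random() < alpha else rng.choice(d_sim)
            for _ in range(n)]

d_real = [("real", i) for i in range(10)]   # |D_R| = 10
d_sim = [("sim", i) for i in range(500)]    # |D_S| = 500
batch = sample_mixture(d_real, d_sim, alpha=0.25, n=1000)
frac_real = sum(1 for tag, _ in batch if tag == "real") / len(batch)
```

Under this convention, \(\alpha = \frac{|\mathcal D_R|}{|\mathcal D_R|+|\mathcal D_S|}\) recovers uniform sampling over the pooled dataset.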
Informs the qualities we want in sim
(Tower property of expectations)
Instead:
For vanilla cotraining: how do \(|\mathcal D_R|\), \(|\mathcal D_S|\), and \(\alpha\) affect the policy's success rate?
Sweep the following parameters:
\(|\mathcal D_R|\) = 10, 50, 150 demos
\(|\mathcal D_S|\) = 500, 2000 demos
\(\alpha = 0,\ \frac{|\mathcal{D}_{R}|}{|\mathcal{D}_{R}|+|\mathcal{D}_{S}|},\ 0.25,\ 0.5,\ 0.75,\ 1\)
[Plots: success rate vs. \(\alpha\) for \(|\mathcal D_R| = 10, 50, 150\); cyan: \(|\mathcal D_S| = 500\), orange: \(|\mathcal D_S| = 2000\), red: real only]
Initial experiments suggest that scaling up sim is a good idea!
... to be verified at larger scales with sim-sim experiments
Scaling \(|\mathcal D_R|\) and \(|\mathcal D_S|\): experiments match intuition and theoretical bounds
[Videos: sim demos vs. real rollout]
Cotrained policies exhibit similar characteristics to the real-world expert regardless of the mixing ratio
Real Data
Sim Data
How does cotraining improve performance?
Probably some combination of the hypotheses below.
When deployed in real, cotrained policies rely on real data and use sim data to fill in the gap
| Policy (\(|\mathcal D_R|\)/\(|\mathcal D_S|\)) | Observation embeddings acc. | Final activation acc. |
| --- | --- | --- |
| 50/500, \(\alpha = 0.75\) | 100% | 74.2% |
| 50/2000, \(\alpha = 0.75\) | 100% | 89% |
| 10/2000, \(\alpha = 0.75\) | 100% | 93% |
| 10/2000, \(\alpha = 5\times10^{-3}\) | 100% | 94% |
| 10/500, \(\alpha = 0.02\) | 100% | 88% |
Only a small amount of real data is needed to separate the embeddings and activations
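The accuracies in the table come from a sim-vs-real classifier on embeddings/activations. A toy numpy sketch of such a probe (the logistic-regression probe and the synthetic "embeddings" are illustrative assumptions, not the actual evaluation):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_probe(x, y, lr=0.5, steps=500):
    """Logistic-regression probe: predict sim (1) vs real (0) from features."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid predictions
        g = p - y                                # gradient of BCE wrt logits
        w -= lr * (x.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Toy "embeddings": sim cluster offset from the real cluster.
real = rng.standard_normal((100, 16))
sim = rng.standard_normal((100, 16)) + 0.8
x = np.vstack([real, sim])
y = np.array([0] * 100 + [1] * 100)

w, b = train_probe(x, y)
acc = float((((x @ w + b) > 0).astype(int) == y).mean())
```

A probe like this at ~100% accuracy on embeddings is what "the model can separate sim from real" means operationally.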
[kNN plots on final activations. Real rollout: mostly red with blue interleaved; sim rollout: mostly blue.]
Assumption: if kNN are real/sim, this behavior was learned from real/sim
Similar results for kNN on embeddings
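The attribution heuristic can be sketched as: for each rollout activation, find its k nearest training activations and majority-vote their real/sim labels. A toy numpy sketch (the feature space, clusters, and choice of k are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_labels(queries, train_feats, train_labels, k=5):
    """For each query activation, majority-vote the 'real'/'sim' labels
    of its k nearest training activations (Euclidean distance)."""
    votes = []
    for q in queries:
        d = np.linalg.norm(train_feats - q, axis=1)
        near = train_labels[np.argsort(d)[:k]].tolist()
        votes.append(max(set(near), key=near.count))
    return votes

# Toy activations: sim cluster offset from the real cluster.
real_feats = rng.standard_normal((50, 8))
sim_feats = rng.standard_normal((200, 8)) + 2.0
feats = np.vstack([real_feats, sim_feats])
labels = np.array(["real"] * 50 + ["sim"] * 200)

rollout = rng.standard_normal((10, 8))  # activations near the real cluster
votes = knn_labels(rollout, feats, labels)
```

Under the stated assumption, a rollout whose neighbors are mostly real is attributed to behavior learned from real data.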
[kNN plots on embeddings. Real rollout: some red; sim rollout: all blue.]
Sim data prevents overfitting and acts as a regularizer
When \(|\mathcal D_R|\) is small, \(\mathcal L_{\mathcal D_R}\not\approx\mathcal L\).
\(\mathcal D_S\) helps regularize and prevent overfitting
Sim data provides more information about \(p_A(a)\)
Can we test/leverage this hypothesis with classifier-free guidance?
... more on this later if time allows
Sim data provides more information about \(p_A(a)\)
Can we leverage this with classifier-free guidance...
Informs the qualities we want in sim
Setting up a simulator for data generation is non-trivial
+ Actions
What qualities matter in the simulated data?
Visual Shifts
Physics Shifts
Task Shifts
[Image panels: Original, Color Shift, Goal Shift, Object Shift]
** Except object shift
Domain Randomization
Color Shift
Center of Mass Shift
Goal Shift
Object Shift
Fact: cotrained models can distinguish sim & real
Thought experiment: what if sim and real were nearly indistinguishable?
\(\mathrm{dist}(\mathcal D_R,\mathcal D_S)\) small \(\implies p^{real}_{(O,A)} \approx p^{sim}_{(O,A)} \implies p^{real}_O \approx p^{sim}_O\)
'visual sim2real gap'
Current approaches to sim2real: make sim and real visually indistinguishable
\(p^{R}_O \approx p^{S}_O\)
Do we really need this?
\(p^{R}_O \approx p^{S}_O\)
[Diagram: observation \(o \sim p_O\) is encoded as \(o^{emb} = f_\psi(o)\); the denoiser maps the noisy action \(a^k\) to the noise estimate \(\hat \epsilon^k\).]
\(p^{R}_O \approx p^{S}_O \implies p^{R}_{emb} \approx p^{S}_{emb}\): matching embeddings is the weaker requirement
[Architecture: encoder \(f_\psi\) embeds the observation \(o\); the denoiser \(\epsilon_\theta\) takes \((a^k, f_\psi(o), k)\) and is trained on the denoiser loss against \(\epsilon^k\); a discriminator \(d_\phi\) outputs \(\hat{\mathbb P}(f_\psi(o)\ \mathrm{is\ sim})\), and the encoder is additionally trained with a negative BCE loss against it.]
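One way to read this setup: the encoder \(f_\psi\) minimizes the denoiser loss minus \(\lambda\) times the discriminator's BCE, so it is rewarded when \(d_\phi\) cannot tell sim embeddings from real ones. A toy numpy sketch of the combined objective (the function names, \(\lambda\), and shapes are assumptions):

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy of discriminator probabilities p vs labels y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def encoder_objective(denoiser_loss, disc_probs, is_sim, lam=1.0):
    """Encoder minimizes denoiser loss MINUS lambda * discriminator BCE:
    a confused discriminator (high BCE) lowers the encoder's objective."""
    return denoiser_loss - lam * bce(disc_probs, is_sim)

# At chance, the discriminator outputs 0.5 everywhere and its BCE is log(2).
probs = np.full(8, 0.5)
labels = np.array([0, 1] * 4)
chance_bce = bce(probs, labels)                      # = log(2) ~ 0.693
obj = encoder_objective(1.0, probs, labels, lam=1.0)  # = 1.0 - log(2)
```

A discriminator stuck at BCE ≈ log(2), i.e., 50% accuracy, is the signature of embeddings it cannot separate.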
\(\iff\)
(Variational characterization of f-divergences)
thanks Yury! :D
Common features (sim & real)
Distinguishable features*
Relevant for controls...
* also known as protected variables in AI safety literature
Hypothesis:
\(|\mathcal D_R| = 50\), \(|\mathcal D_S| = 500\), \(\lambda = 1\), \(\alpha = 0.5\)
Performance: 9/10 trials
Discriminator loss ≈ log(2), accuracy ≈ 50% (chance: the discriminator cannot distinguish sim from real)
Potential solution?
1. Minimize sim2real gap
2. Embrace the sim2real gap
Sim data provides more information about \(p_A(a)\)
We can explicitly leverage this prior using classifier free guidance
Conditional Score Estimate
Unconditional Score Estimate
Helps guide the diffusion process
Conditional Score Estimate
Difference in conditional and unconditional scores
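The combination described above is presumably the standard classifier-free-guidance rule: the unconditional estimate plus a guidance weight times the difference between the conditional and unconditional estimates. A minimal numpy sketch (the weight \(w\) and this parameterization are assumptions about the talk's exact formulation):

```python
import numpy as np

def cfg_noise_estimate(eps_uncond, eps_cond, w=2.0):
    """Classifier-free guidance: unconditional score estimate plus w times
    the (conditional - unconditional) difference."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.1, 0.0])  # unconditional score estimate
eps_c = np.array([0.3, 0.2])  # conditional score estimate
guided = cfg_noise_estimate(eps_u, eps_c, w=2.0)
```

Here \(w = 1\) recovers the purely conditional estimate, while \(w > 1\) amplifies the conditioning signal; an unconditional model trained on sim-heavy mixtures would carry the extra information about \(p_A(a)\).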
Immediate next steps:
Future work: