Feb 27, 2025
Adam Wei
Would you prefer the monthly check-ins to be presentation-oriented or discussion-oriented?
[Figure: data sources for imitation learning arranged along a spectrum from big data / big transfer gap to small data / no transfer gap: Ego-Exo, Open-X, robot teleop, simulation.]
How can we obtain data for imitation learning?
Cotrain from different data sources
(ex. sim & real)
Cotraining: Use both datasets to train a model that maximizes some test objective
Model: Diffusion Policy
Test objective: Success rate on a planar pushing task
(Tower property of expectations)
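The tower-property remark can be made concrete. If each training example is drawn from the real dataset with probability \(\alpha\) and from the sim dataset otherwise (this reading of \(\alpha\) is an assumption based on the mixture notation used later), then the cotraining objective decomposes as

\(\mathcal L_{\text{cotrain}}(\theta) = \mathbb E\big[\ell_\theta(o,a)\big] = \alpha\,\mathbb E_{(o,a)\sim\mathcal D_R}\big[\ell_\theta(o,a)\big] + (1-\alpha)\,\mathbb E_{(o,a)\sim\mathcal D_S}\big[\ell_\theta(o,a)\big]\)

i.e. conditioning on the data source and applying the tower property yields an \(\alpha\)-weighted mixture of the per-dataset losses.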
Vanilla Cotraining:
[Figure: success rate vs. dataset mixture \(\mathcal D^\alpha\) for \(|\mathcal D_R| \in \{10, 50, 150\}\). Cyan: \(|\mathcal D_S| = 500\); orange: \(|\mathcal D_S| = 2000\); red: real data only. Example mixtures: 10 real / 2000 sim demos and 50 real / 500 sim demos.]
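One way to realize the mixture \(\mathcal D^\alpha\) in a training loop is per-sample source selection. This is a hypothetical sketch (the function name and dataset representation are mine, not from the talk), treating \(\alpha\) as the probability of drawing a real example:

```python
import random

def sample_cotraining_batch(real_data, sim_data, alpha, batch_size, rng):
    """Draw a cotraining batch: each example comes from the real dataset
    with probability alpha and from the sim dataset otherwise."""
    batch = []
    for _ in range(batch_size):
        source = real_data if rng.random() < alpha else sim_data
        batch.append(rng.choice(source))
    return batch

# Example: alpha = 0.75 favors the 10 real demos on a per-draw basis,
# even though the sim dataset is 200x larger.
real = [("real", i) for i in range(10)]
sim = [("sim", i) for i in range(2000)]
batch = sample_cotraining_batch(real, sim, alpha=0.75, batch_size=64,
                                rng=random.Random(0))
```

Sampling per example (rather than concatenating the datasets) keeps the expected mixture ratio at \(\alpha\) regardless of the relative dataset sizes.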
Initial experiments suggest that scaling up sim is a good idea!
... to be verified at larger scales with sim-sim experiments
Can we create 2 different simulation environments that replicate the sim2real gap?
If yes, need to emulate...
| | Simulation | "Real World" |
|---|---|---|
| Sim2Real | Quasi-static dynamics | Real-world physics |
| Sim2Sim | Quasi-static dynamics | Hydroelastic + Drake physics |
Real data policies vs. sim data only: pros & cons.
Con: setting up a simulator for data generation is non-trivial.
+ Actions
What qualities matter in the simulated data?
[Figure: shift types and levels. Visual: domain randomization, color shift. Physics shift (CoM y-offset): Level 0 = Drake physics; Levels 1-4 apply offsets of +3 cm, -3 cm, -6 cm.]
[Figure: additional shift types. Visual shifts, goal shift, object shift, task shift; Levels 0-3, except task shift, which has only 1 level.]
Experimental Setup
** Except object shift
3. Task shift (??) TBD
In sim2sim, we can analyze different types of sim2real gaps independently.
| | Visual gap off | Visual gap on |
|---|---|---|
| Physics gap off | Perfect simulator | Perfect physics engine |
| Physics gap on | Perfect renderer | Regular cotraining |
Disclaimer: Preliminary Results
Section 1: Real World Experiments
Section 2: Simulation Experiments
Section 3: Empirical Analysis
Cotrained policies exhibit similar characteristics to the real-world expert regardless of the mixing ratio
Real Data
Sim Data
How does cotraining improve performance?
Probably some combination of the above.
Cotrained policies can identify the sim vs real: sim data improves representation learning and fills in gaps in real data
| Policy | Observation embedding acc. | Final activation acc. |
|---|---|---|
| 50/500, \(\alpha = 0.75\) | 100% | 74.2% |
| 50/2000, \(\alpha = 0.75\) | 100% | 89% |
| 10/2000, \(\alpha = 0.75\) | 100% | 93% |
| 10/2000, \(\alpha = 5\mathrm{e}{-3}\) | 100% | 94% |
| 10/500, \(\alpha = 0.02\) | 100% | 88% |
Only a small amount of real data is needed to separate the embeddings and activations
[Figure: kNN source labels along rollouts. Real rollout: mostly red with blue interleaved. Sim rollout: mostly blue.]
Assumption: if kNN are real/sim, this behavior was learned from real/sim
Similar results for kNN on embeddings
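The attribution procedure above can be sketched as a majority vote over nearest neighbors. This is a hypothetical illustration (function name and toy data are mine); in the actual analysis the vectors would be the policy's embeddings or activations with sim/real labels from the training set:

```python
import numpy as np

def knn_source_labels(train_vecs, train_is_sim, rollout_vecs, k=5):
    """Label each rollout step 'sim' or 'real' by majority vote over its
    k nearest training vectors (Euclidean distance)."""
    labels = []
    for v in rollout_vecs:
        dists = np.linalg.norm(train_vecs - v, axis=1)
        nearest = train_is_sim[np.argsort(dists)[:k]]
        labels.append("sim" if nearest.mean() > 0.5 else "real")
    return labels

# Toy check: two well-separated clusters, query point near the real cluster.
train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
is_sim = np.array([0, 0, 1, 1])  # first two real, last two sim
labels = knn_source_labels(train, is_sim, np.array([[0.05, 0.05]]), k=3)
```

Per the stated assumption, a step whose neighbors are mostly sim is counted as behavior learned from sim data, and vice versa.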
Sim data prevents overfitting and acts as a regularizer
When \(|\mathcal D_R|\) is small, \(\mathcal L_{\mathcal D_R}\not\approx\mathcal L\).
\(\mathcal D_S\) helps regularize and prevent overfitting
Sim data provides more information about \(p_A(a)\)
Can we test/leverage this hypothesis with classifier-free guidance?
... more on this later if time allows
Research Questions
Example ideas:
Thought experiment: what if sim and real were nearly indistinguishable?
Fact: cotrained models can distinguish sim & real
\(\mathrm{dist}(\mathcal D_R,\mathcal D_S)\) small \(\implies p^{real}_{(O,A)} \approx p^{sim}_{(O,A)} \implies p^{real}_O \approx p^{sim}_O\)
'visual sim2real gap'
Current approaches to sim2real: make sim and real visually indistinguishable
\(p^{R}_O \approx p^{S}_O\)
Do we really need this?
\(p^{R}_O \approx p^{S}_O\)
[Diagram: the denoiser maps a noisy action \(a^k\) to a noise estimate \(\hat\epsilon^k\), conditioned on the embedding \(o^{emb} = f_\psi(o)\) of an observation \(o \sim p_O\).]
\(p^{R}_O \approx p^{S}_O \implies p^{R}_{emb} \approx p^{S}_{emb}\): matching the embedding distributions is a weaker requirement than matching the observation distributions.
[Diagram: the observation o is encoded by \(f_\psi\); the denoiser \(\epsilon_\theta\) predicts the noise \(\epsilon^k\) from the noisy action \(a^k\) and the embedding (denoiser loss), while a discriminator \(d_\phi\) outputs \(\hat{\mathbb P}(f_\psi(o)\ \mathrm{is\ sim})\) (negative BCE loss \(\iff\) variational characterization of f-divergences).]
Common features (sim & real)
Distinguishable features*
Relevant for control...
* also known as protected variables in AI fairness literature
Hypothesis:
\(|\mathcal D_R| = 50\), \(|\mathcal D_S| = 500\), \(\lambda = 1\), \(\alpha = 0.5\). Performance: 9/10 trials; discriminator loss \(\approx \log 2\), accuracy \(\approx 50\%\).
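The \(\approx \log 2\) and \(\approx 50\%\) figures are consistent with the variational view: if the sim and real embedding distributions are indistinguishable, the discriminator can do no better than outputting 0.5 everywhere on a balanced batch. A small check (assuming balanced labels and a constant-0.5 predictor, which is my illustration, not the talk's experiment):

```python
import math

def bce(p_sim_hat, is_sim):
    """Binary cross-entropy of predicting probability p_sim_hat for label is_sim."""
    return -(math.log(p_sim_hat) if is_sim else math.log(1.0 - p_sim_hat))

# Balanced batch of real (0) and sim (1) samples; the best the discriminator
# can do on indistinguishable inputs is a constant prediction of 0.5.
labels = [0, 1] * 500
avg_loss = sum(bce(0.5, y) for y in labels) / len(labels)          # -> log(2)
accuracy = sum((0.5 > 0.5) == bool(y) for y in labels) / len(labels)  # -> 0.5
```

So a discriminator loss plateau near \(\log 2 \approx 0.693\) with 50% accuracy is exactly the signature of embeddings the discriminator cannot separate.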
Potential solution?
1. Minimize sim2real gap
2. Embrace the sim2real gap
Sim data provides more information about \(p_A(a)\)
We can explicitly leverage this prior using classifier free guidance
Guided score = unconditional score estimate + \(w\ \times\) (difference in conditional and unconditional score estimates); the difference term helps guide the diffusion process toward the conditional score estimate.
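The score combination can be sketched in a few lines. This is the generic classifier-free guidance update (the function name and toy arrays are mine; the policy-specific conditioning on \(o\) is assumed):

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional noise/score estimates:
    the unconditional estimate plus w times their difference.
    w = 0 recovers the unconditional estimate, w = 1 the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy estimates: w > 1 extrapolates past the conditional estimate,
# amplifying the conditioning direction.
eps_c = np.array([1.0, 2.0])
eps_u = np.array([0.0, 0.0])
guided = classifier_free_guidance(eps_c, eps_u, w=1.5)
```

Under the hypothesis above, the unconditional term is where the extra sim data could contribute its prior over \(p_A(a)\), while the difference term injects the observation-conditioned signal.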
Immediate next steps:
Guiding Questions: