RLG Short Talk
Mar 13, 2025
Adam Wei
Policy: Diffusion Policy [2]
Performance Objective: Success rate on a planar pushing task
Cotraining: Use both sim and real datasets to train a policy that maximizes some real-world performance objective
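As a concrete illustration, below is a minimal sketch of what a mixed sim/real dataloader for cotraining might look like; the `sim_ratio` mixing parameter, the dataset keys, and the batch composition are illustrative assumptions, not details from the talk.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def cotraining_batches(real_ds: Dataset, sim_ds: Dataset,
                       batch_size: int = 64, sim_ratio: float = 0.9):
    """Yield mixed batches: a `sim_ratio` fraction of each batch comes from the
    (large) sim dataset and the rest from the (small) real dataset.
    Both datasets are assumed to return dicts with "obs" and "action" keys."""
    assert 0.0 < sim_ratio < 1.0
    n_sim = int(batch_size * sim_ratio)
    n_real = batch_size - n_sim
    sim_loader = DataLoader(sim_ds, batch_size=n_sim, shuffle=True)
    real_loader = DataLoader(real_ds, batch_size=n_real, shuffle=True)
    real_iter = iter(real_loader)
    for sim_batch in sim_loader:
        try:
            real_batch = next(real_iter)
        except StopIteration:
            # The real dataset is much smaller, so cycle through it repeatedly.
            real_iter = iter(real_loader)
            real_batch = next(real_iter)
        obs = torch.cat([sim_batch["obs"], real_batch["obs"]])
        actions = torch.cat([sim_batch["action"], real_batch["action"]])
        yield obs, actions
```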
50 real demos: success rate 10/20
50 real demos + 2000 sim demos: success rate 18/20
Simulation
Analysis
Key idea: Create two different simulated environments to emulate the sim2real gap.
The target sim environment emulates the real world.
Sim provides better evaluation and control over the sim2target gap, which serves as a proxy for the sim2real gap.
How do visual, physics, and task shift affect performance?
Paradoxically, some visual shift is required for good performance!
Physics and task shift are most impactful for downstream performance.
Videos: real-world demo, policy rollout (cotrained), simulated demo
High-performing policies must learn to identify sim vs real, since the physics of each environment requires different actions.
Sim demo worth 0.83 real demos
Sim demo worth 0.49 real demos
Conclusions
Is there a training formulation that simultaneously minimizes the denoiser loss and the sim2real gap?
Intuitively, this objective has two terms: a denoiser loss and a sim2real loss.
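One way to write such an objective is sketched below; the weight \(\lambda\), the noise-prediction form of the denoiser loss, and the choice of discrepancy \(d\) are illustrative assumptions rather than details from the slides.

\[
\mathcal{L}(\theta) \;=\;
\underbrace{\mathbb{E}_{(o,\,a)\sim \mathcal{D}_{\mathrm{sim}}\cup\mathcal{D}_{\mathrm{real}}}\!\left[\lVert \epsilon - \epsilon_\theta(o, a^k, k)\rVert^2\right]}_{\text{denoiser loss}}
\;+\; \lambda\,\underbrace{d\!\left(p^{\mathrm{sim}}_\theta,\; p^{\mathrm{real}}_\theta\right)}_{\text{sim2real loss}}
\]

Here \(\epsilon_\theta\) is the diffusion policy's noise predictor, \(a^k\) is the demonstrated action corrupted to diffusion step \(k\), and \(d(\cdot,\cdot)\) measures how far apart the policy's features (or action distributions) are on sim versus real data. The adversarial and MMD formulations below are two candidate choices of \(d\).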
Adversarial formulation:
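One GAN-style sketch of this idea, assuming a discriminator \(D\) that classifies whether an intermediate feature \(z\) comes from sim or real data while the policy is trained to fool it (the feature-level discriminator and the weight \(\lambda\) are assumptions):

\[
\min_{\theta}\;\max_{D}\;\;
\mathcal{L}_{\mathrm{denoise}}(\theta)
\;+\; \lambda\left(
\mathbb{E}_{z\sim p^{\mathrm{real}}_\theta}\!\left[\log D(z)\right]
+ \mathbb{E}_{z\sim p^{\mathrm{sim}}_\theta}\!\left[\log\big(1 - D(z)\big)\right]
\right)
\]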
Challenges: Great in theory, but GANs are hard to train...
... this is not the path to happiness in robotics
MMD Formulation:
MMD is differentiable* and does not require a max operation!
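Below is a minimal sketch of an RBF-kernel MMD² estimator between sim and real feature batches in PyTorch; the kernel choice, the bandwidth, and the idea of applying it to policy features are illustrative assumptions.

```python
import torch

def mmd_rbf(z_sim: torch.Tensor, z_real: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased (V-statistic) estimate of MMD^2 between two feature batches,
    using a Gaussian RBF kernel with bandwidth `sigma` (an assumed choice).
    Fully differentiable w.r.t. both inputs, so it can be added to the
    denoiser loss and minimized by gradient descent -- no inner max needed."""
    def rbf(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2.0 * sigma**2))
    return rbf(z_sim, z_sim).mean() + rbf(z_real, z_real).mean() - 2.0 * rbf(z_sim, z_real).mean()

# Hypothetical usage inside a cotraining loop:
#   loss = denoiser_loss + lam * mmd_rbf(features_sim, features_real)
```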
Corrupt data + high-quality data: computer vision, language (social media, etc.)
There exist theoretically sound algorithms for cotraining on both corrupt and high-quality data
Corrupt data + high-quality data: protein folding... robotics?
Sim data + real data + 5 years = ...