RLG Short Talk
Mar 13, 2025
Adam Wei
Policy: Diffusion Policy [2]
Performance Objective: Success rate on a planar pushing task
Cotraining: Use both sim and real datasets to train a policy that maximizes some real-world performance objective
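As a concrete illustration, below is a minimal sketch of what a mixed sim/real dataloader for cotraining might look like; the `sim_ratio` mixing parameter, the dataset keys, and the batch composition are illustrative assumptions, not details from the talk.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def cotraining_batches(real_ds: Dataset, sim_ds: Dataset,
                       batch_size: int = 64, sim_ratio: float = 0.9):
    """Yield mixed batches: a `sim_ratio` fraction of each batch comes from the
    (large) sim dataset and the rest from the (small) real dataset.
    Both datasets are assumed to return dicts with "obs" and "action" keys."""
    assert 0.0 < sim_ratio < 1.0
    n_sim = int(batch_size * sim_ratio)
    n_real = batch_size - n_sim
    sim_loader = DataLoader(sim_ds, batch_size=n_sim, shuffle=True)
    real_loader = DataLoader(real_ds, batch_size=n_real, shuffle=True)
    real_iter = iter(real_loader)
    for sim_batch in sim_loader:
        try:
            real_batch = next(real_iter)
        except StopIteration:
            # The real dataset is much smaller, so cycle through it repeatedly.
            real_iter = iter(real_loader)
            real_batch = next(real_iter)
        obs = torch.cat([sim_batch["obs"], real_batch["obs"]])
        actions = torch.cat([sim_batch["action"], real_batch["action"]])
        yield obs, actions
```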
50 real demos: success rate 10/20
50 real demos + 2000 sim demos: success rate 18/20
Simulation
Analysis
Key idea: Create two different simulated environments to emulate the sim2real gap.
The target sim environment emulates the real world.
Sim provides better evaluation and control over the sim2target gap, which serves as a proxy for the sim2real gap.
How do visual, physics, and task shift affect performance?
Paradoxically, some visual shift is required for good performance!
Physics and task shift are most impactful for downstream performance.
Videos: real-world demo, policy rollout (cotrained), simulated demo
High-performing policies must learn to identify sim vs real, since the physics of each environment requires different actions.
Sim demo worth 0.83 real demos
Sim demo worth 0.49 real demos
Conclusions
Is there a training formulation that simultaneously minimizes the denoiser loss and the sim2real gap?
Intuitively, this objective has two terms: a denoiser loss and a sim2real loss.
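One way to write such an objective is sketched below; the weight \(\lambda\), the noise-prediction form of the denoiser loss, and the choice of discrepancy \(d\) are illustrative assumptions rather than details from the slides.

\[
\mathcal{L}(\theta) \;=\;
\underbrace{\mathbb{E}_{(o,\,a)\sim \mathcal{D}_{\mathrm{sim}}\cup\mathcal{D}_{\mathrm{real}}}\!\left[\lVert \epsilon - \epsilon_\theta(o, a^k, k)\rVert^2\right]}_{\text{denoiser loss}}
\;+\; \lambda\,\underbrace{d\!\left(p^{\mathrm{sim}}_\theta,\; p^{\mathrm{real}}_\theta\right)}_{\text{sim2real loss}}
\]

Here \(\epsilon_\theta\) is the diffusion policy's noise predictor, \(a^k\) is the demonstrated action corrupted to diffusion step \(k\), and \(d(\cdot,\cdot)\) measures how far apart the policy's features (or action distributions) are on sim versus real data. The adversarial and MMD formulations below are two candidate choices of \(d\).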
Adversarial formulation:
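One GAN-style sketch of this idea, assuming a discriminator \(D\) that classifies whether an intermediate feature \(z\) comes from sim or real data while the policy is trained to fool it (the feature-level discriminator and the weight \(\lambda\) are assumptions):

\[
\min_{\theta}\;\max_{D}\;\;
\mathcal{L}_{\mathrm{denoise}}(\theta)
\;+\; \lambda\left(
\mathbb{E}_{z\sim p^{\mathrm{real}}_\theta}\!\left[\log D(z)\right]
+ \mathbb{E}_{z\sim p^{\mathrm{sim}}_\theta}\!\left[\log\big(1 - D(z)\big)\right]
\right)
\]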
Challenges: Great in theory, but GANs are hard to train...
... this is not the path to happiness in robotics
MMD Formulation:
MMD is differentiable* and does not require a max operation!
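Below is a minimal sketch of an RBF-kernel MMD² estimator between sim and real feature batches in PyTorch; the kernel choice, the bandwidth, and the idea of applying it to policy features are illustrative assumptions.

```python
import torch

def mmd_rbf(z_sim: torch.Tensor, z_real: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased (V-statistic) estimate of MMD^2 between two feature batches,
    using a Gaussian RBF kernel with bandwidth `sigma` (an assumed choice).
    Fully differentiable w.r.t. both inputs, so it can be added to the
    denoiser loss and minimized by gradient descent -- no inner max needed."""
    def rbf(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2.0 * sigma**2))
    return rbf(z_sim, z_sim).mean() + rbf(z_real, z_real).mean() - 2.0 * rbf(z_sim, z_real).mean()

# Hypothetical usage inside a cotraining loop:
#   loss = denoiser_loss + lam * mmd_rbf(features_sim, features_real)
```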
Corrupt data + high-quality data: computer vision, language (social media, etc.)
There exist theoretically sound algorithms for cotraining on both corrupt and high-quality data
Corrupt data + high-quality data: protein folding... robotics?
Sim data + real data + 5 years = ...