IROS Submission & Next Directions

RLG Short Talk

Mar 13, 2025

Adam Wei

Agenda

  1. Subset of new experiments
  2. Next directions for cotraining

Policy: Diffusion Policy [2]

Performance Objective: Success rate on a planar pushing task

Cotraining loss:

\mathcal L_{\mathcal D^\alpha} = \alpha \textcolor{red}{\mathcal L_{\mathcal D_R}} + (1-\alpha) \textcolor{blue}{\mathcal L_{\mathcal D_S}}

Sim-and-Real Cotraining

Cotraining: Use both sim and real datasets to train a policy that maximizes some real-world performance objective
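As a concrete reading of the \(\alpha\)-weighted loss above, here is a minimal sketch; `policy.denoiser_loss`, `real_batch`, and `sim_batch` are hypothetical placeholders, not the actual training code:

```python
def cotraining_loss(policy, real_batch, sim_batch, alpha):
    # L_{D^alpha} = alpha * L_{D_R} + (1 - alpha) * L_{D_S}
    loss_real = policy.denoiser_loss(real_batch)  # L_{D_R}: denoising loss on real demos
    loss_sim = policy.denoiser_loss(sim_batch)    # L_{D_S}: denoising loss on sim demos
    return alpha * loss_real + (1.0 - alpha) * loss_sim
```

In expectation, the same weighting can also be realized by sampling real vs. sim examples with probability \(\alpha\) vs. \(1-\alpha\) when forming each minibatch.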

Sim-and-Real Cotraining

  • 50 real demos → success rate: 10/20
  • 50 real demos + 2000 sim demos → success rate: 18/20

  • Cotraining improves policy performance by up to 2-7x
  • Scaling sim data improves performance and reduces sensitivity to \(\alpha\)

SDE Interpretation

Advantages of Simulation

  • Performance gains from scaling sim data plateau; additional real data raises the performance ceiling

New Experiments

Simulation

  1. Scaling sim data
  2. Finetuning
  3. Distribution shifts

Analysis

  1. Sim-and-real discernability
  2. Data coverage
  3. Power laws
  4. Classifier-free guidance

Simulation Experiments

Key idea: Create two different simulated environments and emulate the sim2real gap.

  • The target sim environment emulates the real world; the sim2target gap stands in for the sim2real gap
  • Sim provides better evaluation and better control over the sim2target gap

SDE Interpretation

Distribution Shifts

How do visual, physics, and task shifts affect performance?

Paradoxically, some visual shift is required for good performance!

Physics and task shift are most impactful for downstream performance.

SDE Interpretation

Sim-and-Real Discernability

Real-world demos:

  • Fix orientation, then translation
  • Sticking & sliding contacts

Policy rollout (cotrained):

  • Similar to the real-world demos

Simulated demos:

  • Fix orientation and translation simultaneously
  • Sticking contacts only

SDE Interpretation

Sim-and-Real Discernability

High-performing policies must learn to identify sim vs. real, since the physics of each environment requires different actions.

SDE Interpretation

Positive Transfer & Power Laws

Power-law fits \(\implies\)

  • One sim demo is worth 0.83 real demos
  • One sim demo is worth 0.49 real demos
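The exchange rates above presumably come from comparing fitted scaling curves; the sketch below shows one plausible way to compute such a rate from power-law fits, not necessarily the procedure used here (all fits and names are hypothetical):

```python
import numpy as np

def fit_power_law(n_demos, perf):
    # Least-squares fit of perf ~ a * n^b in log-log space.
    b, log_a = np.polyfit(np.log(n_demos), np.log(perf), 1)
    return np.exp(log_a), b

def sim_demo_worth(n_real, real_fit, sim_fit):
    # How many real demos one sim demo is "worth" at scale n_real:
    # find the sim-demo count predicted to match n_real real demos,
    # then take the ratio of real to sim demos.
    a_r, b_r = real_fit
    a_s, b_s = sim_fit
    n_sim_equiv = ((a_r * n_real**b_r) / a_s) ** (1.0 / b_s)
    return n_real / n_sim_equiv
```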

SDE Interpretation

Next Directions

  1. Reducing sim2real gap at the representation level
  2. Cotraining from corrupt (sim) data

Reducing the Sim2Real Gap

Conclusions

  1. Reducing sim2real gap improves performance
  2. Preserving sim-and-real discernability is important

Is there a training formulation that simultaneously:

  1. Learns domain invariant representations
  2. Preserves sim-and-real discernability?

Cotraining Formulation

\min_{\theta} \mathcal L_{\mathcal D^\alpha}(\theta) + \lambda \mathrm{dist}(p_\theta^S, p_\theta^R)

Intuitively, this objective:

  1. Minimizes the sim2real gap in the learned representation (the second term, the sim2real loss)
  2. Preserves domain-specific features that are relevant for control (i.e., minimizes the first term, the denoiser loss \(\mathcal L_{\mathcal D^\alpha}\))
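A minimal sketch of one gradient step under this objective; `policy.denoiser_loss`, `policy.encode`, and `dist_fn` are hypothetical placeholders, with `dist_fn` standing in for the adversarial or MMD distances on the following slides:

```python
def cotraining_step(policy, real_batch, sim_batch, alpha, lam, dist_fn, optimizer):
    # Denoiser term: L_{D^alpha} = alpha * L_{D_R} + (1 - alpha) * L_{D_S}
    loss_denoise = alpha * policy.denoiser_loss(real_batch) \
        + (1.0 - alpha) * policy.denoiser_loss(sim_batch)

    # Sim2real term: dist(p_theta^S, p_theta^R), measured on the learned
    # observation features f_theta(o).
    feats_sim = policy.encode(sim_batch["obs"])
    feats_real = policy.encode(real_batch["obs"])
    loss_align = dist_fn(feats_sim, feats_real)

    loss = loss_denoise + lam * loss_align
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```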

Adversarial Formulation

\min_{\theta} \mathcal L_{\mathcal D^\alpha}(\theta) + \lambda \mathrm{dist}(p_\theta^S, p_\theta^R)
\begin{align*}
\mathrm{dist}(p_\theta^S, p_\theta^R) &= \mathrm{JS}(p_\theta^S, p_\theta^R) \\
&= \max_\phi \; \mathbb{E}_{p_\theta^S}\!\left[\log d_\phi (f_\theta(o))\right] + \mathbb{E}_{p_\theta^R}\!\left[\log\!\left(1 - d_\phi (f_\theta(o))\right)\right] \\
&= \max_\phi \; \mathcal L_{\mathrm{BCE}}(\theta, \phi)
\end{align*}

(equalities hold up to additive and multiplicative constants)

Adversarial formulation:

\min_{\theta} \mathcal L_{\mathcal D^\alpha}(\theta) + \lambda \max_\phi \mathcal L_\mathrm{BCE}(\theta, \phi)

Challenges: Great in theory, but GANs are hard to train...

... this is not the path to happiness in robotics
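For concreteness only (the slide's point stands that this is hard to train well), here is a sketch of the alternating updates the min-max would entail; `d_phi`, `d_opt`, and the feature batches are hypothetical, and the encoder parameters \(\theta\) are updated elsewhere:

```python
import torch
import torch.nn.functional as F

def discriminator_update(d_phi, d_opt, feats_sim, feats_real):
    # Inner max over phi: train d_phi to separate sim features from real features.
    # detach() keeps this step from updating the encoder parameters theta.
    logits_sim = d_phi(feats_sim.detach())
    logits_real = d_phi(feats_real.detach())
    loss_d = F.binary_cross_entropy_with_logits(logits_sim, torch.ones_like(logits_sim)) \
        + F.binary_cross_entropy_with_logits(logits_real, torch.zeros_like(logits_real))
    d_opt.zero_grad()
    loss_d.backward()
    d_opt.step()

def alignment_penalty(d_phi, feats_sim, feats_real):
    # Outer min over theta: return the negative discriminator loss, so minimizing
    # it through the encoder makes sim and real features harder to tell apart.
    logits_sim = d_phi(feats_sim)
    logits_real = d_phi(feats_real)
    loss_d = F.binary_cross_entropy_with_logits(logits_sim, torch.ones_like(logits_sim)) \
        + F.binary_cross_entropy_with_logits(logits_real, torch.zeros_like(logits_real))
    return -loss_d
```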

MMD Formulation

\min_{\theta} \mathcal L_{\mathcal D^\alpha}(\theta) + \lambda \mathrm{dist}(p_\theta^S, p_\theta^R)

MMD Formulation:

\mathrm{dist}(p_\theta^S, p_\theta^R) = \mathrm{MMD}(p_\theta^S, p_\theta^R)

MMD is differentiable* and does not require a max operation!
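A minimal sketch of an RBF-kernel MMD\(^2\) estimator that could serve as the distance between sim and real feature batches; the bandwidth choice and this particular estimator are assumptions, not the paper's implementation:

```python
import torch

def mmd_rbf(feats_sim, feats_real, sigma=1.0):
    # Biased estimator of MMD^2 between two feature batches with an RBF kernel.
    # Differentiable w.r.t. both inputs, and no inner max is needed.
    def k(a, b):
        sq_dists = torch.cdist(a, b) ** 2
        return torch.exp(-sq_dists / (2.0 * sigma ** 2))
    return k(feats_sim, feats_sim).mean() + k(feats_real, feats_real).mean() \
        - 2.0 * k(feats_sim, feats_real).mean()
```

In practice, summing kernels over several bandwidths is a common way to avoid tuning \(\sigma\).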

Learning From Corrupt Data

Corrupt data vs. high-quality data in other fields: computer vision and language models already train on corrupt data (social media, etc.) alongside high-quality data.

There exist theoretically sound algorithms for cotraining on both corrupt and high-quality data

Learning From Corrupt Data

Protein folding combined corrupt and high-quality data; robotics may follow within ~5 years, with sim data as the corrupt source and real data as the high-quality source.

Thank you!
