Sim2Real: Simulations for data synthesis, augmentation, and robustness

Harshavardhan Kamarthi

Data Seminar

Reasons for the success of deep learning

  • Larger models
  • Faster and more efficient compute devices
  • Data

What if we have less data? Simulations

  • We can collect as many samples as we need
  • It's safe to deploy and test
    • We can be more creative
  • Cost effective

Sim2Real

  • Train in simulation
  • Transfer the neural network to the real world
  • Need less (or no) real-world data to fine-tune/deploy

Domain Randomization

  • Why?
    • Hard to capture real-world dynamics exactly
    • Learn from a diverse dataset so that the model is robust to real-world fluctuations
  • How?
    • Expert Knowledge
    • Self-adapting

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization (ICRA 2018)

Task: Simple Object Manipulation by Robotic Arm

  1. Train in a simulator with a variety of randomized dynamics
  2. Deploy directly in the real world

Trained on a variety of simulations, with dynamics parameters sampled uniformly from predefined ranges
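
A minimal sketch of per-episode dynamics randomization as described above. The parameter names and ranges are hypothetical placeholders, not the values used in the paper; only the idea of drawing each parameter uniformly from a fixed range comes from the slide.

```python
import random

# Hypothetical parameter ranges (low, high); the paper's actual ranges are
# not reproduced here.
PARAM_RANGES = {
    "link_mass_scale": (0.25, 4.0),
    "joint_damping_scale": (0.2, 20.0),
    "table_friction": (0.1, 5.0),
    "action_timestep_s": (0.0125, 0.05),
}

def sample_dynamics(rng, ranges=PARAM_RANGES):
    """Draw one set of simulator dynamics, each parameter uniform in its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

rng = random.Random(0)
# One fresh draw per training episode, so the policy sees many dynamics.
episode_dynamics = [sample_dynamics(rng) for _ in range(1000)]
```

Each dictionary would be handed to the simulator before an episode starts, so the policy has to work across the whole range rather than for one fixed setting.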

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

Task: Simple Object Manipulation by Robotic Arm using image data

Task: Train an object detector to map a camera image to the correct coordinates w.r.t. the robot's camera

Randomization parameters:

  • Shape and number of objects
  • Position of the object and camera
  • Lighting
  • Random noise added to the image

Training:

  • Pre-train on images with randomized settings
  • Fine-tune in the real world
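
The randomization parameters above can be sketched with a toy renderer. Everything here is an assumption for illustration: the "scene" is a square on a flat background, images are nested lists of grayscale values, and camera randomization is left out. It only shows how each sample gets its own object size, position, lighting, and noise, together with a free ground-truth label.

```python
import random

def render_random_scene(rng, size=32):
    """Return (image, label): a grayscale image and the object's centre (x, y)."""
    background = rng.uniform(0.0, 0.5)          # random background brightness
    light = rng.uniform(0.5, 1.5)               # random lighting gain
    half = rng.randint(2, 5)                    # random object half-width
    cx = rng.randint(half, size - 1 - half)     # random object position
    cy = rng.randint(half, size - 1 - half)
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = abs(x - cx) <= half and abs(y - cy) <= half
            value = (1.0 if inside else background) * light
            value += rng.gauss(0.0, 0.05)       # random pixel noise
            row.append(min(1.0, max(0.0, value)))
        image.append(row)
    return image, (cx, cy)

rng = random.Random(0)
dataset = [render_random_scene(rng) for _ in range(100)]
```

A detector pre-trained on such pairs never sees the same rendering conditions twice, which is the property the approach relies on.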

Data Dreaming for Object Detection: Learning Object-Centric State Representations for Visual Imitation

Task: Train an object detector with automatic data augmentation for behaviour cloning and pose estimation

If the object detector successfully detects the object, we can generate more data by changing the position of the object and the background of the image, and then train on it.
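
A minimal sketch of this augmentation step, assuming images are nested lists of pixel values and that a detected object crop is already available. The function name `dream_sample` and the toy data are hypothetical.

```python
import random

def dream_sample(rng, object_crop, background):
    """Paste object_crop into a copy of background at a random position.

    Returns (image, box) with box = (left, top, right, bottom), where right
    and bottom are exclusive.
    """
    crop_h, crop_w = len(object_crop), len(object_crop[0])
    bg_h, bg_w = len(background), len(background[0])
    top = rng.randint(0, bg_h - crop_h)
    left = rng.randint(0, bg_w - crop_w)
    image = [row[:] for row in background]      # leave the background untouched
    for dy in range(crop_h):
        for dx in range(crop_w):
            image[top + dy][left + dx] = object_crop[dy][dx]
    return image, (left, top, left + crop_w, top + crop_h)

rng = random.Random(0)
crop = [[9, 9], [9, 9]]                         # stand-in for a detected object
backgrounds = [[[b] * 8 for _ in range(6)] for b in (0, 1, 2)]
synthetic = [dream_sample(rng, crop, rng.choice(backgrounds)) for _ in range(50)]
```

Because the paste position is chosen by the program, the bounding-box label for every synthetic image is known without any manual annotation.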

EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

Task: General learning algorithm for ensuring robustness

Idea

  • Train on simulations with parameters sampled from a distribution
  • Select the subset of simulations whose performance falls in the worst \epsilon percentile
  • Update the neural network only on data from these simulations
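
The selection step can be sketched as follows. The rollout function `toy_return` is a hypothetical stand-in for running the policy in a simulator; only the idea of keeping the worst \epsilon fraction of trajectories comes from the slide.

```python
import random

def worst_percentile_subset(returns, epsilon):
    """Indices of the trajectories whose return is in the worst epsilon fraction."""
    n_keep = max(1, int(len(returns) * epsilon))
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    return order[:n_keep]

def toy_return(friction, policy_gain):
    # Hypothetical rollout: the return drops as the policy's gain
    # mismatches the sampled friction.
    return -abs(friction - policy_gain)

rng = random.Random(0)
frictions = [rng.uniform(0.5, 1.5) for _ in range(100)]   # sampled sim parameters
returns = [toy_return(f, policy_gain=1.0) for f in frictions]
worst = worst_percentile_subset(returns, epsilon=0.1)
# Only the trajectories indexed by `worst` would feed the policy update.
```

Training on the lower tail pushes the policy to improve where it currently fails, rather than on average performance.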

Domain Adaptation

  • What if the distribution of simulation parameters is too general?
  • What if it doesn't capture real world settings with a good probability?

EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

  • Use Bayesian inference to improve the simulation parameters
P(\phi_i| \tau) \propto \mathcal{L}(\tau|\phi_i) \times P(\phi_i)
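
A minimal sketch of this update over a discrete set of candidate parameters \phi_i. For illustration the trajectory \tau is reduced to a single scalar observation with a Gaussian likelihood; both simplifications, and all names below, are assumptions.

```python
import math

def posterior_weights(prior, log_likelihoods):
    """Normalised P(phi_i | tau) from the prior P(phi_i) and log L(tau | phi_i)."""
    shift = max(log_likelihoods)                 # subtract the max for stability
    unnorm = [p * math.exp(ll - shift) for p, ll in zip(prior, log_likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical example: three candidate friction values, a uniform prior, and
# one real-world measurement scored under each candidate.
candidates = [0.5, 1.0, 1.5]
prior = [1 / 3, 1 / 3, 1 / 3]
observed, sigma = 1.1, 0.2
log_lik = [-0.5 * ((observed - c) / sigma) ** 2 for c in candidates]
posterior = posterior_weights(prior, log_lik)
```

The posterior then replaces the prior as the sampling distribution for the next round of simulated training.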

Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

  • Reduce the distance between the real-world parameter distribution and the simulation parameter distribution
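
One way to picture this loop is the sketch below. It is not the paper's update rule: a cross-entropy-method step is swapped in as a simple stand-in, and the one-parameter simulator `simulate` is hypothetical. The idea it keeps is that the simulation parameter distribution is refitted so that simulated observations move closer to real ones.

```python
import random
import statistics

def simulate(friction):
    # Hypothetical simulator: the observed sliding distance shrinks with friction.
    return 1.0 / friction

def update_distribution(rng, mean, std, real_obs, n_samples=200, elite_frac=0.2):
    """One cross-entropy-style step: refit the Gaussian to the samples whose
    simulated observation is closest to the real one."""
    samples = [max(0.05, rng.gauss(mean, std)) for _ in range(n_samples)]
    samples.sort(key=lambda phi: abs(simulate(phi) - real_obs))
    elite = samples[: int(n_samples * elite_frac)]
    return statistics.mean(elite), statistics.stdev(elite)

rng = random.Random(0)
real_obs = simulate(0.8)           # pretend the real world has friction 0.8
mean, std = 2.0, 1.0               # deliberately poor initial distribution
for _ in range(10):
    mean, std = update_distribution(rng, mean, std, real_obs)
```

In a full pipeline, a policy would be retrained under each updated distribution and then run on the real system to collect the next batch of observations.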

Domain-Adversarial Training of Neural Networks

Fool the domain classifier by learning intermediate features that look the same whether the input comes from real-world or simulation data
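
The objective behind this idea can be written as a saddle-point problem. The symbols are assumptions, since the slide does not define any: \theta_f are the feature extractor parameters, \theta_y the label predictor parameters, \theta_d the domain classifier parameters, L_y the task loss, L_d the domain classification loss, and \lambda a trade-off weight.

```latex
E(\theta_f, \theta_y, \theta_d) = L_y(\theta_f, \theta_y) - \lambda \, L_d(\theta_f, \theta_d)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d)
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

The domain classifier minimizes L_d, while the feature extractor is trained to increase it, which is what makes the features hard to tell apart across domains.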

Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Task: Robot grasping task

Idea

  1. Use synthetic data (ShapeNet and other procedurally generated objects) to build the simulation
  2. Domain Adaptation: DANN and GraspGAN

Adding other domain randomization (image noise, different camera poses, brightness) also improved the success rate by 1%. However, DANN and GraspGAN improved over simple randomization by 4-6%.

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Task: Generic RL algorithm for robust transfer

  1. Two agents, a protagonist (P) and an antagonist (A), try to outperform each other. The total reward for P is P's reward minus A's reward (and vice versa for A).
  2. An adversary modifies the environment so that it maximizes the difference in reward between the antagonist and the protagonist.
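
The reward scheme in the two steps above can be sketched as follows. The environment names and return values are hypothetical; the point is only that the adversary's score is the gap between the antagonist's and the protagonist's returns.

```python
def paired_rewards(protagonist_return, antagonist_return):
    """Per-environment training signals under the scheme described above."""
    gap = antagonist_return - protagonist_return
    return {
        "protagonist": -gap,   # protagonist return minus antagonist return
        "antagonist": gap,     # the reverse
        "adversary": gap,      # the adversary wants the largest gap
    }

# Hypothetical returns on three candidate environments.
candidates = {
    "trivial":    {"protagonist": 1.0, "antagonist": 1.0},  # both solve it
    "impossible": {"protagonist": 0.0, "antagonist": 0.0},  # nobody solves it
    "frontier":   {"protagonist": 0.2, "antagonist": 0.9},  # solvable, not yet by P
}
adversary_scores = {
    name: paired_rewards(r["protagonist"], r["antagonist"])["adversary"]
    for name, r in candidates.items()
}
best_env = max(adversary_scores, key=adversary_scores.get)
```

Trivial and impossible environments both score zero, so the adversary is steered toward environments that are solvable but not yet solved by the protagonist.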

Minimax and uniform domain randomization both do worse than PAIRED

Modularity during transfer

You don't have to fine-tune everything on real-world data, just specific modules.

Learning to Drive from Simulation without Real World Labels

Idea

  1. Map real-world and simulated images to a shared latent space Z using a GAN
  2. Train on simulation with labels and learn a policy Z \rightarrow c

X_d^{recon} = G_d(E_d(X_d))
X_d^{cyc} = G_d(E_{d'}(G_{d'}(E_d(X_d))))
Z_d^{recon} = E_{d'}(G_{d'}(Z_d))
C(E_d(X_d)) - C(E_{d'}(G_{d'}(E_d(X_d))))

Along with training the GAN, make sure the latent space is consistent across both domains
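
To make the compositions in the four expressions concrete, here is a toy sketch with one-dimensional affine encoders E, decoders G, and controller C. These maps are hypothetical stand-ins for the paper's networks, chosen so that the two domains share a latent space exactly and every consistency error is zero.

```python
# Hypothetical 1-D encoders/decoders for two domains.
E = {"sim": lambda x: x / 2.0, "real": lambda x: (x - 1.0) / 2.0}   # image -> latent
G = {"sim": lambda z: 2.0 * z, "real": lambda z: 2.0 * z + 1.0}     # latent -> image
C = lambda z: 0.1 * z                                                # latent -> control

def consistency_errors(x, d, d_other):
    """Absolute errors for the reconstruction, cycle, latent, and control terms."""
    z = E[d](x)
    x_recon = G[d](z)                              # X_d^{recon}
    z_recon = E[d_other](G[d_other](z))            # Z_d^{recon}
    x_cyc = G[d](z_recon)                          # X_d^{cyc}
    control_gap = C(z) - C(z_recon)                # cyclic control difference
    return {
        "recon": abs(x - x_recon),
        "cycle": abs(x - x_cyc),
        "latent": abs(z - z_recon),
        "control": abs(control_gap),
    }

errors = consistency_errors(4.0, "sim", "real")
```

With learned networks these errors are not zero, and training penalizes them so that a controller trained on simulated latents also works on real ones.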

Driving Policy Transfer via Modularity and Abstraction

The control system for driving can be modular

  1. Train the entire pipeline in simulation
  2. Fine-tune or re-train the perception module on real-world data. The agent can infer the final control actions from segmentation information.
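
A minimal sketch of this modular split, assuming the interface between the modules is a segmentation map (nested lists of class ids, with 1 marking road). The policy and both perception modules are hypothetical toys; the point is that only the perception module changes between simulation and the real world.

```python
from typing import Callable, List

Segmentation = List[List[int]]   # per-pixel class ids; 1 marks drivable road

def policy_from_segmentation(seg: Segmentation) -> float:
    """Hypothetical simulation-trained policy: steer toward the side with
    more road pixels. Returns a steering value in [-1, 1]."""
    width = len(seg[0])
    left = sum(row[x] for row in seg for x in range(width // 2))
    right = sum(row[x] for row in seg for x in range(width - width // 2, width))
    total = left + right
    return 0.0 if total == 0 else (right - left) / total

def make_driver(perception: Callable[[list], Segmentation]):
    """Compose a swappable perception module with the fixed policy."""
    return lambda image: policy_from_segmentation(perception(image))

# Two stand-in perception modules that share the same output interface.
sim_perception = lambda image: image   # toy sim frames are already class ids
real_perception = lambda image: [[1 if p > 0.5 else 0 for p in row] for row in image]

sim_driver = make_driver(sim_perception)
real_driver = make_driver(real_perception)   # only this module was replaced
```

For example, `sim_driver([[0, 0, 1, 1]])` steers fully right, and `real_driver([[0.9, 0.8, 0.1, 0.2]])` steers fully left using the same policy.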

Blind Spot Detection for Safe Sim-to-Real Transfer

Learn a blind spot detector using human/oracle feedback

  • Train the entire pipeline in simulation
  • Learn from multiple experts' demonstrations and corrections. Aggregate the noisy labels using the EM algorithm.
  • Learn a classifier to detect whether the current state is a blind spot

  • Good trade-off between always querying the oracle and never querying it
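
How a blind spot classifier could gate oracle queries at deployment time is sketched below. The threshold, the toy state space, and all function names are assumptions; the classifier itself is replaced by a hand-written probability function.

```python
def act_with_blind_spot_check(state, policy, blind_spot_prob, oracle, threshold=0.5):
    """Use the simulation-trained policy unless the detector flags the state.

    Returns (action, queried_oracle).
    """
    if blind_spot_prob(state) >= threshold:
        return oracle(state), True
    return policy(state), False

# Hypothetical toy setup: states are integers, and states above 7 were not
# represented well in simulation.
policy = lambda s: "go"
oracle = lambda s: "stop"
blind_spot_prob = lambda s: 0.9 if s > 7 else 0.1

results = [act_with_blind_spot_check(s, policy, blind_spot_prob, oracle) for s in range(10)]
oracle_queries = sum(queried for _, queried in results)
```

Raising the threshold lowers the number of oracle queries at the cost of acting alone in more uncertain states, which is the trade-off the bullet refers to.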

Conclusion

  • Sim2Real reduces the need to sample from the real world
  • We can generalize to unseen real-world conditions by training robust policies in simulation
  • Virtual-to-real adaptation is still a big challenge, and its difficulty varies across tasks.
