Sim2Real: Simulations for data synthesis, augmentation, and robustness

Harshavardhan Kamarthi

Data Seminar

Reason for success

Larger models
Faster and efficient compute devices
Data

What if we have less data? Simulations

We can collect as many samples as we need
It's safe to deploy and test
We can be more creative
It's cost-effective

Sim 2 Real

Train in simulation
Transfer the neural network to the real world
Need less (or no) real-world data to fine-tune/deploy

Domain Randomization

Why?
- It is hard to capture real-world dynamics exactly
- Learning from a diverse dataset makes the model robust to real-world fluctuations
How?
- Expert Knowledge
- Self-adapting

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization (ICRA 2018)

Task: Simple Object Manipulation by Robotic Arm

Train in a simulator with a variety of domain dynamics
Deploy directly in the real world

Trained on a variety of simulations, with parameters sampled uniformly from the ranges given above
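A minimal sketch of the uniform sampling step, assuming one fresh set of dynamics per episode. The parameter names and ranges in `PARAM_RANGES` are made up for illustration and are not the values from the paper.

```python
import random

# Hypothetical parameters and ranges, chosen only for illustration.
PARAM_RANGES = {
    "link_mass_scale": (0.5, 1.5),
    "joint_damping_scale": (0.5, 2.0),
    "table_friction": (0.3, 1.0),
}

def sample_dynamics(rng):
    """Draw one set of simulator dynamics, each parameter uniform in its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

rng = random.Random(0)
# One independent draw per training episode.
episodes = [sample_dynamics(rng) for _ in range(5)]
```

Each entry of `episodes` would be passed to the simulator before a rollout, so the policy never sees the same dynamics twice.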

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

Task: Simple Object Manipulation by Robotic Arm using image data

Task: Train an object detector to map a camera image to the correct coordinates w.r.t. the robot's camera

Randomization parameters:

Shape and number of objects
Position of object and camera
Lighting
Random noise to image

Pre-train on images with randomized settings
Fine-tune in the real world
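The randomization parameters listed above could be sampled along these lines. This is a sketch only: the field names, the ranges, and the toy 2x2 grayscale image are assumptions, and a real setup would render the scene instead of editing pixel lists.

```python
import random

def sample_scene(rng):
    """Draw one randomized scene description (all ranges are illustrative)."""
    return {
        "num_objects": rng.randint(1, 5),
        "object_xy": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)),
        "camera_xyz": tuple(rng.uniform(-0.1, 0.1) for _ in range(3)),
        "light_gain": rng.uniform(0.5, 1.5),
        "noise_std": rng.uniform(0.0, 0.05),
    }

def apply_lighting_and_noise(image, scene, rng):
    """Scale brightness and add Gaussian pixel noise, clipping to [0, 1]."""
    gain, std = scene["light_gain"], scene["noise_std"]
    return [
        [min(1.0, max(0.0, p * gain + rng.gauss(0.0, std))) for p in row]
        for row in image
    ]

rng = random.Random(0)
scene = sample_scene(rng)
image = [[0.2, 0.4], [0.6, 0.8]]  # toy 2x2 grayscale image
augmented = apply_lighting_and_noise(image, scene, rng)
```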

Data Dreaming for Object Detection: Learning Object-Centric State Representations for Visual Imitation

Task: Train an object detector with automatic data augmentation for behaviour cloning and pose estimation

If the object detector successfully detects the object, we can generate more data by changing the position of the object and the background of the image, and then train on the generated data.
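A rough sketch of that augmentation step, assuming the detected object is available as a rectangular patch and images are plain lists of pixel rows. The function name `dream_sample` is hypothetical.

```python
import random

def dream_sample(patch, background, rng):
    """Paste `patch` onto a copy of `background` at a random location.

    Returns the new image and its bounding box (top, left, height, width),
    which serves as a free label for the synthesized image.
    """
    ph, pw = len(patch), len(patch[0])
    bh, bw = len(background), len(background[0])
    top = rng.randint(0, bh - ph)    # randint is inclusive on both ends
    left = rng.randint(0, bw - pw)
    image = [row[:] for row in background]  # copy so the background is untouched
    for r in range(ph):
        image[top + r][left:left + pw] = patch[r]
    return image, (top, left, ph, pw)

rng = random.Random(0)
patch = [[1, 1], [1, 1]]                    # toy detected object
background = [[0] * 6 for _ in range(4)]    # toy new background
image, box = dream_sample(patch, background, rng)
```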

EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

Task: General learning algorithm for ensuring robustness

Idea

Train on simulations with simulation parameters sampled from a distribution
Select the subset of simulations whose performance falls in the worst \epsilon-percentile
Update the neural network only on the data from these simulations
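The selection step can be sketched as follows, assuming \epsilon is given as a fraction in (0, 1] and each simulation is summarized by a single return. The helper name is hypothetical.

```python
def worst_percentile_subset(returns, epsilon):
    """Indices of the episodes whose return is in the worst `epsilon` fraction."""
    k = max(1, int(len(returns) * epsilon))  # keep at least one episode
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    return order[:k]

returns = [10.0, 3.0, 7.0, 1.0, 9.0, 4.0, 8.0, 2.0, 6.0, 5.0]
worst = worst_percentile_subset(returns, epsilon=0.2)
# worst == [3, 7]: the episodes with returns 1.0 and 2.0
```

Only the trajectories indexed by `worst` would feed the policy update, which pushes the policy to improve on its hardest sampled environments.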

Domain Adaptation

What if the distribution of simulation parameters is too general?
What if it doesn't capture real world settings with a good probability?

EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

Use Bayesian inference to improve the simulation parameters

P(\phi_i| \tau) \propto \mathcal{L}(\tau|\phi_i) \times P(\phi_i)
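A small sketch of this update over a discrete set of candidate parameters \phi_i. The Gaussian likelihood and the toy "simulator" (position = \phi times time) are assumptions made only to keep the example self-contained.

```python
import math

def gaussian_log_likelihood(observed, predicted, sigma=1.0):
    """Log-likelihood up to a constant that cancels during normalization."""
    return sum(-0.5 * ((o - p) / sigma) ** 2 for o, p in zip(observed, predicted))

def posterior(prior, log_likelihoods):
    """Normalize prior[i] * exp(log_likelihoods[i]) over the candidates."""
    log_post = [math.log(p) + ll for p, ll in zip(prior, log_likelihoods)]
    m = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

candidates = [0.5, 1.0, 1.5]                  # hypothetical values of phi
times = [0.0, 1.0, 2.0, 3.0]
observed = [1.0 * t for t in times]           # "real" trajectory tau, from phi = 1.0
prior = [1 / 3, 1 / 3, 1 / 3]
log_liks = [
    gaussian_log_likelihood(observed, [phi * t for t in times])
    for phi in candidates
]
post = posterior(prior, log_liks)
```

The posterior concentrates on the candidate whose simulated trajectory best matches the observed one, and can then replace the prior for the next round of sampling.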

Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

Reduce the distance between the real-world parameter distribution and the simulation parameter distribution

Domain-Adversarial Training of Neural Networks

Fool the domain classifier by learning intermediate features that look similar whether the input comes from real-world or simulation data
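One common way to implement this is a gradient reversal step between the feature extractor and the domain classifier. The scalar toy below is a sketch under simplifying assumptions (one feature `f = w * x`, a logistic domain classifier with weight `v`, hand-derived gradients); it is not the architecture from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_step(w, v, x, domain, lam=1.0):
    """Domain loss and gradients for one sample (domain: 1 = real, 0 = simulation).

    grad_v is the ordinary gradient, so the classifier descends the loss.
    grad_w passes through gradient reversal (multiplied by -lam), so a descent
    step on w actually increases the domain loss and confuses the classifier.
    """
    f = w * x                        # feature extractor
    p = sigmoid(v * f)               # domain classifier
    loss = -(domain * math.log(p) + (1 - domain) * math.log(1 - p))
    dloss_dlogit = p - domain
    grad_v = dloss_dlogit * f
    grad_f = dloss_dlogit * v
    grad_w = -lam * grad_f * x       # reversed gradient
    return loss, grad_w, grad_v

w, v, x, domain = 1.0, 1.0, 2.0, 1
loss_before, grad_w, grad_v = domain_step(w, v, x, domain)
w_new = w - 0.1 * grad_w             # plain gradient descent on the reversed gradient
loss_after, _, _ = domain_step(w_new, v, x, domain)
```

After the update to `w`, the classifier's loss on the same sample is higher, which is the intended effect on the feature extractor.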

Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Task: Robot grasping task

Idea

Use synthetic data (ShapeNet and other procedurally generated objects) to build the simulation
Domain Adaptation: DANN and GraspGAN

Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

DANN

GraspGAN

Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Adding other domain randomization, such as image noise, different camera poses, and brightness changes, also improved the success rate by 1%. However, DANN and GraspGAN improved over simple randomization by 4-6%.

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Task: Generic RL algorithm for robust transfer

Two agents, a protagonist and an antagonist, try to outperform each other. The total reward for the protagonist is the protagonist's reward minus the antagonist's reward (and vice versa for the antagonist).
The adversary modifies the environment so that it maximizes the difference in reward between the antagonist and the protagonist.

Minimax and uniform domain randomization do worse than PAIRED
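The adversary's objective can be illustrated with a toy selection over a fixed set of environments. The environment names and returns below are invented, and the real method trains the adversary with RL instead of enumerating candidates.

```python
def regret(protagonist_return, antagonist_return):
    """Reward gap that the environment-designing adversary tries to maximize."""
    return antagonist_return - protagonist_return

def pick_environment(candidates):
    """candidates maps a name to (protagonist_return, antagonist_return)."""
    return max(candidates, key=lambda name: regret(*candidates[name]))

candidates = {
    "easy": (9.0, 10.0),          # both agents do well, small gap
    "hard_but_solvable": (2.0, 8.0),
    "impossible": (0.0, 0.0),     # nobody succeeds, so the gap is zero
}
chosen = pick_environment(candidates)
# chosen == "hard_but_solvable"
```

The toy shows why the gap is a useful signal: an impossible environment gives no gap, so the adversary gains nothing by proposing it.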

Modularity during transfer

You don't have to fine-tune everything on real-world data, just specific modules.

Learning to Drive from Simulation without Real World Labels

Idea

Map real-world and simulated images into a shared latent space Z using a GAN
Train on simulation with labels and learn a policy Z \rightarrow c that maps the latent code to the control output

Learning to Drive from Simulation without Real World Labels

X_d^{recon} = G_d(E_d(X_d))

X_d^{cyc} = G_d(E_{d'}(G_{d'}(E_d(X_d))))

Z_d^{recon} = E_{d'}(G_{d'}(Z_d))

C(E_d(X_d)) - C(E_{d'}(G_{d'}(E_d(X_d))))

Along with training the GAN, make sure the latent space is consistent across both domains
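One way to read the four expressions above is as consistency terms, each penalizing the gap between a quantity and its reconstruction. The loss names, the choice of the L1 norm, and the shorthand Z_d = E_d(X_d) are assumptions added here for illustration; d and d' denote the two domains (simulation and real).

```latex
\begin{aligned}
\mathcal{L}_{recon}   &= \lVert X_d - X_d^{recon} \rVert_1, & X_d^{recon} &= G_d(E_d(X_d)) \\
\mathcal{L}_{cyc}     &= \lVert X_d - X_d^{cyc} \rVert_1,   & X_d^{cyc}   &= G_d(E_{d'}(G_{d'}(E_d(X_d)))) \\
\mathcal{L}_{Z}       &= \lVert Z_d - Z_d^{recon} \rVert_1, & Z_d^{recon} &= E_{d'}(G_{d'}(Z_d)) \\
\mathcal{L}_{control} &= \lVert C(E_d(X_d)) - C(E_{d'}(G_{d'}(E_d(X_d)))) \rVert_1
\end{aligned}
```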

Driving Policy Transfer via Modularity and Abstraction

Control system for driving can be modular

Train the entire pipeline on simulation
Fine-tune or re-train the perception module on real-world data. The agent can infer the final control actions from segmentation information.
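A toy sketch of the modular idea: the controller only ever sees a segmentation, so the perception module can be swapped without touching it. All names, the thresholding "perception", and the steering rule are hypothetical stand-ins for learned components.

```python
def sim_perception(image):
    """Simulation images here are rows of 'R' (road) / 'X' (off-road) labels."""
    return [[1 if px == "R" else 0 for px in row] for row in image]

def real_perception(image, threshold=0.5):
    """Stand-in for a module re-trained on real data: threshold intensities."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def controller(segmentation):
    """Steer toward the side with more road; positive means steer right."""
    half = len(segmentation[0]) // 2
    left = sum(sum(row[:half]) for row in segmentation)
    right = sum(sum(row[half:]) for row in segmentation)
    total = left + right
    return 0.0 if total == 0 else (right - left) / total

class DrivingPipeline:
    def __init__(self, perception, controller):
        self.perception = perception
        self.controller = controller

    def act(self, image):
        return self.controller(self.perception(image))

sim_pipeline = DrivingPipeline(sim_perception, controller)
real_pipeline = DrivingPipeline(real_perception, controller)  # same controller

sim_image = [["X", "R", "R", "R"], ["X", "X", "R", "R"]]
real_image = [[0.1, 0.9, 0.8, 0.7], [0.2, 0.3, 0.9, 0.6]]
```

Both pipelines produce the same steering command for these two inputs because the two perception modules yield the same segmentation.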

Blind Spot Detection for Safe Sim-to-Real Transfer

Learn a blind spot detector using human/oracle feedback

Train the entire pipeline on simulation

Learn from multiple experts' demonstrations and corrections. Aggregate noisy labels using EM algorithm.
Learn a classifier to detect whether the current state is a blind spot
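The label aggregation step could look roughly like the EM loop below. It assumes binary blind-spot labels and a simple one-parameter reliability per labeler, which may be simpler than the model used in the paper; the function name and data are illustrative.

```python
def em_aggregate(labels, iterations=20, eps=1e-6):
    """labels[i][j] is the 0/1 label that labeler j gave to state i.

    Returns (q, accuracy): q[i] is the estimated probability that state i is
    a blind spot, accuracy[j] the estimated reliability of labeler j.
    """
    n, m = len(labels), len(labels[0])
    q = [sum(row) / m for row in labels]   # initialize with the vote fraction
    accuracy = [0.5] * m
    for _ in range(iterations):
        # M-step: re-estimate the class prior and each labeler's accuracy.
        prior = min(1 - eps, max(eps, sum(q) / n))
        for j in range(m):
            agree = sum(q[i] if labels[i][j] == 1 else 1 - q[i] for i in range(n))
            accuracy[j] = min(1 - eps, max(eps, agree / n))
        # E-step: re-estimate the posterior of each true label.
        for i in range(n):
            p1, p0 = prior, 1 - prior
            for j in range(m):
                a = accuracy[j]
                p1 *= a if labels[i][j] == 1 else 1 - a
                p0 *= a if labels[i][j] == 0 else 1 - a
            q[i] = p1 / (p1 + p0)
    return q, accuracy

# Labelers 0 and 1 are consistent; labeler 2 disagrees on states 1 and 4.
labels = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
q, accuracy = em_aggregate(labels)
hard_labels = [round(p) for p in q]
```

The aggregated `hard_labels` would then serve as training targets for the blind spot classifier.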

Blind Spot Detection for Safe Sim-to-Real Transfer

Good trade-off between always using oracle and not using oracle
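The trade-off can be pictured as a threshold on the detector's output: the oracle is queried only when a state looks like a blind spot. The function names and the toy detector below are hypothetical.

```python
def act_with_blind_spot_check(state, policy, oracle, blind_spot_prob, threshold=0.5):
    """Return (action, used_oracle). Ask the oracle only in likely blind spots."""
    if blind_spot_prob(state) >= threshold:
        return oracle(state), True
    return policy(state), False

# Toy stand-ins: states are integers, and higher states are riskier.
policy = lambda state: "sim_policy_action"
oracle = lambda state: "oracle_action"
blind_spot_prob = lambda state: state / 10

def count_queries(threshold):
    return sum(
        act_with_blind_spot_check(s, policy, oracle, blind_spot_prob, threshold)[1]
        for s in range(10)
    )

queries_default = count_queries(0.5)   # states 5..9 trigger the oracle
queries_always = count_queries(0.0)    # every state triggers the oracle
queries_never = count_queries(1.1)     # no state triggers the oracle
```

Raising the threshold saves oracle queries at the cost of acting on the simulation-trained policy in more uncertain states.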

Conclusion

Sim2Real reduces the need to sample from the real world
We can generalize to unseen real-world conditions by training a real-world policy
Virtual-to-real adaptation is still a big challenge, and its difficulty varies across tasks.
