Sim2Real: Simulations for data synthesis, augmentation, and robustness
Harshavardhan Kamarthi
Data Seminar
Reasons for the success of deep learning
- Larger models
- Faster and more efficient compute devices
- Data
What if we have less data? Simulations
- We can collect as many samples as we need
- It's safe to deploy and test
- We can be more creative
- Cost effective
Sim2Real
- Train in simulation
- Transfer the neural network to the real world
- Needs less (or no) real-world data to fine-tune/deploy
Domain Randomization
- Why?
- It is hard to capture real-world dynamics exactly
- Learn from a diverse dataset so that the model is robust to real-world fluctuations
- How?
- Expert Knowledge
- Self-adapting
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization (ICRA 2018)
Task: Simple Object Manipulation by Robotic Arm
- Train in a simulator with a variety of dynamics
- Deploy directly in the real world
Trained on a variety of simulations, with parameters sampled uniformly from the ranges given above
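A minimal sketch of per-episode dynamics randomization. The parameter names and ranges below are made up for illustration and are not the values used in the paper; `sample_dynamics` is a hypothetical helper.

```python
import random

# Illustrative (low, high) ranges only; NOT the ranges from the paper.
PARAM_RANGES = {
    "mass_scale": (0.5, 2.0),
    "friction": (0.2, 1.0),
    "damping": (0.1, 1.0),
    "action_delay_steps": (0, 3),
}

def sample_dynamics(rng, ranges=PARAM_RANGES):
    """Draw one set of simulator dynamics, each parameter uniform over its range."""
    params = {}
    for name, (low, high) in ranges.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)  # inclusive integer range
        else:
            params[name] = rng.uniform(low, high)
    return params

rng = random.Random(0)
# Resample the dynamics at the start of every training episode.
episodes = [sample_dynamics(rng) for _ in range(5)]
for params in episodes:
    print(params)
```

Each episode would then configure the simulator with its own `params` before the policy is rolled out, so the policy never sees a single fixed set of dynamics.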
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Task: Simple Object Manipulation by Robotic Arm using image data
Task: Train an object detector to map a camera image to the correct object coordinates w.r.t. the robot's camera
Randomization parameters:
- Shape and number of objects
- Position of objects and camera
- Lighting
- Random noise added to the image
- Pre-train on images with randomized settings
- Fine-tune in the real world
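The randomization parameters listed above can be sketched as a scene sampler plus an image perturbation. This is a toy illustration: the shape vocabulary, value ranges, and the helpers `sample_scene` and `perturb_image` are assumptions, and the "rendered" image is just a constant grid standing in for a renderer's output.

```python
import random

SHAPES = ["cube", "cylinder", "sphere"]  # assumed shape vocabulary

def sample_scene(rng):
    """Sample one hypothetical scene: objects, camera offset, and lighting."""
    n = rng.randint(1, 4)
    return {
        "objects": [
            {"shape": rng.choice(SHAPES),
             "xy": (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))}
            for _ in range(n)
        ],
        "camera_offset": tuple(rng.uniform(-0.05, 0.05) for _ in range(3)),
        "light_intensity": rng.uniform(0.3, 1.5),
    }

def perturb_image(image, rng, light=1.0, noise_std=0.02):
    """Scale brightness and add Gaussian pixel noise; pixels are floats in [0, 1]."""
    return [
        [min(1.0, max(0.0, px * light + rng.gauss(0.0, noise_std))) for px in row]
        for row in image
    ]

rng = random.Random(0)
scene = sample_scene(rng)
rendered = [[0.5] * 4 for _ in range(3)]  # stand-in for a rendered 3x4 grayscale image
augmented = perturb_image(rendered, rng, light=scene["light_intensity"])
```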
Data Dreaming for Object Detection: Learning Object-Centric State Representations for Visual Imitation
Task: Train an object detector with automatic data augmentation for behaviour cloning and pose estimation
If the object detector successfully detects the object, we can generate more data by changing the object's position and the image background, then train on the new images.
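A rough sketch of this augmentation step, assuming the detected object is available as a rectangular crop and images are small 2D grids. `paste_object` and `dream_samples` are hypothetical helpers, not functions from the paper.

```python
import random

def paste_object(background, crop, top, left):
    """Return a copy of `background` with `crop` pasted at (top, left), plus the new box."""
    out = [row[:] for row in background]
    for r, crop_row in enumerate(crop):
        for c, px in enumerate(crop_row):
            out[top + r][left + c] = px
    # Box as (y0, x0, y1, x1) with exclusive end coordinates.
    box = (top, left, top + len(crop), left + len(crop[0]))
    return out, box

def dream_samples(crop, backgrounds, n, rng):
    """Generate n (image, box) pairs by varying the background and object position."""
    h, w = len(crop), len(crop[0])
    samples = []
    for _ in range(n):
        bg = rng.choice(backgrounds)
        top = rng.randint(0, len(bg) - h)      # keep the crop fully inside the image
        left = rng.randint(0, len(bg[0]) - w)
        samples.append(paste_object(bg, crop, top, left))
    return samples

rng = random.Random(0)
crop = [[9, 9], [9, 9]]                               # toy object pixels
backgrounds = [[[0] * 6 for _ in range(5)], [[1] * 6 for _ in range(5)]]
samples = dream_samples(crop, backgrounds, n=4, rng=rng)
```

Because the paste location is chosen by the generator, every synthetic image comes with an exact bounding-box label for free.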
EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
Task: General learning algorithm for ensuring robustness
Idea
- Train on simulations with parameters sampled from a distribution
- Select the subset of simulations where performance is in the worst ε-percentile
- Update the neural network only on the data from these simulations
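A small sketch of the selection step, assuming each rollout is summarized by a scalar return. The toy rollout model and the helper names are illustrative only; the policy update itself is left out.

```python
import random

def toy_rollout(params, rng):
    """Stand-in for a simulator episode: return drops as friction moves away from 0.5."""
    ret = 10.0 - 20.0 * abs(params["friction"] - 0.5) + rng.gauss(0.0, 0.1)
    return {"params": params, "return": ret}

def select_worst(trajectories, epsilon):
    """Keep the epsilon fraction of trajectories with the lowest return (at least one)."""
    ranked = sorted(trajectories, key=lambda t: t["return"])
    k = max(1, int(epsilon * len(ranked)))
    return ranked[:k]

rng = random.Random(0)
# Sample simulator parameters from a distribution and roll out once per sample.
batch = [toy_rollout({"friction": rng.uniform(0.1, 1.0)}, rng) for _ in range(20)]
subset = select_worst(batch, epsilon=0.1)
# A policy-gradient step would now use only `subset`, not the full batch.
```

Training only on the worst-performing simulations pushes the policy toward good worst-case behaviour instead of good average behaviour.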
Domain Adaptation
- What if the distribution of simulation parameters is too general?
- What if it doesn't capture real world settings with a good probability?
EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
- Use Bayesian inference to improve the simulation parameters
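One way to picture this update is a particle approximation of Bayes' rule over a single simulation parameter. The sketch below is an assumption-laden toy (a one-parameter sliding-puck simulator, a Gaussian likelihood, made-up noise levels) and does not claim to reproduce the paper's exact procedure.

```python
import math
import random

def simulate(friction):
    """Toy simulator: sliding distance of a puck launched at a fixed speed (assumed model)."""
    v0, g = 2.0, 9.81
    return v0 ** 2 / (2.0 * friction * g)

def posterior_weights(particles, observed, noise_std):
    """Weight each particle by the Gaussian likelihood of the real observation."""
    likes = [
        math.exp(-0.5 * ((simulate(p) - observed) / noise_std) ** 2)
        for p in particles
    ]
    total = sum(likes)
    return [like / total for like in likes]

rng = random.Random(0)
particles = [rng.uniform(0.1, 1.0) for _ in range(500)]  # prior over friction
observed = simulate(0.4) + 0.01                          # pretend real-world measurement
weights = posterior_weights(particles, observed, noise_std=0.05)
posterior_mean = sum(w * p for w, p in zip(weights, particles))
# Resampling gives the updated distribution to draw simulation parameters from.
resampled = rng.choices(particles, weights=weights, k=500)
```

After the update, sampled friction values concentrate near the value that explains the real observation, so later training episodes are drawn from a better-matched distribution.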
Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience
- Reduce the distance between the real-world parameter distribution and the simulation parameter distribution
Domain-Adversarial Training of Neural Networks
Fool the domain classifier by learning intermediate features that look similar whether the input comes from real-world or simulation data
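The adversarial objective can be written as a saddle-point problem. The notation here is introduced for illustration: G_f is the feature extractor, G_y the label predictor, G_d the domain classifier, d_i the domain label (simulation or real), and λ an assumed trade-off weight.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \sum_{i \in \text{labeled}} L_y\big(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\big)
  - \lambda \sum_{i} L_d\big(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\big)

(\hat\theta_f, \hat\theta_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat\theta_d),
\qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f, \hat\theta_y, \theta_d)
```

The domain classifier minimizes its own loss L_d, while the feature extractor is updated to increase L_d. In practice this is commonly implemented by reversing the sign of the gradient that flows from the domain classifier into the feature extractor.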
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Task: Robot grasping task
Idea
- Use synthetic data (ShapeNet and other procedurally generated objects) to build the simulation
- Domain Adaptation: DANN and GraspGAN
Adding further domain randomization, such as image noise, different camera poses, and brightness changes, also improved the success rate by 1%. However, DANN and GraspGAN improved over simple randomization by 4-6%.
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Task: Generic RL algorithm for robust transfer
- Two agents, a protagonist (P) and an antagonist (A), try to outperform each other. The total reward for the protagonist is the protagonist's reward minus the antagonist's reward (and vice versa for A).
- An adversary modifies the environment so that it maximizes the difference in reward between the antagonist and the protagonist
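The reward assignment described above can be sketched with single episode returns. This follows the simple difference stated on the slide; `paired_rewards` is a hypothetical helper, and the full algorithm may aggregate returns over several episodes.

```python
def paired_rewards(protagonist_return, antagonist_return):
    """Assign rewards to the three players from one pair of episode returns."""
    gap = antagonist_return - protagonist_return
    return {
        "protagonist": -gap,  # protagonist reward minus antagonist reward
        "antagonist": gap,    # vice versa for the antagonist
        "adversary": gap,     # environment designer maximizes the gap
    }

rewards = paired_rewards(protagonist_return=3.0, antagonist_return=5.0)
unsolvable = paired_rewards(protagonist_return=0.0, antagonist_return=0.0)
print(rewards)
print(unsolvable)
```

Note the second case: an environment that neither agent can solve gives the adversary zero reward, so the adversary has no incentive to build impossible environments, only ones that are solvable but hard for the protagonist.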
Minimax and uniform domain randomization both do worse than PAIRED
Modularity during transfer
You don't have to fine-tune everything on real-world data, just specific modules.
Learning to Drive from Simulation without Real World Labels
Idea
- Map real-world and simulated images to a shared latent space using a GAN
- Train on simulation with labels and learn a policy
While training the GAN, make sure the latent space is consistent across both domains
Driving Policy Transfer via Modularity and Abstraction
Control system for driving can be modular
- Train the entire pipeline in simulation
- Fine-tune or re-train the perception module on real-world data. The agent can infer the final control actions from segmentation information.
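A toy sketch of this modular design, assuming the interface between modules is a per-pixel segmentation map. All class and function names are hypothetical, and the two perception functions are simple thresholds standing in for trained networks.

```python
from dataclasses import dataclass
from typing import Callable, List

Segmentation = List[List[int]]  # per-pixel class ids, 1 = road (assumed encoding)

@dataclass
class DrivingPipeline:
    perception: Callable[[list], Segmentation]   # image -> segmentation
    controller: Callable[[Segmentation], float]  # segmentation -> steering

    def act(self, image):
        return self.controller(self.perception(image))

def controller(seg):
    """Steer toward the horizontal center of road pixels; output in [-1, 1]."""
    width = len(seg[0])
    cols = [c for row in seg for c, v in enumerate(row) if v == 1]
    if not cols:
        return 0.0
    center = sum(cols) / len(cols)
    return 2.0 * center / (width - 1) - 1.0

def sim_perception(image):   # stand-in for a network trained on simulated images
    return [[1 if px > 0.5 else 0 for px in row] for row in image]

def real_perception(image):  # stand-in for the module re-trained on real images
    return [[1 if px < 0.3 else 0 for px in row] for row in image]

pipeline = DrivingPipeline(sim_perception, controller)
sim_steer = pipeline.act([[0.0, 0.0, 0.9, 0.9]])    # bright road in simulation

pipeline.perception = real_perception               # swap only the perception module
real_steer = pipeline.act([[0.8, 0.8, 0.1, 0.1]])   # dark road in the real image
```

The controller is untouched by the swap: as long as both perception modules emit the same segmentation format, the control logic learned in simulation carries over.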
Blind Spot Detection for Safe Sim-to-Real Transfer
Learn a blind spot detector using human/oracle feedback
- Train the entire pipeline in simulation
- Learn from multiple experts' demonstrations and corrections. Aggregate noisy labels using the EM algorithm.
- Learn a classifier to detect whether the current state is a blind spot
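The label-aggregation step can be illustrated with a simplified EM scheme in which each expert has a single unknown accuracy. This is an assumed simplification in the style of Dawid-Skene aggregation, not necessarily the exact model used in the paper; the label matrix is made up.

```python
def em_aggregate(labels, iters=20):
    """labels[i][j] in {0, 1} is expert j's label for state i (1 = blind spot).

    Returns (posteriors that each state is a blind spot, estimated expert accuracies).
    """
    n_items, n_experts = len(labels), len(labels[0])
    # Initialize posteriors with the fraction of experts voting 1.
    post = [sum(row) / n_experts for row in labels]
    acc = [0.7] * n_experts
    for _ in range(iters):
        # M-step: accuracy = expected fraction of agreement with the true label.
        for j in range(n_experts):
            agree = sum(
                post[i] if labels[i][j] == 1 else 1.0 - post[i]
                for i in range(n_items)
            )
            acc[j] = min(0.99, max(0.01, agree / n_items))
        prior = min(0.99, max(0.01, sum(post) / n_items))
        # E-step: posterior over each true label given the expert accuracies.
        for i in range(n_items):
            p1, p0 = prior, 1.0 - prior
            for j in range(n_experts):
                p1 *= acc[j] if labels[i][j] == 1 else 1.0 - acc[j]
                p0 *= acc[j] if labels[i][j] == 0 else 1.0 - acc[j]
            post[i] = p1 / (p1 + p0)
    return post, acc

# Experts 0 and 1 agree with each other; expert 2 is unreliable (made-up data).
labels = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
posteriors, accuracies = em_aggregate(labels)
hard_labels = [int(p > 0.5) for p in posteriors]
```

The aggregated `hard_labels` (or the soft `posteriors`) would then serve as training targets for the blind spot classifier.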
- Good trade-off between always querying the oracle and never querying it
Conclusion
- Sim2Real reduces the need to sample from the real world
- We can generalize to unseen real-world conditions by training robust policies in simulation
- Virtual-to-real adaptation is still a big challenge, and its difficulty varies across tasks.