Bayesian Optimization for Robotics
Roberto Calandra
Secondmind - 14 Oct 2021
Facebook AI Research
Goals of the talk
- Explain some of the challenges in Robotics
- Present multiple successful applications of BO in Robotics:
- Learning to walk with a bipedal robot
- Multi-objective BO for navigation with micro-robots
- Hierarchical BO for joint morphology/controller optimization
- High-dimensional BO with linear embeddings (done right)
- Argue why BO is a powerful tool for Robotics
State-of-the-art in Robotics
From YouTube: https://www.youtube.com/watch?v=g0TaYhjpOfo
Why Learning?
Robotics still relies heavily on human expertise!
On one hand, it is infeasible to hand-design general-purpose controllers
- Human design is time-consuming and relies on prior expertise
- Real-world experiments are expensive and stochastic
On the other hand, there is mistrust for automatic design of controllers
- Not verifiable
- Often finds qualitatively different solutions
- (Maybe a bit of human presumption)
Black-box Optimization
Optimized parameters
Objective function
Parameters to optimize
Bayesian Optimization for Policy Search
Policy (i.e., parametrized controller)
Action executed
Learning a controller is equivalent to optimizing the parameters of the controller
Current state
Parameters of the policy
- 0-order
- Stochastic
- Expensive evaluation
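The policy-search setting above (zero-order, stochastic, expensive evaluations) can be sketched as a small Bayesian optimization loop. This is a minimal illustration under stated assumptions, not the implementation used in the talk: `rollout_reward` is a hypothetical one-parameter stand-in for a robot trial, the Gaussian process uses a fixed squared-exponential kernel with hand-picked hyperparameters, and expected improvement is maximized over a fixed candidate grid.

```python
import math
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between two sets of points (one per row)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement over `best` for a maximization problem."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def rollout_reward(theta, rng):
    """Hypothetical noisy episode return (assumption); the peak is at theta = 0.7."""
    return float(-(theta[0] - 0.7) ** 2 + 0.01 * rng.normal())

def bayesian_optimization(n_trials=25, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)[:, None]
    # Start from a few random policy parameters.
    thetas = rng.uniform(0.0, 1.0, size=(3, 1))
    rewards = np.array([rollout_reward(t, rng) for t in thetas])
    for _ in range(n_trials - len(thetas)):
        # Fit the surrogate, then run the candidate with the highest EI.
        mean, std = gp_posterior(thetas, rewards, candidates)
        ei = expected_improvement(mean, std, rewards.max())
        theta_next = candidates[np.argmax(ei)]
        thetas = np.vstack([thetas, theta_next])
        rewards = np.append(rewards, rollout_reward(theta_next, rng))
    return thetas[np.argmax(rewards)], rewards.max()

best_theta, best_reward = bayesian_optimization()
```

Each call to `rollout_reward` plays the role of one expensive trial, so the total number of calls is the experimental budget.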
The Beginning
Learning to Walk with a Bipedal Robot


Bio-inspired Bipedal Robot "Fox":
- Quasi-passive dynamic walker
- 4 Degrees of freedom
- Springs in legs
- Walking in circle
- Finite-state-machine controller (from biomechanics)
- 8 open parameters
- (Motors life ~200 trials)
Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23
Learning to Walk in 80 Trials
Learning Curve

Comparison

Learned model


Not Symmetrical (about 5° difference). Why?
Because it is walking in a circle!
Beyond Single Objective
Locomotion as Multi-objective Optimization

Trade-off between Walking Speed and Energy Consumption!
Multi-objective Optimization
- Most engineering problems are truly multi-objective

Pareto Front
- Not all objective functions can be optimized at once
- Solving this optimization means finding the Pareto Front (PF)
- The identified PF should be:
- Complete
- Dense
- Accurate
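To make the notion of a Pareto front concrete, the sketch below extracts the non-dominated points from a set of evaluated solutions. It assumes every objective is minimized, and the `gaits` array is made-up illustrative data (negative walking speed and energy use), not results from the talk.

```python
import numpy as np

def pareto_front_mask(objectives):
    """Boolean mask of non-dominated rows, assuming every objective is minimized."""
    objectives = np.asarray(objectives, dtype=float)
    mask = np.ones(len(objectives), dtype=bool)
    for i, point in enumerate(objectives):
        # A row dominates `point` if it is no worse everywhere and better somewhere.
        no_worse = np.all(objectives <= point, axis=1)
        strictly_better = np.any(objectives < point, axis=1)
        mask[i] = not np.any(no_worse & strictly_better)
    return mask

# Hypothetical gaits: (negative walking speed, energy use), both minimized.
gaits = np.array([[-1.0, 5.0], [-0.8, 3.0], [-0.5, 4.0], [-0.2, 1.0], [-0.9, 6.0]])
front = gaits[pareto_front_mask(gaits)]
```

Here the third and fifth gaits are dominated (another gait is both faster and cheaper), so the front keeps the other three.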
Predicting Pareto Front



20 Evaluations
50 Evaluations
200 Evaluations
Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
Predicting Pareto Front


MOP2
ZDT3
Predicting Pareto Front (Noisy)




MOP2
ZDT3
Predicting Pareto Front (Noisy)
Sensitivity Analysis


Sensitivity Analysis (MOP2)


Sensitivity Analysis (RMTP3)


Learning to Walk with Micro-robots

Micro-robots

Simulated hexapod:
- 12 Degrees of Freedom (2 per leg)
- No good physics models at that scale
- Central Pattern Generators (CPG) as controller
Let's apply all the tools we have so far!
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Hard-coded CPG Gaits

Single-objective




Dual Tripod Gait
Multi-objective




Comparison Gaits

Discovering New Gaits

Contextual Bayesian Optimization
Optimized parameters
Objective function
Parameters to optimize
Context
Contextual BO

Learning Locomotion Primitives


- With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
- The trick was to treat it as a contextual BO problem at training time, and then convert it to multi-objective optimization (MOO)
Combining Primitives for Navigation
(More) Expensive Optimization
Joint Morphology/Controller Optimization
- In Robotics, there is a tight relationship between morphologies and controllers
- Design of morphologies is a complex and time-consuming process
- Can we automate it?
- Same simulated hexapod as before:
- Each manufacturing round takes about 1 month in real-world...
- ...But we can fabricate multiple different morphology configurations at once (up to 5)
Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019


Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO)
Two levels of optimization
(instead of a single bigger optimization)
- Allows weighting the different costs of the two types of parameters
- Each of the two levels uses information from the other level:
- The morphology level considers the best policy achieved for each morphology design
- The controller level uses the morphology as context
- Batch evaluation to reduce fabrication time
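The two-level structure can be sketched as nested loops: an outer loop proposes a batch of morphologies per fabrication round, and an inner loop tunes the controller with the morphology held fixed as context. This is only a structural illustration, not HPC-BBO itself: random search and local perturbation stand in for the Bayesian optimizers at both levels, and `walking_distance` is a hypothetical scalar simulator.

```python
import numpy as np

def walking_distance(morphology, controller):
    """Hypothetical simulator (assumption): the best controller depends on the morphology."""
    return float(-(morphology - 0.6) ** 2 - (controller - morphology) ** 2)

def optimize_controller(morphology, rng, n_trials=30):
    """Inner level: tune the controller with the morphology held fixed as context.
    Random search stands in for the Bayesian optimizer to keep the sketch short."""
    controllers = rng.uniform(0.0, 1.0, size=n_trials)
    scores = [walking_distance(morphology, c) for c in controllers]
    return controllers[int(np.argmax(scores))], max(scores)

def hierarchical_search(n_rounds=4, batch_size=5, seed=0):
    rng = np.random.default_rng(seed)
    history = []  # (morphology, best controller, best score) per design
    for _ in range(n_rounds):
        # Outer level: propose one batch of morphologies per fabrication round,
        # scored by the best controller found for each earlier design.
        if history:
            best_morphology = max(history, key=lambda h: h[2])[0]
            batch = np.clip(best_morphology + 0.1 * rng.normal(size=batch_size), 0.0, 1.0)
        else:
            batch = rng.uniform(0.0, 1.0, size=batch_size)
        for morphology in batch:
            controller, score = optimize_controller(morphology, rng)
            history.append((float(morphology), float(controller), score))
    return max(history, key=lambda h: h[2])

best_morphology, best_controller, best_score = hierarchical_search()
```

Keeping the levels separate lets the expensive outer loop run only a handful of rounds while the cheaper inner loop spends many more trials per design.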

Results



Top 4 Morphologies


- Exchanging the morphology severely degrades the controller performance.
- This evidence supports the hypothesis that morphology and controller need to be tightly coupled
Brief Interlude
(BO for Model-Based Reinforcement Learning)
Model-based Reinforcement Learning

Is Something Strange about MBRL?


How to Use the Reward?

Goal-Driven Dynamics Learning
- Instead of optimizing the forward dynamics w.r.t. the NLL of the next state, we optimize w.r.t. the reward
(The reward is all we care about)
- Computing the gradients analytically is intractable
- We used a zero-order optimizer: Bayesian optimization
- (and an LQG framework)
Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173
Real-world Quadcopter


Dubins Car


Conclusion
There exist models that are wrong, but nearly optimal when used for control
- From a Sys.ID perspective, they are completely wrong
- These models might be out-of-class (e.g., linear model for non-linear dynamics)
- Hypothesis: these models capture some structure of the optimal solution, ignoring the rest of the space
- Evidence: these models do not seem to generalize to new tasks
Understand and Overcome the Limitations of MBRL
- Can we avoid the multiplicative error of recursive one-step predictions? (YES)
Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
IEEE Conference on Decision and Control (CDC), 2021
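The multiplicative error of recursive one-step predictions can be illustrated with a toy example. The sketch assumes a scalar linear system and a one-step model whose coefficient is off by 2%. Both numbers are made up for illustration. The example shows the problem only, not the method of the cited paper.

```python
import numpy as np

def rollout(a, x0, horizon):
    """Iterate the scalar linear system x[t+1] = a * x[t] for `horizon` steps."""
    return x0 * a ** np.arange(1, horizon + 1)

true_a = 0.99            # hypothetical true dynamics (assumption)
model_a = true_a * 1.02  # one-step model with a 2% relative error (assumption)

horizon = 50
truth = rollout(true_a, 1.0, horizon)
prediction = rollout(model_a, 1.0, horizon)

# Feeding predictions back into the model compounds the error:
# after t steps the relative error is 1.02**t - 1.
relative_error = np.abs(prediction - truth) / np.abs(truth)
```

A 2% one-step error grows to more than 150% after 50 recursive steps, which is why long-horizon prediction is treated as a separate problem.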
- Can we dynamically tune MBRL hyperparameters? (YES)
Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F. & Calandra, R.
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
- Are accurate models a necessary condition for good control performance? (NO)
- Are accurate models a sufficient condition for good control performance? (NO)
Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173
Lambert, N.; Amos, B.; Yadan, O. & Calandra, R.
Objective Mismatch in Model-based Reinforcement Learning
Learning for Dynamics and Control (L4DC), 2020, 761-770
At Last.
Revisiting Linear Embeddings
High-dimensional BO with Linear Embeddings
Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas.
Bayesian optimization in a billion dimensions via random embeddings.
Journal of Artificial Intelligence Research, 55:361–387, 2016
Very neat Idea!
But several wrong assumptions...
A Few fixes
Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020
- Problem: linear projections do not preserve product kernels.
- Fix: Mahalanobis kernel
- Problem: most points in the embedding map to the facets of the projection.
- Fix: constrain the embedding optimization to points within the bounds
- Problem: linear embeddings can have a low probability of containing an optimum.
- Fix: unit hypersphere sampling for the projection
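The three ingredients listed above can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's implementation: the function names, the dimensions, the `[-1, 1]` box, the use of the pseudo-inverse to map back to the full space, and the exact scaling inside the kernel are all assumptions made for the example.

```python
import numpy as np

def sample_hypersphere_projection(d_embed, d_full, rng):
    """Projection B (d_embed x d_full) whose columns lie on the unit hypersphere."""
    b = rng.normal(size=(d_embed, d_full))
    return b / np.linalg.norm(b, axis=0, keepdims=True)

def mahalanobis_kernel(y1, y2, gamma):
    """k(y1, y2) = exp(-0.5 * (y1 - y2)^T Gamma (y1 - y2)) for a PSD matrix Gamma."""
    diff = y1[:, None, :] - y2[None, :, :]
    sq_dist = np.einsum("ijk,kl,ijl->ij", diff, gamma, diff)
    return np.exp(-0.5 * sq_dist)

def inside_bounds(y, b_pinv, low=-1.0, high=1.0):
    """True for embedded points y whose image pinv(B) y stays inside the box bounds."""
    x = y @ b_pinv.T
    return np.all((x >= low) & (x <= high), axis=1)

rng = np.random.default_rng(0)
d_embed, d_full = 2, 20
B = sample_hypersphere_projection(d_embed, d_full, rng)
B_pinv = np.linalg.pinv(B)  # maps embedded points back to the full space

# Candidate points in the embedding; only the feasible ones would be evaluated.
Y = rng.uniform(-1.0, 1.0, size=(100, d_embed))
feasible = inside_bounds(Y, B_pinv)
K = mahalanobis_kernel(Y[:5], Y[:5], np.eye(d_embed))
```

Filtering candidates with `inside_bounds` avoids clipping points onto the facets of the box. With the identity matrix as `gamma`, the kernel reduces to a standard isotropic squared-exponential kernel.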
Results

Collaborators











and more...

Summary
- Gave a glimpse into some challenges of Robotics
- Showed several successful applications of BO in Robotics:
- Learning to walk with the bipedal robot "Fox"
- Multi-objective BO for navigation with micro-robots
- Hierarchical BO for joint morphology/controller optimization
- BO is a powerful tool in the toolbox of any robot learning researcher
- Learned models provide useful insight!

Thank you for your time


References
- Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23
- Yi, Z.; Calandra, R.; Veiga, F. F.; van Hoof, H.; Hermans, T.; Zhang, Y. & Peters, J.
Active Tactile Object Exploration with Gaussian Processes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, 4925-4930
- Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
- Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173
- Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019, 2488-2494
- Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911
- Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020
Bayesian Optimization for Robotics
Designing and tuning controllers for real-world robots is a daunting task that typically requires significant expertise and lengthy experimentation. Bayesian optimization has been shown to be a successful approach to automating these tasks with little human expertise required. In this talk, I will discuss the main challenges of robot learning and how BO helps to overcome some of them. Using real-world applications where BO proved effective as showcases, I will also discuss how the challenges encountered in robotics applications can guide the development of new BO algorithms.