Bayesian Optimization for Robotics
Roberto Calandra
Secondmind  14 Oct 2021
Facebook AI Research
Goals of the talk
 Explain some of the challenges in Robotics

Present multiple successful applications of BO in Robotics:
 Learning to walk with a bipedal robot
 Multi-objective BO for navigation with microrobots
 Hierarchical BO for joint morphology/controller optimization
 High-dimensional BO with linear embeddings (done right)
 Argue why BO is a powerful tool for Robotics
State-of-the-art in Robotics
From YouTube: https://www.youtube.com/watch?v=g0TaYhjpOfo
Why Learning?
Robotics still relies heavily on human expertise!
On one hand, it is infeasible to hand-design general-purpose controllers
 Human design is time-consuming and relies on prior expertise
 Real-world experiments are expensive and stochastic
On the other hand, there is mistrust for automatic design of controllers
 Not verifiable
 Often finds qualitatively different solutions
 (Maybe a bit of human presumption)
Black-box Optimization
Optimized parameters
Objective function
Parameters to optimize
Bayesian Optimization for Policy Search
Policy (i.e., parametrized controller)
Action executed
Learning a controller is equivalent to optimizing the parameters of the controller
Current state
Parameters of the policy
 0-order (no gradient information)
 Stochastic
 Expensive evaluation
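The loop implied by the three properties above (no gradients, noisy outcomes, few affordable trials) can be sketched in a few lines. This is a minimal illustration, not the setup used in the talk: the one-dimensional objective `noisy_rollout`, the squared-exponential kernel, the fixed hyperparameters and the upper-confidence-bound acquisition are all assumptions chosen to keep the example short.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel (unit variance) between rows of A (n, d) and B (m, d)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X, y, X_query, noise_var=0.01):
    """Posterior mean and standard deviation of a zero-mean GP at X_query."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    K_s = rbf_kernel(X, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1 for this kernel
    return K_s.T @ alpha, np.sqrt(np.maximum(var, 1e-12))

def noisy_rollout(theta, rng):
    """Hypothetical stand-in for one robot trial: noisy return, best near theta = 0.7."""
    return float(-4.0 * (theta - 0.7) ** 2 + 0.05 * rng.standard_normal())

def bayes_opt(n_trials=25, n_init=3, beta=2.0, seed=0):
    """Run a simple GP-UCB loop and return (best parameter, all parameters, all returns)."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)[:, None]   # discretized policy parameter
    X = rng.uniform(0.0, 1.0, size=(n_init, 1))        # random initial trials
    y = np.array([noisy_rollout(x[0], rng) for x in X])
    for _ in range(n_trials - n_init):
        mean, std = gp_posterior(X, y - y.mean(), candidates)
        x_next = candidates[np.argmax(mean + beta * std)]  # optimistic choice
        X = np.vstack([X, x_next])
        y = np.append(y, noisy_rollout(x_next[0], rng))
    mean, _ = gp_posterior(X, y - y.mean(), candidates)
    # Recommend the maximizer of the posterior mean, not the best noisy sample.
    return float(candidates[np.argmax(mean), 0]), X, y

best_theta, X_hist, y_hist = bayes_opt()
```

Recommending the maximizer of the posterior mean rather than the best observed trial is one common way to cope with stochastic evaluations; other choices are possible.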
The Beginning
Learning to Walk with a Bipedal Robot
Bio-inspired Bipedal Robot "Fox":
 Quasi-passive dynamic walker
 4 degrees of freedom
 Springs in legs
 Walking in a circle
 Finite-state-machine controller (from biomechanics)
 8 open parameters
 (Motor life ~200 trials)
Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5–23
Learning to Walk in 80 Trials
Learning Curve
Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5–23
Comparison
Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5–23
Learned model
Not Symmetrical (about 5° difference). Why?
Because it is walking in a circle!
Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5–23
Beyond Single Objective
Locomotion as Multi-objective Optimization
Trade-off between Walking Speed and Energy Consumption!
Multi-objective Optimization
 Most engineering problems are truly multi-objective
 Not all objective functions can be optimized at once
 Solving this optimization means finding the Pareto Front (PF)
 The PF identified should be:
 Complete
 Dense
 Accurate
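To make the notion of a Pareto front concrete, the sketch below extracts the non-dominated points from a set of evaluations. The numbers are hypothetical gait evaluations invented for illustration; walking speed is negated so that both objectives are minimized.

```python
import numpy as np

def pareto_mask(costs):
    """Boolean mask of non-dominated rows; every objective is minimized."""
    costs = np.asarray(costs, dtype=float)
    mask = np.ones(costs.shape[0], dtype=bool)
    for i in range(costs.shape[0]):
        # A row dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        dominates_i = np.all(costs <= costs[i], axis=1) & np.any(costs < costs[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask

# Hypothetical evaluations: (negative walking speed, energy consumption).
evaluations = np.array([
    [-0.30, 5.0],
    [-0.25, 3.0],
    [-0.20, 4.0],  # dominated by the second row (slower and more energy)
    [-0.10, 1.0],
    [-0.30, 6.0],  # dominated by the first row (same speed, more energy)
])
front = evaluations[pareto_mask(evaluations)]
```

This quadratic-time check is adequate for the tens to hundreds of evaluations typical of robot experiments; it says nothing about how dense or accurate the resulting front is, which is the harder part of the problem.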
Predicting Pareto Front
20 Evaluations
50 Evaluations
200 Evaluations
Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
Predicting Pareto Front
MOP2
ZDT3
Predicting Pareto Front (Noisy)
Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
MOP2
ZDT3
Predicting Pareto Front (Noisy)
Sensitivity Analysis
Sensitivity Analysis (MOP2)
Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
Sensitivity Analysis (RMTP3)
Learning to Walk with Microrobots
Microrobots
Simulated hexapod:
 12 degrees of freedom (2 per leg)
 No good physics models at that scale
 Central Pattern Generators (CPG) as controller
Let's apply all the tools we have so far!
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RAL), 2018, 3, 1904–1911
Hard-coded CPG Gaits
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RAL), 2018, 3, 1904–1911
Single-objective
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RAL), 2018, 3, 1904–1911
Dual Tripod Gait
Multi-objective
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RAL), 2018, 3, 1904–1911
Comparison Gaits
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RAL), 2018, 3, 1904–1911
Discovering New Gaits
Contextual Bayesian Optimization
Optimized parameters
Objective function
Parameters to optimize
Context
Contextual BO
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RAL), 2018, 3, 1904–1911
Learning Locomotion Primitives
 With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
 The trick was to treat it as contextual BO at training time, and then convert it to multi-objective optimization (MOO)
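The modeling idea behind the contextual formulation can be sketched as one Gaussian process over the joint (parameter, context) space, which is then queried for the best parameter under a context that was never tried. Everything here is an illustrative assumption: the toy objective `trial`, the kernel and its hyperparameters, and the fact that the training trials lie on a fixed grid (10 per context, to keep the example small) instead of being chosen by an acquisition function as they would be in contextual BO.

```python
import numpy as np

def rbf(A, B, lengthscale=0.3):
    """Squared-exponential kernel (unit variance) between rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_mean(Z, y, Z_query, noise_var=1e-3):
    """Posterior mean of a GP with constant mean y.mean() at the query points."""
    K = rbf(Z, Z) + noise_var * np.eye(len(Z))
    alpha = np.linalg.solve(K, y - y.mean())
    return rbf(Z_query, Z) @ alpha + y.mean()

def trial(theta, context):
    """Hypothetical objective: the best parameter equals the context."""
    return -(theta - context) ** 2

def best_theta_for(Z, y, context, thetas):
    """Parameter maximizing the posterior mean for a given (possibly unseen) context."""
    Z_query = np.column_stack([thetas, np.full_like(thetas, context)])
    return float(thetas[np.argmax(gp_mean(Z, y, Z_query))])

contexts = [0.1, 0.3, 0.5, 0.7, 0.9]                # five training targets
train_thetas = np.linspace(0.0, 1.0, 10)            # ten trials per target
Z = np.array([[t, c] for c in contexts for t in train_thetas])
y = np.array([trial(t, c) for t, c in Z])

thetas = np.linspace(0.0, 1.0, 101)
theta_unseen = best_theta_for(Z, y, 0.4, thetas)    # context 0.4 was never trained on
```

Because the context is just an extra input of the model, data gathered for one target informs predictions for neighboring targets, which is what makes the learned primitives reusable.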
Combining Primitives for Navigation
(More) Expensive Optimization
Joint Morphology/Controller Optimization
 In Robotics, there is a tight relationship between morphologies and controllers
 Design of morphologies is a complex and time-consuming process
 Can we automate it?
 Same simulated hexapod as before:
 Each manufacturing round takes about 1 month in the real world...
 ...But we can fabricate multiple different morphology configurations at once (up to 5)
Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019
Hierarchical Process-Constrained Batch Bayesian Optimization (HPC-BBO)
Two levels of optimization
(instead of a single bigger optimization)
 Allows weighting the different costs of the two types of parameters
 Each of the two levels uses information from the other level:
 The morphology level considers the best policy achieved for each morphology design
 The controller level uses the morphology as context
 Batch evaluation to reduce fabrication time
Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019
Results
Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019
Top 4 Morphologies
 Exchanging the morphology severely degrades the controller performance.
 This evidence supports the hypothesis that morphology and controller need to be tightly coupled
Brief Interlude
(BO for Model-Based Reinforcement Learning)
Model-based Reinforcement Learning
Is Something Strange about MBRL?
How to Use the Reward?
Goal-Driven Dynamics Learning
 Instead of optimizing the forward dynamics w.r.t. the NLL of the next state, we optimize w.r.t. the reward
(The reward is all we care about)
 Computing the gradients analytically is intractable
 We used a zero-order optimizer: Bayesian optimization
 (and an LQG framework)
Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168–5173
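The idea of selecting a dynamics model by the reward it yields, not by its prediction error, can be illustrated on a toy problem. The sketch below is built on assumptions throughout: a hypothetical scalar nonlinear plant `true_step`, a linear model class, a plain LQR design in place of the LQG framework, and random search as a simple zero-order stand-in for Bayesian optimization.

```python
import numpy as np

def true_step(x, u):
    """Hypothetical nonlinear plant: unstable drift plus saturating actuation."""
    return 1.1 * x + 0.5 * np.tanh(u)

def lqr_gain(a, b, q=1.0, r=0.1, iters=200):
    """Scalar discrete-time LQR gain for the model x' = a*x + b*u (Riccati iteration)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def closed_loop_cost(model, x0=1.0, horizon=30, q=1.0, r=0.1):
    """Cost measured on the TRUE plant of the controller designed from model = (a, b)."""
    gain = lqr_gain(model[0], model[1], q=q, r=r)
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += q * x * x + r * u * u
        x = true_step(x, u)
    return float(cost)

rng = np.random.default_rng(0)

# System-identification baseline: least-squares fit of (a, b) to one-step transitions.
xs, us = rng.uniform(-2, 2, 200), rng.uniform(-2, 2, 200)
sysid_model = np.linalg.lstsq(np.column_stack([xs, us]), true_step(xs, us), rcond=None)[0]

# Goal-driven alternative: search model parameters directly on the task cost.
# The baseline is included as a candidate, so the result can never be worse than it.
candidates = [tuple(sysid_model)] + [
    (rng.uniform(0.5, 1.5), rng.uniform(0.1, 1.5)) for _ in range(200)
]
costs = [closed_loop_cost(m) for m in candidates]
goal_model = candidates[int(np.argmin(costs))]
```

Comparing `goal_model` with `sysid_model` shows how far the task-optimal parameters can drift from the best one-step predictor, even though both parameterize the same model class.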
Real-world Quadcopter
Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168–5173
Dubins Car
Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168–5173
Conclusion
There exist models that are wrong, but nearly optimal when used for control
 From a Sys.ID perspective, they are completely wrong
 These models might be out-of-class (e.g., linear model for nonlinear dynamics)
 Hypothesis: these models capture some structure of the optimal solution, ignoring the rest of the space
 Evidence: these models do not seem to generalize to new tasks
Understand and Overcome the Limitations of MBRL
 Can we avoid the multiplicative error of recursive one-step predictions?
Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
IEEE Conference on Decision and Control (CDC), 2021
(YES)
 Can we dynamically tune MBRL hyperparameters?
Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F. & Calandra, R.
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
(YES)
 Are accurate models a necessary condition for good control performance?
 Are accurate models a sufficient condition for good control performance?
Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168–5173
(NO)
(NO)
Lambert, N.; Amos, B.; Yadan, O. & Calandra, R.
Objective Mismatch in Model-based Reinforcement Learning
Learning for Dynamics and Control (L4DC), 2020, 761–770
At Last.
Revisiting Linear Embeddings
High-dimensional BO with Linear Embeddings
Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas.
Bayesian optimization in a billion dimensions via random embeddings.
Journal of Artificial Intelligence Research, 55:361–387, 2016
Very neat idea!
But several wrong assumptions...
A Few Fixes
Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020
 Linear projections do not preserve product kernels.
 Mahalanobis Kernel
 Most points in the embedding map to the facets of the projection
 Constrain the embedding optimization to points within the bounds
 Linear embeddings can have a low probability of containing an optimum.
 Unit hypersphere sampling for the projection
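The three fixes listed above can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the reference implementation: the projection `B` is taken to be a d x D matrix whose columns are normalized to unit length, embedding points are lifted to the ambient space with the pseudo-inverse of `B`, the ambient box is [-1, 1]^D, and the kernel uses the form exp(-0.5 * (y - y')^T Gamma (y - y')) with Gamma = U^T U. The function names are hypothetical.

```python
import numpy as np

def sample_projection(d, D, rng):
    """Random d x D projection whose columns lie on the unit hypersphere."""
    B = rng.standard_normal((d, D))
    return B / np.linalg.norm(B, axis=0, keepdims=True)

def lift(B, Y):
    """Map embedding points Y (n, d) to ambient points (n, D) via the pseudo-inverse of B."""
    return Y @ np.linalg.pinv(B).T

def inside_bounds(B, Y, low=-1.0, high=1.0):
    """True for embedding points whose lifted image stays inside the ambient box."""
    X = lift(B, Y)
    return np.all((X >= low) & (X <= high), axis=1)

def mahalanobis_kernel(Y1, Y2, U, variance=1.0):
    """k(y, y') = variance * exp(-0.5 * ||U (y - y')||^2), i.e. Gamma = U^T U."""
    P1, P2 = Y1 @ U.T, Y2 @ U.T
    sq = np.sum(P1**2, axis=1)[:, None] + np.sum(P2**2, axis=1)[None, :] - 2.0 * P1 @ P2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
d, D = 4, 50
B = sample_projection(d, D, rng)
Y = rng.uniform(-1.0, 1.0, size=(1000, d))       # candidate points in the embedding
feasible_fraction = inside_bounds(B, Y).mean()   # share that needs no clipping
K = mahalanobis_kernel(Y[:5], Y[:5], U=rng.standard_normal((d, d)))
```

Restricting the acquisition optimization to points where `inside_bounds` holds avoids clipping to the facets of the box, and a full matrix `U` lets the kernel represent correlations between embedding dimensions that a product kernel with one lengthscale per dimension cannot.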
Results
Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020
Collaborators
and more...
Summary
 Gave a glimpse into some challenges of Robotics

Showed several successful applications of BO in Robotics:
 Learning to walk with the bipedal robot "Fox"
 Multi-objective BO for navigation with microrobots
 Hierarchical BO for joint morphology/controller optimization
 BO is a powerful tool in the toolbox of any robot learning researcher
 Learned models provide useful insight!
Thank you for your time
References
 Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5–23
 Yi, Z.; Calandra, R.; Veiga, F. F.; van Hoof, H.; Hermans, T.; Zhang, Y. & Peters, J.
Active Tactile Object Exploration with Gaussian Processes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, 4925–4930
 Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
 Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168–5173
 Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019, 2488–2494
 Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RAL), 2018, 3, 1904–1911
 Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020
Bayesian Optimization for Robotics
By Roberto Calandra
Designing and tuning controllers for real-world robots is a daunting task which typically requires significant expertise and lengthy experimentation. Bayesian optimization has been shown to be a successful approach to automating these tasks with little human expertise required. In this talk, I will discuss the main challenges of robot learning, and how BO helps to overcome some of them. Using real-world applications where BO proved effective as a showcase, I will also discuss how the challenges encountered in robotics applications can guide the development of new BO algorithms.