Bayesian Optimization for Robotics

Roberto Calandra

Secondmind - 14 Oct 2021

Facebook AI Research

Goals of the talk

  • Explain some of the challenges in Robotics
  • Present multiple successful applications of BO in Robotics:
    • Learning to walk with a bipedal robot
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
    • High-dimensional BO with linear embeddings (done right)
  • Argue why BO is a powerful tool for Robotics

State-of-the-art in Robotics

Why Learning?

Robotics still relies heavily on human expertise!

On the one hand, it is infeasible to hand-design general-purpose controllers

  • Human design is time-consuming and relies on prior expertise
  • Real-world experiments are expensive and stochastic

 

On the other hand, there is mistrust for automatic design of controllers

  • Not verifiable
  • Often finds qualitatively different solutions
  • (Maybe a bit of human presumption)

Black-box Optimization

x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

  • x^*: optimized parameters
  • f: objective function
  • x: parameters to optimize

Bayesian Optimization for Policy Search

\theta^* = \arg\max_\theta R[\pi(\theta)]
a_t = \pi(s_t, \theta)

  • \pi: policy (i.e., parametrized controller)
  • a_t: action executed
  • s_t: current state
  • \theta: parameters of the policy

Learning a controller is equivalent to optimizing the parameters of the controller

  • Zero-order (no gradient information)
  • Stochastic
  • Expensive evaluation
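
A minimal sketch of the loop this slide describes: a Gaussian process is fitted to noisy returns, and expected improvement picks the next policy parameters to try. Everything here is an illustrative assumption rather than the setup used in the talk: the quadratic `true_return`, the noise level, the fixed kernel lengthscale, and the random candidate set that stands in for a proper acquisition optimizer.

```python
import math
import numpy as np


def rbf_kernel(A, B, lengthscale=0.3):
    """Squared-exponential kernel (unit variance) between rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_posterior(X, y, X_query, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at X_query."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_query)
    mean = K_s.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` when maximizing."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def true_return(theta):
    """Hypothetical noise-free return R[pi(theta)], maximal at theta = 0.6."""
    return -float(np.sum((theta - 0.6) ** 2))


def rollout(theta, rng):
    """One stochastic evaluation of the policy parameters (stand-in for a robot trial)."""
    return true_return(theta) + 0.01 * rng.standard_normal()


def bayes_opt(dim=2, n_init=5, n_iter=25, n_candidates=2000, seed=0):
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.0, 1.0, size=(n_init, dim))
    returns = np.array([rollout(t, rng) for t in thetas])
    for _ in range(n_iter):
        # Standardize the returns so the unit-variance GP prior is reasonable.
        y = (returns - returns.mean()) / (returns.std() + 1e-9)
        candidates = rng.uniform(0.0, 1.0, size=(n_candidates, dim))
        mean, std = gp_posterior(thetas, y, candidates)
        ei = expected_improvement(mean, std, y.max())
        theta_next = candidates[np.argmax(ei)]
        thetas = np.vstack([thetas, theta_next])
        returns = np.append(returns, rollout(theta_next, rng))
    # Recommend the evaluated point with the highest posterior mean,
    # which is less sensitive to noise than the best single observation.
    y = (returns - returns.mean()) / (returns.std() + 1e-9)
    mean, _ = gp_posterior(thetas, y, thetas)
    return thetas[np.argmax(mean)], thetas, returns


theta_best, thetas, returns = bayes_opt()
```

The whole run uses 30 evaluations of `rollout`, which is the quantity that matters when each evaluation is a physical experiment.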

The Beginning

Learning to Walk with a Bipedal Robot

Bio-inspired Bipedal Robot "Fox":

  • Quasi-passive dynamic walker
  • 4 Degrees of freedom
  • Springs in legs
  • Walking in a circle
  • Finite-state-machine controller (from biomechanics)
  • 8 open parameters
  • (Motor lifetime: ~200 trials)

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Learning to Walk in 80 Trials

Learning Curve

Comparison

Learned model

Not Symmetrical (about 5° difference). Why?

Because it is walking in a circle!

Beyond Single Objective

Locomotion as Multi-objective Optimization

Trade-off between Walking Speed and Energy Consumption!

Multi-objective Optimization

  • Most engineering problems are truly multi-objective
x^* = \arg\min_{x \in \mathbb{R}^d} \{f_1(x), \ldots, f_n(x)\}

Pareto Front

  • Not all objective functions can be optimized at once
  • Solving this optimization means finding the Pareto front (PF)
  • The identified PF should be:
    • Complete
    • Dense
    • Accurate
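
As a small illustration of what "finding the Pareto front" means for a finite set of evaluated points, the sketch below filters the non-dominated rows of an objective matrix, with all objectives minimized as in the formulation above. The function name `pareto_mask` and the example values are assumptions made for illustration.

```python
import numpy as np


def pareto_mask(F):
    """Boolean mask of the non-dominated rows of F (all objectives minimized)."""
    F = np.asarray(F, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask


# Five hypothetical evaluations of two objectives (f_1, f_2).
F = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0], [5.0, 5.0]])
front = F[pareto_mask(F)]
```

Here `[3, 4]` is dominated by `[2, 3]` and `[5, 5]` is dominated by `[1, 5]`, so three points remain on the front.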

Predicting Pareto Front

20 Evaluations

50 Evaluations

200 Evaluations

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Predicting Pareto Front

MOP2

ZDT3

Predicting Pareto Front (Noisy)

MOP2

ZDT3

Sensitivity Analysis

Sensitivity Analysis (MOP2)

Sensitivity Analysis (RMTP3)

Learning to Walk with Micro-robots

Micro-robots

Simulated hexapod:

  • 12 Degrees of Freedom (2 per leg)
  • No good physics models at that scale
  • Central Pattern Generators (CPG) as controller

Let's apply all the tools we have so far!

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Hard-coded CPG Gaits

Single-objective

Dual Tripod Gait

Multi-objective

Comparison Gaits

Discovering New Gaits

Contextual Bayesian Optimization

Without context:
x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

With context:
x^* = \arg\min_{x \in \mathbb{R}^d} f(x, c)

  • x^*: optimized parameters
  • f: objective function
  • x: parameters to optimize
  • c: context

Contextual BO
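
A minimal sketch of the contextual formulation f(x, c): a single Gaussian process is fitted over the joint input (x, c), and for each context the next x is chosen by a lower confidence bound on the slice with c held fixed. The toy objective, the kernel settings, the grid of candidates, and the use of a lower confidence bound as acquisition are all assumptions made for illustration, not the setup of the cited paper.

```python
import numpy as np

X_GRID = np.linspace(0.0, 1.0, 101)  # candidate parameters x


def rbf(A, B, lengthscale=0.3):
    """Squared-exponential kernel (unit variance) on the joint input (x, c)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_predict(Z, y, Z_query, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP at Z_query."""
    K = rbf(Z, Z) + noise * np.eye(len(Z))
    K_s = rbf(Z, Z_query)
    mean = K_s.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def toy_cost(x, c):
    """Hypothetical objective: the best parameter equals the context."""
    return (x - c) ** 2


def slice_at(c):
    """All candidate x values paired with a fixed context c."""
    return np.column_stack([X_GRID, np.full_like(X_GRID, c)])


def contextual_bo(contexts, n_rounds=8, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.array([[rng.uniform(), c] for c in contexts])  # one random trial per context
    y = np.array([toy_cost(x, c) for x, c in Z])
    for _ in range(n_rounds):
        for c in contexts:
            mean, std = gp_predict(Z, y - y.mean(), slice_at(c))
            x_next = X_GRID[np.argmin(mean - beta * std)]  # lower confidence bound
            Z = np.vstack([Z, [x_next, c]])
            y = np.append(y, toy_cost(x_next, c))
    return Z, y


def best_x(Z, y, c):
    """Parameter with the lowest predicted cost for context c."""
    mean, _ = gp_predict(Z, y - y.mean(), slice_at(c))
    return X_GRID[np.argmin(mean)]


Z, y = contextual_bo(contexts=[0.2, 0.5, 0.8])
x_star = best_x(Z, y, 0.5)
```

Because the model is shared across contexts, trials run for one context also inform the predictions for the others, and `best_x` can be queried for any context value.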

Learning Locomotion Primitives

  • With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
  • The trick was to treat it as contextual BO at training time, and then convert it to multi-objective optimization (MOO)

Combining Primitives for Navigation

(More) Expensive Optimization

Joint Morphology/Controller Optimization

  • In Robotics, there is a tight relationship between morphologies and controllers
  • Design of morphologies is a complex and time-consuming process
  • Can we automate it?
  • Same simulated hexapod as before:
    • Each manufacturing round takes about 1 month in the real world...
    • ...But we can fabricate multiple different morphology configurations at once (up to 5)

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019

Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO)

Two levels of optimization
(instead of a single bigger optimization)

  • Allows weighting the different costs of the two types of parameters
  • Each of the two levels uses information from the other level:
    • The morphology level considers the best policy achieved for each morphology design
    • The controller level uses the morphology as context
  • Batch evaluation to reduce fabrication time

Results

Top 4 Morphologies

  • Exchanging the morphology severely degrades the controller performance.
  • This evidence supports the hypothesis that morphology and controller need to be tightly coupled


Brief Interlude

(BO for Model-Based Reinforcement Learning)

Model-based Reinforcement Learning

Is Something Strange about MBRL?

How to Use the Reward?

Goal-Driven Dynamics Learning

  • Instead of optimizing the forward dynamics w.r.t. the NLL of the next state, we optimize w.r.t. the reward
    (The reward is all we care about)
     
  • Computing the gradients analytically is intractable
     
  • We used a zero-order optimizer: Bayesian optimization
     
  • (and an LQG framework)
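
A toy sketch of the idea in these bullets: the parameters of a dynamics model are scored by the closed-loop cost of the controller designed from them, not by prediction accuracy. The scalar linear system, the cost weights, and the search ranges are hypothetical, a plain LQR replaces the LQG framework, and random search is swapped in for Bayesian optimization to keep the example short.

```python
import numpy as np

Q, R_U = 1.0, 0.1          # state and control cost weights (illustrative)
A_TRUE, B_TRUE = 1.2, 0.5  # hypothetical "real" system, unknown to the learner


def lqr_gain(a, b, n_iter=500):
    """Gain of the scalar discrete-time LQR for the model x' = a*x + b*u."""
    p = Q
    for _ in range(n_iter):
        # Scalar Riccati recursion, iterated to (approximate) convergence.
        p = Q + a * a * p - (a * b * p) ** 2 / (R_U + b * b * p)
    return a * b * p / (R_U + b * b * p)


def closed_loop_cost(a, b, horizon=30, x0=1.0):
    """Cost on the true system of the controller designed for model (a, b)."""
    k = lqr_gain(a, b)
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += Q * x * x + R_U * u * u
        x = A_TRUE * x + B_TRUE * u
    return cost


rng = np.random.default_rng(0)
initial_model = (0.5, 1.5)  # a deliberately poor initial model
initial_cost = closed_loop_cost(*initial_model)
best_model, best_cost = initial_model, initial_cost
for _ in range(200):
    # Random search stands in for Bayesian optimization in this sketch.
    a, b = rng.uniform(0.5, 1.5), rng.uniform(0.1, 1.5)
    cost = closed_loop_cost(a, b)
    if cost < best_cost:
        best_model, best_cost = (a, b), cost

# Reference: cost of the controller designed from the true parameters.
oracle_cost = closed_loop_cost(A_TRUE, B_TRUE)
```

In this toy setting many different `(a, b)` pairs yield almost the same gain, so the selected model can differ from the true parameters while its controller performs close to `oracle_cost`.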

Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173

Real-world Quadcopter

Dubins Car

Conclusion

There exist models that are wrong, but nearly optimal when used for control

  • From a Sys.ID perspective, they are completely wrong
  • These models might be out-of-class (e.g., linear model for non-linear dynamics)
  • Hypothesis: these models capture some structure of the optimal solution, ignoring the rest of the space
  • Evidence: these models do not seem to generalize to new tasks

Understand and Overcome the Limitations of MBRL

  • Can we avoid the multiplicative error of recursive one-step predictions?

Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
IEEE Conference on Decision and Control (CDC), 2021

(YES)

  • Can we dynamically tune MBRL hyperparameters?

Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F. & Calandra, R.
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

(YES)

  • Are accurate models a necessary condition for good control performance?
  • Are accurate models a sufficient condition for good control performance?

Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173

(NO)

(NO)

Lambert, N.; Amos, B.; Yadan, O. & Calandra, R.
Objective Mismatch in Model-based Reinforcement Learning
Learning for Dynamics and Control (L4DC), 2020, 761-770
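
The multiplicative error of recursive one-step predictions, raised in the first question above, can be seen in a one-line system. The sketch below assumes a hypothetical scalar system s' = a·s and a learned model whose coefficient is off by 1%; it is only a worked illustration, not the method of the cited paper.

```python
import numpy as np


def rollout_error(a_true, a_model, horizon, s0=1.0):
    """Absolute prediction error per step when a one-step model is applied recursively."""
    s_true, s_pred, errors = s0, s0, []
    for _ in range(horizon):
        s_true = a_true * s_true
        s_pred = a_model * s_pred  # the model is fed its own previous prediction
        errors.append(abs(s_pred - s_true))
    return np.array(errors)


errors = rollout_error(a_true=1.0, a_model=1.01, horizon=100)
```

The error after one step is 0.01, while after 100 steps it is 1.01^100 − 1, which is larger than 1.7: the per-step error compounds multiplicatively along the rollout.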

At Last.
Revisiting Linear Embeddings

High-dimensional BO with Linear Embeddings

Wang, Z.; Hutter, F.; Zoghi, M.; Matheson, D. & de Freitas, N.
Bayesian Optimization in a Billion Dimensions via Random Embeddings
Journal of Artificial Intelligence Research (JAIR), 2016, 55, 361-387

Very neat idea!

But several wrong assumptions...

A Few Fixes

Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020

  • Linear projections do not preserve product kernels.
    • Mahalanobis Kernel
  • Most points in the embedding map to the facets of the projection
    • Constrain the embedding optimization to points within the bounds
  • Linear embeddings can have a low probability of containing an optimum.
    • Unit hypersphere sampling for the projection
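
The three issues and fixes listed above can be checked numerically with a short sketch. The dimensions, the Gaussian random-embedding baseline with its [-√d, √d]^d box, the pseudo-inverse back-projection, and the ARD lengthscales are assumptions chosen for illustration; the cited paper should be consulted for the exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 4  # ambient and embedding dimensions (illustrative values)

# Random-embedding baseline: Gaussian matrix A, box [-sqrt(d), sqrt(d)]^d,
# and clipping of the projected point to the ambient box [-1, 1]^D.
A = rng.standard_normal((D, d))
Y = rng.uniform(-np.sqrt(d), np.sqrt(d), size=(1000, d))
# Fraction of embedding points with at least one coordinate outside the box,
# i.e. points that clipping would move onto a facet.
frac_clipped = np.mean(np.any(np.abs(Y @ A.T) > 1.0, axis=1))

# Hypersphere sampling: each column of the projection B (d x D) is a unit vector.
B = rng.standard_normal((d, D))
B /= np.linalg.norm(B, axis=0, keepdims=True)
B_pinv = np.linalg.pinv(B)  # D x d, maps an embedding point y back to x = B_pinv @ y


def in_bounds(y):
    """Constraint for the embedding optimization: the back-projected point stays in the box."""
    return bool(np.all(np.abs(B_pinv @ y) <= 1.0))


# A product (ARD) kernel in D dimensions depends on
# (x - x')^T diag(1 / l^2) (x - x'). With x = B_pinv @ y this becomes
# (y - y')^T Gamma (y - y'), a Mahalanobis distance in the embedding.
inv_sq_lengthscales = rng.uniform(0.5, 2.0, size=D)
Gamma = B_pinv.T @ (inv_sq_lengthscales[:, None] * B_pinv)  # d x d
off_diagonal = Gamma - np.diag(np.diag(Gamma))
```

With these settings almost every sampled embedding point needs clipping, and `Gamma` has non-zero off-diagonal entries, which is why a product kernel in the embedding is not the right model.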

Results

Collaborators

and more...

Summary

  • Gave a glimpse into some challenges of Robotics
  • Showed several successful applications of BO in Robotics:
    • Learning to walk with the bipedal robot "Fox"
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
  • BO is a powerful tool in the toolbox of any robot learning researcher
    • Learned models provide useful insight!

Thank you for your time

References

  • Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
    Bayesian Optimization for Learning Gaits under Uncertainty
    Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23 
  • Yi, Z.; Calandra, R.; Veiga, F. F.; van Hoof, H.; Hermans, T.; Zhang, Y. & Peters, J.
    Active Tactile Object Exploration with Gaussian Processes
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, 4925-4930
  • Calandra, R.; Peters, J. & Deisenroth, M. P.
    Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
    NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
  • Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
    Goal-Driven Dynamics Learning via Bayesian Optimization
    IEEE Conference on Decision and Control (CDC), 2017, 5168-5173
  • Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
    Data-efficient Learning of Morphology and Controller for a Microrobot
    IEEE International Conference on Robotics and Automation (ICRA), 2019, 2488-2494
  • Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
    Learning Flexible and Reusable Locomotion Primitives for a Microrobot
    IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911
  • Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
    Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
    Advances in Neural Information Processing Systems (NeurIPS), 2020

Bayesian Optimization for Robotics

Designing and tuning controllers for real-world robots is a daunting task that typically requires significant expertise and lengthy experimentation. Bayesian optimization has proven to be a successful approach to automating these tasks with little human expertise required. In this talk, I will discuss the main challenges of robot learning and how BO helps to overcome some of them. Using real-world applications where BO proved effective as a showcase, I will also discuss how the challenges encountered in robotics applications can guide the development of new BO algorithms.
