Bayesian Optimization for Robotics

Roberto Calandra

Secondmind - 14 Oct 2021

Facebook AI Research

Goals of the talk

Explain some of the challenges in Robotics
Present multiple successful applications of BO in Robotics:
- Learning to walk with a bipedal robot
- Multi-objective BO for navigation with micro-robots
- Hierarchical BO for joint morphology/controller optimization
- High-dimensional BO with linear embeddings (done right)
Argue why BO is a powerful tool for Robotics

State-of-the-art in Robotics

From YouTube: https://www.youtube.com/watch?v=g0TaYhjpOfo

Why Learning?

Robotics still relies heavily on human expertise!

On the one hand, it is infeasible to hand-design general-purpose controllers:

Human design is time-consuming and relies on prior expertise
Real-world experiments are expensive and stochastic

On the other hand, there is mistrust of the automatic design of controllers:

Automatically designed controllers are not verifiable
They often find qualitatively different solutions
(Maybe a bit of human presumption)

Black-box Optimization

x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

where x^* are the optimized parameters, f is the objective function, and x \in \mathbb{R}^d are the parameters to optimize.
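As a concrete illustration of the loop that solves this black-box problem, here is a minimal sketch of BO on a 1-D toy function: a Gaussian process (GP) surrogate with a fixed squared-exponential kernel and the expected improvement (EI) acquisition, maximized over a grid. The kernel hyperparameters, the grid, and all function names are illustrative assumptions, not the setup used in the talk; a practical implementation would also fit the hyperparameters and optimize the acquisition more carefully.

```python
import math
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel between row sets a (n, d) and b (m, d).
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Exact GP regression with zero prior mean on standardized targets.
    mean_y, std_y = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mean_y) / std_y
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    mu = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)  # prior variance is 1.0
    return mu * std_y + mean_y, np.sqrt(var) * std_y

def expected_improvement(mu, sigma, best):
    # EI for minimization: E[max(best - f(x), 0)] under the GP posterior.
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(f, n_init=3, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)[:, None]      # candidate set on [0, 1]
    x = rng.uniform(0.0, 1.0, size=(n_init, 1))      # random initial design
    y = np.array([f(xi[0]) for xi in x])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(x, y, grid)
        ei = expected_improvement(mu, sigma, y.min())
        x_next = grid[np.argmax(ei)]                 # most promising candidate
        x = np.vstack([x, x_next])
        y = np.append(y, f(x_next[0]))               # one (expensive) evaluation
    best = np.argmin(y)
    return x[best, 0], y[best]
```

For example, `bayes_opt(lambda x: (x - 0.3) ** 2)` returns a point close to 0.3 after 18 evaluations of the objective.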

Bayesian Optimization for Policy Search

\theta^* = \arg\max_\theta R[\pi(\theta)]

a_t = \pi(s_t, \theta)

where \pi is the policy (i.e., a parametrized controller), a_t is the action executed, s_t is the current state, and \theta are the parameters of the policy.

Learning a controller is equivalent to optimizing the parameters of the controller

The objective R[\pi(\theta)] is:
- 0-order (only its value can be observed, no gradients)
- Stochastic
- Expensive to evaluate
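To make these properties concrete, the sketch below wraps policy rollouts as a black-box objective R(theta). The toy double-integrator dynamics, the linear policy with theta = (k_pos, k_vel), the reward, and the noise level are all hypothetical stand-ins for a real robot; the point is only that R can be queried solely through noisy, costly rollouts, which is exactly the kind of function BO is meant to optimize.

```python
import numpy as np

def policy(state, theta):
    # Hypothetical linear feedback policy a_t = pi(s_t, theta), theta = (k_pos, k_vel).
    return -theta[0] * state[0] - theta[1] * state[1]

def episodic_return(theta, n_steps=200, dt=0.05, noise_std=0.01, seed=None):
    # One noisy rollout on a toy double integrator (stand-in for a real robot).
    rng = np.random.default_rng(seed)
    state = np.array([1.0, 0.0])          # start 1 unit away from the target, at rest
    total_reward = 0.0
    for _ in range(n_steps):
        action = np.clip(policy(state, theta), -5.0, 5.0)   # actuator limits
        pos, vel = state
        vel = vel + dt * action + noise_std * rng.standard_normal()
        pos = pos + dt * vel
        state = np.array([pos, vel])
        total_reward -= pos**2 + 0.01 * action**2           # distance and effort penalty
    return total_reward

def R(theta, n_rollouts=5, seed=0):
    # Monte Carlo estimate of R[pi(theta)]: only function values, no gradients.
    return float(np.mean([episodic_return(theta, seed=seed + i) for i in range(n_rollouts)]))
```

Each call to `R` stands for several real-world trials, which is why a data-efficient optimizer matters here.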

The Beginning

Learning to Walk with a Bipedal Robot

Bio-inspired Bipedal Robot "Fox":

Quasi-passive dynamic walker
4 degrees of freedom
Springs in the legs
Walks in a circle
Finite-state-machine controller (from biomechanics)
8 open parameters
(Motor lifetime: ~200 trials)

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Learning to Walk in 80 Trials

Learning Curve


Comparison


Learned model

The learned model is not symmetrical (about 5° difference). Why?

Because the robot walks in a circle!


Beyond Single Objective

Locomotion as Multi-objective Optimization

Trade-off between Walking Speed and Energy Consumption!

Multi-objective Optimization

Most engineering problems are truly multi-objective

x^* = \arg\min_{x \in \mathbb{R}^d} \{f_1(x), \ldots, f_n(x)\}

Not all objective functions can be optimized at once
Solving this optimization means finding the Pareto front (PF): the set of solutions for which no objective can be improved without worsening another
The identified PF should be:
- Complete
- Dense
- Accurate
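Given a set of evaluated points, the observed Pareto front is the subset of non-dominated points. A minimal sketch, assuming all objectives are minimized (e.g., negated walking speed and energy consumption); `pareto_front_mask` is a hypothetical helper name:

```python
import numpy as np

def pareto_front_mask(costs):
    # costs: (n_points, n_objectives) array; every objective is minimized.
    costs = np.asarray(costs, dtype=float)
    is_efficient = np.ones(len(costs), dtype=bool)
    for i, c in enumerate(costs):
        # Point i is dominated if another point is <= in every objective
        # and strictly < in at least one.
        dominated_by = np.all(costs <= c, axis=1) & np.any(costs < c, axis=1)
        is_efficient[i] = not dominated_by.any()
    return is_efficient
```

For example, `pareto_front_mask([[1, 5], [2, 3], [3, 4], [4, 1]])` marks only `[3, 4]` as dominated (by `[2, 3]`). Note this only extracts the front from observed data; predicting the front from the GP models, as in the slides that follow, is a separate step.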

Predicting Pareto Front

[Figure: predicted Pareto front after 20, 50, and 200 evaluations]

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Predicting Pareto Front

[Figure: predicted Pareto fronts on the MOP2 and ZDT3 benchmarks]

Predicting Pareto Front (Noisy)

[Figure: predicted Pareto fronts on the MOP2 and ZDT3 benchmarks with noisy observations]

Sensitivity Analysis

Sensitivity Analysis (MOP2)


Sensitivity Analysis (RMTP3)

Learning to Walk with Micro-robots

Micro-robots

Simulated hexapod:

12 degrees of freedom (2 per leg)
No good physics models exist at that scale
Central pattern generators (CPGs) as controllers

Let's apply all the tools we have so far!

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911
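A CPG produces rhythmic joint trajectories from oscillators with fixed phase relationships, so a whole gait is described by a handful of parameters that BO can tune. The sketch below is a hypothetical open-loop parametrization for a 12-DoF hexapod (2 joints per leg), with leg phase offsets that encode a dual tripod gait; the leg ordering, the sinusoidal form, and the quarter-period lag between the two joints of a leg are assumptions, not the controller from the paper.

```python
import numpy as np

# Hypothetical leg ordering: (L1, L2, L3, R1, R2, R3).
# Dual tripod gait: {L1, L3, R2} move in phase, {L2, R1, R3} are half a cycle behind.
TRIPOD_PHASES = np.array([0.0, np.pi, 0.0, np.pi, 0.0, np.pi])

def cpg_joint_targets(t, frequency, amplitudes, phase_offsets, joint_lag=np.pi / 2):
    """Open-loop CPG for a 12-DoF hexapod (2 joints per leg).

    t: time in s; frequency: shared oscillator frequency in Hz;
    amplitudes: (6, 2) joint amplitudes in rad; phase_offsets: (6,) leg phases in rad.
    Returns a (6, 2) array of joint-angle targets.
    """
    leg_phase = 2 * np.pi * frequency * t + np.asarray(phase_offsets)   # (6,)
    joint_phase = np.stack([leg_phase, leg_phase + joint_lag], axis=1)  # (6, 2)
    return np.asarray(amplitudes) * np.sin(joint_phase)
```

With this parametrization, BO would search over `frequency` and `amplitudes` for a fixed gait, or also over `phase_offsets` to discover new gaits.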

Hard-coded CPG Gaits

Single-objective

Dual Tripod Gait

Multi-objective

Comparison Gaits

Discovering New Gaits

Contextual Bayesian Optimization

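x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

where x^* are the optimized parameters, f is the objective function, and x \in \mathbb{R}^d are the parameters to optimize. Contextual BO adds a context c on which the optimum depends:

x^*(c) = \arg\min_{x \in \mathbb{R}^d} f(x, c)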

Contextual BO
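A minimal sketch of one contextual BO step: the GP is fit on concatenated inputs [x, c], so data from all contexts is shared, and the acquisition is minimized over x with the context c held fixed. For simplicity a lower confidence bound is swapped in as the acquisition; the kernel, `beta`, the grid, and the function names are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, lengthscale=0.3):
    # Squared-exponential kernel between row sets a (n, k) and b (m, k).
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_predict(z_train, y_train, z_query, noise=1e-4):
    # Zero-mean GP on standardized targets; each input row is z = [x, c].
    m, s = y_train.mean(), y_train.std() + 1e-12
    k = rbf(z_train, z_train) + noise * np.eye(len(z_train))
    k_q = rbf(z_query, z_train)
    mu = k_q @ np.linalg.solve(k, (y_train - m) / s)
    var = 1.0 - np.sum(k_q * np.linalg.solve(k, k_q.T).T, axis=1)
    return mu * s + m, np.sqrt(np.clip(var, 1e-12, None)) * s

def contextual_bo_step(z_train, y_train, context, beta=2.0):
    # Propose the next x for a given context c: minimize the lower confidence
    # bound over x only, keeping c fixed. Data from all contexts informs the GP.
    x_grid = np.linspace(0.0, 1.0, 201)[:, None]
    z_query = np.hstack([x_grid, np.full_like(x_grid, context)])
    mu, sigma = gp_predict(z_train, y_train, z_query)
    return float(x_grid[np.argmin(mu - beta * sigma), 0])
```

A training loop would cycle through the contexts of interest (e.g., the goal targets), call `contextual_bo_step`, run the trial f(x, c), and append the row [x, c] with its outcome to the data; the same GP could then be queried for other contexts.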

Learning Locomotion Primitives

With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
The trick was to treat it as a contextual BO problem at training time, and then convert it to a multi-objective optimization (MOO) problem

Combining Primitives for Navigation

(More) Expensive Optimization

Joint Morphology/Controller Optimization

In Robotics, there is a tight relationship between morphologies and controllers
Designing morphologies is a complex and time-consuming process
Can we automate it?
Same simulated hexapod as before:
- Each manufacturing round takes about 1 month in the real world...
- ...But we can fabricate multiple different morphology configurations at once (up to 5)

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019

Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO)

Two levels of optimization
(instead of a single bigger optimization)

Allows weighting the different costs of the two types of parameters
Each of the two levels uses information from the other level:
- The morphology level considers the best policy achieved for each morphology design
- The controller level uses the morphology as context
Batch evaluation to reduce fabrication time
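The two-level structure can be sketched as follows. This mirrors only the structure described above (batched morphology proposals, a controller search that uses the morphology as context, and morphologies scored by their best controller); random sampling stands in for both BO levels, and `evaluate` is a hypothetical toy reward, so this is not the HPC-BBO algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(morphology, controller):
    # Hypothetical stand-in for a costly trial: reward peaks when the controller
    # matches the morphology, mimicking tight morphology/controller coupling.
    return -np.sum((controller - morphology) ** 2) - 0.1 * np.sum(morphology**2)

def optimize_controller(morphology, n_trials=20):
    # Lower level: controller search conditioned on the morphology (the context).
    # Random search stands in for contextual BO in this sketch.
    candidates = rng.uniform(-1, 1, size=(n_trials, morphology.size))
    rewards = [evaluate(morphology, c) for c in candidates]
    return candidates[int(np.argmax(rewards))], max(rewards)

def hierarchical_search(n_rounds=3, batch_size=5, dim=2):
    # Upper level: each round "fabricates" a batch of morphologies at once,
    # and each design is scored by the best controller found for it.
    history = []
    for _ in range(n_rounds):
        batch = rng.uniform(-1, 1, size=(batch_size, dim))  # stand-in for batch BO
        for morphology in batch:
            controller, best_reward = optimize_controller(morphology)
            history.append((morphology, controller, best_reward))
    return max(history, key=lambda item: item[2])
```

The design choice this illustrates is cost separation: the expensive level (fabrication) is queried in few, batched rounds, while the cheaper level runs many trials per design.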

Results

Top 4 Morphologies

Exchanging the morphology severely degrades the controller performance.
This evidence supports the hypothesis that morphology and controller need to be tightly coupled

Brief Interlude

(BO for Model-Based Reinforcement Learning)

Model-based Reinforcement Learning

Is There Something Strange about MBRL?

How to Use the Reward?

Goal-Driven Dynamics Learning

Instead of training the forward dynamics model to minimize the negative log-likelihood (NLL) of the next state, we optimize its parameters w.r.t. the reward
(The reward is all we care about)
Computing the gradients analytically is intractable
We used a zero-order optimizer: Bayesian optimization
(and an LQG framework)

Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173
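The key idea can be sketched on a hypothetical scalar system: a linear model x' = a x + b u is used only to compute an LQR controller, and the model parameters (a, b) are scored by the cost that controller achieves on the true, nonlinear system. The toy dynamics, the cost weights, and the grid search standing in for BO are assumptions for illustration; in the talk, this zero-order search is done with BO.

```python
import numpy as np

def true_step(x, u):
    # Hypothetical nonlinear "real" system, unknown to the learner.
    return 1.1 * x + 0.5 * u + 0.3 * np.sin(x)

def lqr_gain(a, b, q=1.0, r=0.1, n_iter=500):
    # Scalar discrete-time LQR gain for the *model* x' = a x + b u (Riccati iteration).
    p = q
    for _ in range(n_iter):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def closed_loop_cost(model_params, x0=2.0, horizon=50, q=1.0, r=0.1):
    # Goal-driven score of a model: the cost its LQR controller achieves on the
    # true system, rather than how well the model predicts the next state.
    a, b = model_params
    k = lqr_gain(a, b, q, r)
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x * x + r * u * u
        x = true_step(x, u)
        if abs(x) > 1e3:  # diverged: return a large penalty instead of overflowing
            return 1e9
    return float(cost)

# Zero-order search over model parameters; a grid stands in for BO in this sketch.
a_grid = np.linspace(0.5, 2.0, 16)
b_grid = np.linspace(0.1, 1.5, 15)
best_model = min(((a, b) for a in a_grid for b in b_grid), key=closed_loop_cost)
```

The selected (a, b) need not match the true local dynamics (slope 1.4 near the origin here); it only needs to induce a good controller, which is the point made in the conclusion below.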

Real-world Quadcopter


Dubins Car


Conclusion

There exist models that are wrong, but nearly optimal when used for control

From a system identification (Sys.ID) perspective, they are completely wrong
These models might be out-of-class (e.g., a linear model for non-linear dynamics)
Hypothesis: these models capture some structure of the optimal solution, ignoring the rest of the space
Evidence: these models do not seem to generalize to new tasks

Understand and Overcome the Limitations of MBRL

Can we avoid the multiplicative error of recursive one-step predictions?

Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
IEEE Conference on Decision and Control (CDC), 2021

(YES)

Can we dynamically tune MBRL hyperparameters?

Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F. & Calandra, R.
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

(YES)

Are accurate models a necessary condition for good control performance?

Are accurate models a sufficient condition for good control performance?

Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173

(NO)

Lambert, N.; Amos, B.; Yadan, O. & Calandra, R.
Objective Mismatch in Model-based Reinforcement Learning
Learning for Dynamics and Control (L4DC), 2020, 761-770

At Last.
Revisiting Linear Embeddings

High-dimensional BO with Linear Embeddings

Wang, Z.; Hutter, F.; Zoghi, M.; Matheson, D. & de Freitas, N.
Bayesian Optimization in a Billion Dimensions via Random Embeddings
Journal of Artificial Intelligence Research (JAIR), 2016, 55, 361-387

Very neat idea!

But it relies on several wrong assumptions...
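The random-embedding idea can be sketched as follows: BO searches over a low-dimensional y, and each candidate is mapped to the ambient space as x = A y with a Gaussian random matrix A, then clipped to the box [-1, 1]^D. The test function (which depends on only 2 of 100 coordinates), the dimensions, and the [-sqrt(d), sqrt(d)]^d box for y are assumptions made for illustration.

```python
import numpy as np

D, d = 100, 2                       # ambient and embedding dimensions (illustrative)
rng = np.random.default_rng(0)
A = rng.standard_normal((D, d))     # Gaussian random embedding matrix

def f_high_dim(x):
    # Hypothetical objective on [-1, 1]^D that only depends on coordinates 3 and 42.
    return (x[3] - 0.5) ** 2 + (x[42] + 0.2) ** 2

def to_ambient(y):
    # Map a low-dimensional point y to x = A y, then clip x back into [-1, 1]^D.
    return np.clip(A @ y, -1.0, 1.0)

def objective_in_embedding(y):
    # This is the function BO would actually optimize, over a small box for y.
    return f_high_dim(to_ambient(y))

# How often does a random y end up clipped onto a facet of the box?
ys = rng.uniform(-np.sqrt(d), np.sqrt(d), size=(1000, d))
clipped = np.mean([np.any(np.abs(A @ y) > 1.0) for y in ys])
print(f"fraction of embedded points clipped onto a facet: {clipped:.2f}")
```

The printed fraction is high in this setup, which illustrates why clipping is a concern: most of the embedding is distorted by the projection onto the box boundary.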

A Few fixes

Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020

Linear projections do not preserve product kernels.
- Fix: use a Mahalanobis kernel
Most points in the embedding map onto the facets of the box bounds (because of clipping).
- Fix: constrain the embedding optimization to points within the bounds
Linear embeddings can have a low probability of containing an optimum.
- Fix: sample the projection from the unit hypersphere
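The first fix can be checked numerically: if the ambient kernel is a product (ARD) squared-exponential kernel and points are recovered from the embedding as x = P y with P = B^+ (the pseudo-inverse of the projection B), then the induced kernel on y is a Mahalanobis kernel with the full matrix Gamma = P^T Lambda P, which is not a product kernel. The sketch also includes the in-bounds check used to constrain the embedding optimization and a hypersphere-normalized projection. Normalizing the columns of B is an assumption about the exact construction; the dimensions and lengthscales are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 10, 2                                  # ambient and embedding dimensions (illustrative)
lengthscales = rng.uniform(0.5, 2.0, size=D)
Lam = np.diag(1.0 / lengthscales**2)          # ARD precision matrix in the ambient space

def ard_kernel(x1, x2):
    # Product (ARD) squared-exponential kernel: one 1-D factor per ambient coordinate.
    diff = x1 - x2
    return np.exp(-0.5 * diff @ Lam @ diff)

def mahalanobis_kernel(y1, y2, Gamma):
    # Squared-exponential kernel with a full (non-diagonal) metric Gamma on the embedding.
    diff = y1 - y2
    return np.exp(-0.5 * diff @ Gamma @ diff)

# Projection B (d x D) with each column normalized onto the unit hypersphere (assumption).
B = rng.standard_normal((d, D))
B /= np.linalg.norm(B, axis=0, keepdims=True)
P = np.linalg.pinv(B)                         # (D x d): maps an embedding point y to x = B^+ y
Gamma = P.T @ Lam @ P                         # induced metric on y: generally not diagonal

def in_bounds(y):
    # Constraint for optimizing in the embedding: x = B^+ y must stay inside [-1, 1]^D.
    return bool(np.all(np.abs(P @ y) <= 1.0))
```

Because `Gamma` has non-zero off-diagonal entries, a GP on y with an ARD (product) kernel cannot represent the ambient ARD kernel exactly, while a Mahalanobis kernel can.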

Results


Collaborators

and more...

Summary

Gave a glimpse into some challenges of Robotics
Showed several successful applications of BO in Robotics:
- Learning to walk with the bipedal robot "Fox"
- Multi-objective BO for navigation with micro-robots
- Hierarchical BO for joint morphology/controller optimization
BO is a powerful tool in the toolbox of any robot learning researcher
- Learned models provide useful insight!

Thank you for your time

References

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23
Yi, Z.; Calandra, R.; Veiga, F. F.; van Hoof, H.; Hermans, T.; Zhang, Y. & Peters, J.
Active Tactile Object Exploration with Gaussian Processes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, 4925-4930
Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014
Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173
Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019, 2488-2494
Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911
Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.
Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization
Advances in Neural Information Processing Systems (NeurIPS), 2020

Abstract

Designing and tuning controllers for real-world robots is a daunting task that typically requires significant expertise and lengthy experimentation. Bayesian optimization has proven to be a successful approach to automating these tasks with little human expertise required. In this talk, I will discuss the main challenges of robot learning and how BO helps to overcome some of them. Using as showcases real-world applications where BO proved to be effective, I will also discuss how the challenges encountered in robotics applications can guide the development of new BO algorithms.

Roberto Calandra: Full Professor at TU Dresden. Head of the LASR Lab. Working in AI, Robotics and Touch Sensing.
