Bayesian Optimization for Robotics

Roberto Calandra

Facebook AI Research

JSM - 01 July 2019

Goals of the talk

Explain some of the challenges in Robotics
Present multiple successful applications of BO in Robotics:
- Learning to walk with a bipedal robot
- Multi-objective BO for navigation with micro-robots
- Hierarchical BO for joint morphology/controller optimization
Argue why BO is a powerful tool for Robotics

State-of-the-art in Robotics

From YouTube: https://www.youtube.com/watch?v=g0TaYhjpOfo

Why Learning?

Robotics still relies heavily on human expertise!

On the one hand, it is infeasible to hand-design general-purpose controllers

Human design is time-consuming and relies on prior expertise
Real-world experiments are expensive and stochastic

On the other hand, there is mistrust of the automatic design of controllers

Not verifiable
Often finds qualitatively different solutions
(Maybe a bit of human presumption)

Bayesian Optimization for Policy Search

Learning a controller is equivalent to optimizing the parameters of the controller:

\theta^* = \arg\max_\theta\, R[\pi(\theta)]

a_t = \pi(s_t, \theta)

where:
- a_t: action executed
- \pi: policy (i.e., parametrized controller)
- s_t: current state
- \theta: parameters of the policy

The optimization problem is:
- 0-order (no gradient information)
- Stochastic
- Expensive to evaluate
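To make the policy-search loop concrete, here is a minimal sketch of Bayesian optimization over a single policy parameter. It is an illustration under stated assumptions, not the setup used in the talk: the Gaussian-process surrogate is hand-written in NumPy, the acquisition function is an upper confidence bound (UCB), and `rollout_reward` is a hypothetical stand-in for one noisy, expensive robot trial.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, noise=0.05):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    mean = k_tq.T @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_tq * np.linalg.solve(k_tt, k_tq), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def rollout_reward(theta, rng):
    """Hypothetical robot trial: noisy reward R[pi(theta)] with a peak at theta = 0.7."""
    return float(np.exp(-20.0 * (theta[0] - 0.7) ** 2) + 0.05 * rng.standard_normal())

def bayes_opt(n_trials=25, n_init=4, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)[:, None]   # candidate parameters theta
    thetas = rng.uniform(0.0, 1.0, size=(n_init, 1))   # random initial trials
    rewards = np.array([rollout_reward(t, rng) for t in thetas])
    for _ in range(n_trials - n_init):
        # Fit the surrogate to all trials so far (rewards centred on their mean).
        mean, std = gp_posterior(thetas, rewards - rewards.mean(), candidates)
        # UCB trades off exploiting high predicted reward and exploring uncertainty.
        theta_next = candidates[np.argmax(mean + beta * std)]
        thetas = np.vstack([thetas, theta_next])
        rewards = np.append(rewards, rollout_reward(theta_next, rng))
    # Recommend the parameter with the highest predicted (not observed) reward.
    mean, _ = gp_posterior(thetas, rewards - rewards.mean(), candidates)
    return candidates[np.argmax(mean)], thetas, rewards

best_theta, thetas, rewards = bayes_opt()
```

Recommending the maximizer of the posterior mean, rather than the best single observation, is one common way to cope with noisy evaluations.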

Learning to Walk with a Bipedal Robot

Bio-inspired Bipedal Robot "Fox":

Quasi-passive dynamic walker
4 Degrees of freedom
Springs in legs
Walking in a circle
Finite-state-machine controller (from biomechanics)
8 open parameters
(Motor life: ~200 trials)

[Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P. Bayesian Optimization for Learning Gaits under Uncertainty. Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23]

Learning to Walk in 80 Trials

Learned model

Not symmetrical (about 5° difference). Why?

Because it is walking in a circle!

Locomotion as Multi-objective Optimization

Micro-robots

Simulated hexapod:

12 Degrees of Freedom (2 per leg)
No good physics models at that scale
We use Central Pattern Generators (CPG) as controllers

Question: can we move beyond standard single-objective BO?

[Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K. Learning Flexible and Reusable Locomotion Primitives for a Microrobot. IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911]
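Moving beyond a single objective means comparing trade-offs rather than ranking gaits by one number. The sketch below shows the core building block, a non-dominated (Pareto) filter. The two objectives (walking speed and negated energy use) and the gait values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pareto_mask(objectives):
    """Boolean mask of non-dominated rows, assuming every objective is maximized."""
    objectives = np.asarray(objectives, dtype=float)
    mask = np.ones(len(objectives), dtype=bool)
    for i, point in enumerate(objectives):
        # A row dominates `point` if it is at least as good everywhere
        # and strictly better in at least one objective.
        dominates = (np.all(objectives >= point, axis=1)
                     & np.any(objectives > point, axis=1))
        mask[i] = not dominates.any()
    return mask

# Hypothetical gait evaluations: columns are (walking speed, negated energy use).
gaits = np.array([
    [0.10, -1.0],
    [0.20, -2.0],
    [0.15, -2.5],   # slower and costlier than the second gait: dominated
    [0.30, -4.0],
    [0.30, -5.0],   # same speed as the fourth gait but costlier: dominated
])
front = gaits[pareto_mask(gaits)]   # the three non-dominated gaits
```

A multi-objective BO method would typically use such a front inside its acquisition function; that part is omitted here.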

Hard-coded CPG Gaits

Single-objective

Dual Tripod Gait

Multi-objective

Comparison Gaits

Discovering New Gaits

Contextual BO

Learning Locomotion Primitives

With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
The trick was to treat it as a contextual BO problem at training time, and then convert it to multi-objective optimization (MOO)
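One way to read the contextual idea: a single surrogate is fitted over (context, parameter) pairs, so trials for one goal target also inform the others. The following is a minimal sketch under assumptions, with a hand-written NumPy Gaussian process, a UCB acquisition, and a hypothetical `trial_reward` in which the best parameter simply equals the target. It does not cover the conversion to MOO.

```python
import numpy as np

THETA_GRID = np.linspace(0.0, 1.0, 101)   # candidate controller parameters

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, noise=0.05):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    mean = k_tq.T @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_tq * np.linalg.solve(k_tt, k_tq), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def trial_reward(context, theta, rng):
    """Hypothetical trial: reward peaks when the parameter matches the target."""
    return 1.0 - 4.0 * (theta - context) ** 2 + 0.02 * rng.standard_normal()

def query_points(context):
    """Pair one fixed context with every candidate parameter value."""
    return np.column_stack([np.full_like(THETA_GRID, context), THETA_GRID])

def contextual_bo(contexts, trials_per_context=10, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    inputs, rewards = [], []
    for _ in range(trials_per_context):
        for context in contexts:
            if len(rewards) < len(contexts):
                theta = rng.uniform(0.0, 1.0)   # first pass: random parameters
            else:
                # One shared model over (context, theta); optimize theta only.
                x, y = np.array(inputs), np.array(rewards)
                mean, std = gp_posterior(x, y - y.mean(), query_points(context))
                theta = THETA_GRID[np.argmax(mean + beta * std)]
            inputs.append([context, theta])
            rewards.append(trial_reward(context, theta, rng))
    return np.array(inputs), np.array(rewards)

def best_theta(inputs, rewards, context):
    """Parameter with the highest predicted reward for the given context."""
    mean, _ = gp_posterior(inputs, rewards - rewards.mean(), query_points(context))
    return THETA_GRID[np.argmax(mean)]

contexts = [0.2, 0.5, 0.8]               # hypothetical goal targets
inputs, rewards = contextual_bo(contexts)
```

Because the context is just another input dimension of the surrogate, `best_theta` can also be queried for a target that was never trained on, with accuracy depending on how well the kernel generalizes across contexts.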

Combining Primitives for Navigation

Joint Morphology/Controller Optimization

In Robotics, there is a tight relationship between morphologies and controllers
Design of morphologies is a complex and time-consuming process
Can we automate it?
Same simulated hexapod as before:
- Each manufacturing round takes about 1 month in the real world...
- ...But we can fabricate multiple different morphology configurations at once (up to 5)

[Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R. Data-efficient Learning of Morphology and Controller for a Microrobot. IEEE International Conference on Robotics and Automation (ICRA), 2019]

Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO)

Two levels of optimization
(instead of a single bigger optimization)

Allows weighting the different costs of the two types of parameters
Each of the two levels uses information from the other level:
- The morphology level considers the best policy achieved for each morphology design
- The controller level uses the morphology as context
Batch evaluation to reduce fabrication time
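The two-level control flow can be sketched as follows. This is not HPC-BBO itself: simple random sampling and local perturbation stand in for the BO proposals at both levels, and `simulated_reward` is a hypothetical coupled objective. Only the structure is taken from the slide: a batch of morphologies per round, an inner controller search with the morphology as context, and an outer level that sees the best policy per morphology.

```python
import numpy as np

BATCH_SIZE = 5   # up to 5 morphologies fabricated per round (from the talk)

def simulated_reward(morphology, controller, rng):
    """Hypothetical coupled objective: the best controller depends on the morphology."""
    mismatch = (controller - 0.5 * morphology) ** 2
    noise = 0.01 * rng.standard_normal()
    return float(1.0 - (morphology - 0.6) ** 2 - mismatch + noise)

def optimize_controller(morphology, rng, n_trials=20):
    """Inner level: search controller parameters for one fixed morphology.
    Random search is a stand-in for the contextual BO step."""
    controllers = rng.uniform(0.0, 1.0, size=n_trials)
    rewards = [simulated_reward(morphology, c, rng) for c in controllers]
    best = int(np.argmax(rewards))
    return float(controllers[best]), rewards[best]

def hierarchical_search(n_rounds=3, seed=0):
    rng = np.random.default_rng(seed)
    history = []   # entries: (morphology, best controller, best reward)
    for round_index in range(n_rounds):
        if round_index == 0:
            batch = rng.uniform(0.0, 1.0, size=BATCH_SIZE)
        else:
            # Outer level: propose a batch near the best morphology so far,
            # judged by the best policy reward achieved for each design.
            # (Stand-in for a batch BO acquisition.)
            incumbent = max(history, key=lambda entry: entry[2])[0]
            batch = np.clip(incumbent + 0.1 * rng.standard_normal(BATCH_SIZE), 0.0, 1.0)
        for morphology in batch:   # one "fabrication round"
            controller, reward = optimize_controller(float(morphology), rng)
            history.append((float(morphology), controller, reward))
    return max(history, key=lambda entry: entry[2]), history

best, history = hierarchical_search()
```

Splitting the loop this way makes the cost asymmetry explicit: the outer loop runs only a few times, since each round stands for a month of fabrication, while the inner loop can afford many cheap controller trials.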

Results

Top 4 Morphologies

Exchanging the morphology severely degrades the controller performance.
This evidence supports the hypothesis that morphology and controller are tightly coupled.

Collaborators

Summary

Gave a glimpse into some challenges of Robotics
Showed 3 successful applications of BO in Robotics:
- Learning to walk with the bipedal robot "Fox"
- Multi-objective BO for navigation with micro-robots
- Hierarchical BO for joint morphology/controller optimization
BO is a powerful tool for automatic controller tuning
- Learned models provide useful insight!

Future challenges:

Safe optimization
Exploiting more structure from classic control
Higher-dimensional parameter spaces

Thank you for your time

References

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019