Bayesian Optimization for Robotics

Roberto Calandra

Facebook AI Research

JSM - 01 July 2019

Goals of the talk

  • Explain some of the challenges in Robotics
  • Present multiple successful applications of BO in Robotics:
    • Learning to walk with a bipedal robot
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
  • Argue why BO is a powerful tool for Robotics

State-of-the-art in Robotics

Why Learning?

Robotics still relies heavily on human expertise!

On one hand, it is infeasible to hand-design general-purpose controllers

  • Human design is time-consuming and relies on prior expertise
  • Real-world experiments are expensive and stochastic

 

On the other hand, there is mistrust for automatic design of controllers

  • Not verifiable
  • Often finds qualitatively different solutions
  • (Maybe a bit of human presumption)

Bayesian Optimization for Policy Search

Learning a controller is equivalent to optimizing the parameters of the controller:

\theta^* = \arg\max_\theta\, R[\pi(\theta)]

a_t = \pi(s_t, \theta)

  • a_t: action executed
  • \pi: policy (i.e., parametrized controller)
  • s_t: current state
  • \theta: parameters of the policy

Properties of this optimization problem:

  • Zero-order (no gradient information available)
  • Stochastic
  • Expensive to evaluate
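
A minimal sketch of such a policy-search loop, assuming a 1-D parameter \theta in [0, 1], a Gaussian process with a squared-exponential kernel, and an upper-confidence-bound (UCB) acquisition function. The function noisy_reward is a hypothetical stand-in for R[\pi(\theta)] (one robot trial); the kernel length scale, noise level, and beta are illustrative values, not settings from the talk.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential kernel with unit signal variance (1-D inputs)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var=0.01):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 1e-12))

def noisy_reward(theta, rng):
    """Hypothetical stand-in for one robot trial: noisy, peaked at theta = 0.7."""
    return np.exp(-((theta - 0.7) ** 2) / 0.02) + 0.05 * rng.standard_normal()

def bayes_opt(n_trials=25, n_init=3, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)
    # Start from a few random trials, then let the model choose the next ones.
    thetas = list(rng.uniform(0.0, 1.0, size=n_init))
    rewards = [noisy_reward(t, rng) for t in thetas]
    for _ in range(n_trials - n_init):
        mean, std = gp_posterior(np.array(thetas), np.array(rewards), candidates)
        theta_next = candidates[np.argmax(mean + beta * std)]  # UCB acquisition
        thetas.append(theta_next)
        rewards.append(noisy_reward(theta_next, rng))
    # Because trials are noisy, report the maximizer of the posterior mean
    # rather than the best single observation.
    mean, _ = gp_posterior(np.array(thetas), np.array(rewards), candidates)
    return candidates[np.argmax(mean)], np.array(thetas), np.array(rewards)

theta_best, thetas, rewards = bayes_opt()
```

Each iteration costs exactly one trial and needs only reward values, never gradients, which is why this kind of loop suits the zero-order, stochastic, expensive setting listed above.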

Learning to Walk with a Bipedal Robot

Bio-inspired Bipedal Robot "Fox":

  • Quasi-passive dynamic walker
  • 4 degrees of freedom
  • Springs in legs
  • Walking in a circle
  • Finite-state-machine controller (from biomechanics)
  • 8 open parameters
  • (Motor life: ~200 trials)

[Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P. Bayesian Optimization for Learning Gaits under Uncertainty. Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23]

Learning to Walk in 80 Trials

Learned model

Not Symmetrical (about 5° difference). Why?

Because it is walking in a circle!

Locomotion as Multi-objective Optimization

Micro-robots

Simulated hexapod:

  • 12 degrees of freedom (2 per leg)
  • No good physics models at that scale
  • We use Central Pattern Generators (CPGs) as controllers

Question: can we move beyond standard single-objective BO?

[Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K. Learning Flexible and Reusable Locomotion Primitives for a Microrobot. IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911]
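
With more than one objective there is generally no single best gait, only a set of trade-offs. A common way to summarize them is the Pareto front: the gaits that no other gait beats on every objective at once. The sketch below shows that filtering step only, assuming all objectives are to be maximized. The two objective columns and their values are hypothetical, not results from the paper.

```python
import numpy as np

def pareto_mask(objectives):
    """Boolean mask of non-dominated rows, assuming every column is maximized."""
    objectives = np.asarray(objectives, dtype=float)
    mask = np.ones(objectives.shape[0], dtype=bool)
    for i in range(objectives.shape[0]):
        # Row j dominates row i if it is >= on all objectives and > on at least one.
        at_least_as_good = np.all(objectives >= objectives[i], axis=1)
        strictly_better = np.any(objectives > objectives[i], axis=1)
        mask[i] = not np.any(at_least_as_good & strictly_better)
    return mask

# Hypothetical evaluations of five gaits on two objectives (both maximized).
gaits = np.array([
    [0.9, 0.2],
    [0.6, 0.6],
    [0.5, 0.5],
    [0.2, 0.9],
    [0.2, 0.1],
])
front = gaits[pareto_mask(gaits)]
```

A multi-objective BO method would use such a front both to report the final trade-offs and, inside its acquisition function, to judge whether a candidate is likely to improve on them.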

Hard-coded CPG Gaits

Single-objective

Dual Tripod Gait

Multi-objective

Comparison Gaits

Discovering New Gaits

Contextual BO

Learning Locomotion Primitives

  • With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
  • The trick was to treat it as contextual BO at training time, and then convert it to multi-objective optimization (MOO)

Combining Primitives for Navigation

Joint Morphology/Controller Optimization

  • In Robotics, there is a tight relationship between morphologies and controllers
  • Design of morphologies is a complex and time-consuming process
  • Can we automate it?
  • Same simulated hexapod as before:
    • Each manufacturing round takes about 1 month in the real world...
    • ...but we can fabricate multiple different morphology configurations at once (up to 5)

[Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R. Data-efficient Learning of Morphology and Controller for a Microrobot. IEEE International Conference on Robotics and Automation (ICRA), 2019]

Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO)

 

Two levels of optimization
(instead of a single bigger optimization)

  • Allows weighting the different costs of the two types of parameters
  • Each of the two levels uses information from the other level:
    • The morphology level considers the best policy achieved for each morphology design
    • The controller level uses the morphology as context
  • Batch evaluation to reduce fabrication time
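
The two-level structure can be sketched as a nested loop. This is an illustration of the control flow only, not the HPC-BBO algorithm: plain random sampling stands in for the BO proposal step at both levels, and walking_score is a hypothetical objective in which the best controller depends on the morphology. The batch size of 5 follows the fabrication limit mentioned earlier; all other numbers are assumptions.

```python
import numpy as np

def walking_score(morphology, controller, rng):
    """Hypothetical noisy objective that couples morphology and controller."""
    mismatch = np.sum((controller - morphology) ** 2)  # controller must suit the body
    design = np.sum((morphology - 0.5) ** 2)           # some bodies are better than others
    return -mismatch - design + 0.01 * rng.standard_normal()

def optimize_controller(morphology, rng, n_trials=30):
    """Inner level: tune the controller with the morphology as fixed context."""
    best_score, best_controller = -np.inf, None
    for _ in range(n_trials):
        # Random search placeholder; a BO proposal would go here.
        controller = rng.uniform(0.0, 1.0, size=morphology.shape)
        score = walking_score(morphology, controller, rng)
        if score > best_score:
            best_score, best_controller = score, controller
    return best_controller, best_score

def optimize_morphology(n_rounds=4, batch_size=5, dim=2, seed=0):
    """Outer level: propose a batch of morphologies per fabrication round."""
    rng = np.random.default_rng(seed)
    history = []
    for _ in range(n_rounds):
        # Random batch placeholder; a batch BO proposal would go here.
        batch = rng.uniform(0.0, 1.0, size=(batch_size, dim))
        for morphology in batch:
            controller, score = optimize_controller(morphology, rng)
            # Each morphology is scored by the best controller found for it.
            history.append((score, morphology, controller))
    best = max(history, key=lambda item: item[0])
    return best, history

best, history = optimize_morphology()
```

Keeping the levels separate lets the expensive outer loop run only a few rounds while the cheaper inner loop runs many trials per morphology.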

Results

Top 4 Morphologies

  • Exchanging the morphology severely degrades the controller performance.
  • This evidence supports the hypothesis that morphology and controller are tightly coupled

Collaborators

Summary

  • Gave a glimpse into some challenges of Robotics
  • Showed 3 successful applications of BO in Robotics:
    • Learning to walk with the bipedal robot "Fox"
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
  • BO is a powerful tool for automatic controller tuning
    • Learned models provide useful insight!

Future challenges:

  • Safe optimization
  • Exploiting more structure from classic control
  • Higher-dimensional parameter spaces

 

Thank you for your time

References

  • Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
    Bayesian Optimization for Learning Gaits under Uncertainty
    Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23
  • Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
    Goal-Driven Dynamics Learning via Bayesian Optimization
    IEEE Conference on Decision and Control (CDC), 2017, 5168-5173
  • Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
    Learning Flexible and Reusable Locomotion Primitives for a Microrobot
    IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911
  • Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
    Data-efficient Learning of Morphology and Controller for a Microrobot
    IEEE International Conference on Robotics and Automation (ICRA), 2019

Learning Curve - Fox

Gait Learning