AI for Decision Making
Under Uncertainty

Roberto Calandra

Perspectives in Data-Driven Materials Design Summer School - 28 August 2025

Learning, Adaptive Systems, and Robotics (LASR) Lab

  • Decision-Making
  • Optimization
  • Reinforcement Learning
  • Modeling dynamical systems
  • Control and planning
  • Dexterous manipulation
  • Locomotion
  • Hardware design
  • Touch Processing
  • Applications

Learning, Adaptive Systems, and Robotics (LASR) Lab

Machine Learning

Robotics

Touch
Sensing

First Experience with ML for Materials

  • Helsinki 2011
  • Summer job in Aki Vehtari's Lab
  • Gaussian Processes to predict continuous cooling transformation (CCT) diagrams
  • Unfortunately, not working very well
    (too little data)

Overview

  • Bayesian Optimization & Applications
  • Reinforcement Learning
  • Large-scale Autonomous Data Collection


Goals of the talk

  • Explain some of the challenges in Robotics
  • Present multiple successful applications of BO in Robotics:
    • Learning to walk with a bipedal robot
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
    • High-dimensional BO with linear embeddings (done right)
  • Argue why BO is a powerful tool for Robotics

Why Learning?

Engineering still relies heavily on human expertise!

On one hand, it is often infeasible to hand-design complex systems

  • Human design is time-consuming and relies on prior expertise
  • Real-world experiments are expensive and stochastic

 

On the other hand, there is mistrust of automatic design

  • Not verifiable
  • Often finds qualitatively different solutions
  • (Maybe a bit of human presumption)

Black-box Optimization

x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

x^*: optimized parameters    f: objective function    x: parameters to optimize

A Taxonomy of Objective Functions

  • Single minimum (e.g., convex functions)  vs.  multiple minima (a.k.a. global optimization)
  • First-order (we can measure gradients)  vs.  zero-order (no gradients available)
  • Noise-less (repeating the evaluation yields the same result)  vs.  stochastic (repeating the evaluation yields different results)
  • Cheap evaluation (virtually infinite number of evaluations allowed)  vs.  expensive evaluation (limited to tens or hundreds of evaluations)

The left-hand cases are nice and easy to solve (e.g., with gradient descent); the right-hand cases are difficult to optimize. Here we want to use BO!

Some Applications of BO

  • Learn to Walk with robots
  • Learn to Fly with drones
  • Control of Ferrofluid Droplets
  • Design of micro-robots
  • Optimization of Bioreactors
  • Optimization of processes at Facebook
  • Optimization of Ball-bearings
  • ...

How does Bayesian
Optimization work?

Intuition Behind Bayesian Optimization

  • Many optimizers capture only local information about the objective function:
    x_{t+1} = g(x_t, f(x_t))\,, \quad \text{e.g., gradient descent:} \quad x_{t+1} = x_t + \gamma\, g(\nabla f(x_t))
  • Can we instead use all the information (i.e., the evaluations) collected so far to make a more informed decision, hence improving data-efficiency?
    D = \{x_i, f(x_i)\}\,, \; i = 1 \ldots t \qquad x_{t+1} = g(D)
  • How to do this in practice? We can create a surrogate model \tilde{f}(x)|_{D} \sim f(x) and optimize that instead:
    x^* = \arg\min \tilde{f}(x)

Bayesian Optimization

  • Learn response surface \tilde{f}(x)
  • Based on the response surface, select next parameters x_{t+1} to evaluate
  • Evaluate x_{t+1} on the objective function
  • Repeat until stop criteria are met

[credit: Marc Deisenroth]
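
A minimal sketch of this loop in Python, with the surrogate fitting and acquisition optimization passed in as callables (placeholders for the components described on the following slides); names and defaults are illustrative, not from any specific implementation.

```python
import numpy as np

def bayesian_optimization(f, bounds, fit_surrogate, propose_next, n_init=5, n_iter=30, seed=0):
    """Generic BO loop: fit response surface, pick x_{t+1}, evaluate, repeat (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = list(rng.uniform(lo, hi, size=(n_init, len(lo))))      # initial random design
    y = [f(x) for x in X]
    for _ in range(n_iter):
        surrogate = fit_surrogate(np.array(X), np.array(y))    # 1. learn response surface f~(x)
        x_next = propose_next(surrogate, bounds)                # 2. optimize the acquisition for x_{t+1}
        y.append(f(x_next))                                     # 3. evaluate x_{t+1} on the true objective
        X.append(x_next)                                        # 4. repeat until the budget is exhausted
    return X[int(np.argmin(y))]
```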


Response Surface

Large variety of models used throughout the literature:

  • Polynomial functions
  • Random forests
  • Bayesian neural networks
  • Gaussian processes
  • ...

The most commonly used (at the moment)

The surrogate model (a.k.a. response surface) needs to accurately approximate (and generalize) the underlying function based on the available data

D = \{X, y\}\,, \quad X = \{x_i\}\,, \quad y = \{f(x_i)\}
y \approx \tilde{f}(X) \qquad \longrightarrow \qquad f(x) \approx \tilde{f}(x)

Gaussian Processes

Additional reading:

Rasmussen, C. E. & Williams, C. K. I.
Gaussian Processes for Machine Learning
The MIT Press, 2006

  • Flexible Bayesian regression method
  • Distribution over functions: \tilde{f} \sim \text{GP}(m_f, k_f)
  • Probabilistic model: y = \tilde{f}(X) + \epsilon\,, \quad \epsilon \sim N(0, \sigma_n^2)
  • The posterior predictive distribution for an arbitrary input x^* is computed as:
    p(\tilde{f}(x^*)|D, x^*) \sim N(\mu, \sigma^2)\,,
    \mu|_{x^*} = k(X, x^*)^T k(X, X)^{-1} y\,,
    \sigma^2|_{x^*} = k(x^*, x^*) - k(X, x^*)^T k(X, X)^{-1} k(X, x^*)
  • Mean of a GP = kernel ridge regression
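
A minimal numpy sketch of the posterior predictive equations above, using the squared-exponential kernel introduced on the next slides; the toy data at the end are purely illustrative.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """k(x, x') = sigma_f^2 exp(-(x - x')^2 / (2 l^2)) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, x_star, noise_var=1e-2, **kern):
    """Posterior mean and variance at test points x_star, following the equations above."""
    K = sq_exp_kernel(X, X, **kern) + noise_var * np.eye(len(X))
    k_s = sq_exp_kernel(X, x_star, **kern)
    mu = k_s.T @ np.linalg.solve(K, y)                                   # k(X,x*)^T k(X,X)^-1 y
    var = sq_exp_kernel(x_star, x_star, **kern).diagonal() \
          - np.einsum('ij,ij->j', k_s, np.linalg.solve(K, k_s))          # k(x*,x*) - k(X,x*)^T K^-1 k(X,x*)
    return mu, var

X_train, y_train = np.array([0.0, 1.0, 2.5]), np.array([0.1, 0.9, -0.3])  # toy data
mu, var = gp_posterior(X_train, y_train, np.linspace(0.0, 3.0, 7))
```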

Intuition of Gaussian Processes

Covariance Functions and GP Training

k(x_i, x_j) = \sigma_f^2 \exp\left( -\frac{(x_i - x_j)^2}{2 l^2} \right) + \delta_{ij}\, \sigma_n^2

Squared exponential kernel; \sigma_f, l, and \sigma_n are the parameters of the GP
(often referred to as hyperparameters)

Multiple ways to optimize the hyperparameters

  • MAP estimate (by optimizing marginal likelihood)
  • Numerical integration (proper Bayesian way, but often more complicated)

Additional reading:

Rasmussen, C. E. & Williams, C. K. I.
Gaussian Processes for Machine Learning
The MIT Press, 2006
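
A sketch of the MAP/marginal-likelihood route above for the squared-exponential kernel; scipy's default quasi-Newton optimizer stands in for whatever optimizer one prefers, and the toy data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative log marginal likelihood of a GP with squared-exponential kernel (1-D inputs)."""
    ls, sf2, sn2 = np.exp(log_params)                    # lengthscale l, signal variance, noise variance
    d = X[:, None] - X[None, :]
    K = sf2 * np.exp(-0.5 * (d / ls) ** 2) + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

X = np.linspace(0.0, 3.0, 10)
y = np.sin(X) + 0.1 * np.random.randn(10)                # toy observations
res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y))
lengthscale, signal_var, noise_var = np.exp(res.x)
```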

Why Gaussian Processes?

Pro:

  • Mathematically well-understood
  • Calibrated uncertainties
  • Possibility of specifying priors (e.g., of the underlying function)
  • Easy to enforce Lipschitzian smoothness (by choosing appropriate kernel)
  • Good modeling capabilities in low-data regime

Cons:

  • Difficult to scale to high-dimensional input space
  • Computationally expensive (O(N^3) in the number of data points)
  • Quality of the model depends on the use of an appropriate kernel

Bayesian Optimization

  • Learn response surface \tilde{f}(x)
  • Based on the response surface, select next parameters x_{t+1} to evaluate
  • Evaluate x_{t+1} on the objective function
  • Repeat until stop criteria are met

[credit: Marc Deisenroth]

Acquisition Function

  • How do we select the next parameters to evaluate? By minimizing an acquisition function \alpha(\cdot):
    x^* = \arg\min_x \alpha(\tilde{f}(x), D)
  • Intuition: a good acquisition function \alpha(\cdot) needs to strike a smart balance between exploration and exploitation
    • Too much exploration, and we will keep trying parameters that are unlikely to perform well
    • Too much exploitation, and we might get stuck in a local minimum
    • In either extreme, performance will suffer
  • The tradeoff between exploration and exploitation traditionally* happens through the mean and variance of the response surface:
    y = \alpha(\mu(x^*), \sigma(x^*))

* But not always

Acquisition Functions

  • Many acquisition functions in the literature:
    • Probability of improvement [Kushner 1964]
    • Expected improvement [Mockus 1978]
    • Upper confidence bound: y = \mu(x^*) - \beta\, \sigma(x^*)
    • Entropy search
    • Predictive entropy search
    • ...
    • Ensembles of acquisition functions
  • No golden bullet
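
A short sketch of two of the acquisition functions listed above for a minimization problem, written in terms of the surrogate's posterior mean and standard deviation; beta and the EI form follow the standard textbook definitions.

```python
import numpy as np
from scipy.stats import norm

def lcb(mu, sigma, beta=2.0):
    """(Lower) confidence bound: prefer low predicted mean (exploitation) or high uncertainty (exploration)."""
    return mu - beta * sigma

def expected_improvement(mu, sigma, best_y):
    """Expected improvement over the best value observed so far (for minimization)."""
    sigma = np.maximum(sigma, 1e-12)                     # avoid division by zero
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```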

Optimizing the Acquisition Function

x^* = \arg\min_{x \in \mathbb{R}^d} \alpha(\tilde{f}(x), D)
  • Optimizing the acquisition function is by itself a challenging optimization problem
  • What have we gained by converting the original optimization to this?
    • No longer stochastic
    • No longer zero-order
      (We can usually compute gradients and Hessian of the acquisition function)
    • Not expensive to compute
      (Does not require real-world evaluations. Although it might potentially be computationally intensive)
  • In theory, any global optimizer can be used to optimize the acquisition function
  • In practice, one often uses a global optimizer (e.g., CMA-ES or DIRECT) followed by a first-order optimizer (e.g., gradient descent)
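
A sketch of the two-stage recipe above: a cheap global pass over the acquisition (plain random search standing in for CMA-ES or DIRECT), followed by local gradient-based refinement with scipy; all names and defaults are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def optimize_acquisition(acquisition, lo, hi, n_random=2000, n_restarts=5, seed=0):
    """acquisition: callable mapping a parameter vector to a scalar (lower is better)."""
    rng = np.random.default_rng(seed)
    cand = rng.uniform(lo, hi, size=(n_random, len(lo)))                  # global stage
    vals = np.array([acquisition(x) for x in cand])
    starts = cand[np.argsort(vals)[:n_restarts]]                          # best candidates as restart points
    results = [minimize(acquisition, x0, bounds=list(zip(lo, hi)))        # local stage (L-BFGS-B)
               for x0 in starts]
    return min(results, key=lambda r: r.fun).x
```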

Recap

Bayesian Optimization for Policy Search

Learning a controller is equivalent to optimizing the parameters of the controller:

\theta^* = \arg\max_\theta\, R[\pi(\theta)]\,, \qquad a_t = \pi(s_t, \theta)

where \pi is the policy (i.e., a parametrized controller), a_t the action executed, s_t the current state, and \theta the parameters of the policy.

The resulting objective is:
  • Zero-order
  • Stochastic
  • Expensive to evaluate
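
To make the formulation concrete, a toy sketch of an episode return R[pi(theta)] that BO would treat as a zero-order, stochastic, expensive objective; the double-integrator dynamics and the linear policy are invented for illustration and are unrelated to the robots in this talk.

```python
import numpy as np

rng = np.random.default_rng(0)
A_dyn = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator dynamics
B_dyn = np.array([0.0, 0.1])

def rollout_return(theta, horizon=50, noise=0.01):
    """Return of one episode with the linear policy a_t = -theta @ s_t; stochastic, no gradients exposed."""
    s = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(horizon):
        a = -float(theta @ s)                                    # a_t = pi(s_t, theta)
        s = A_dyn @ s + B_dyn * a + noise * rng.standard_normal(2)
        total -= float(s @ s) + 0.01 * a * a                     # reward = negative quadratic cost
    return total

# BO would now search over theta (here only 2 parameters) using a few tens of such episodes.
```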

The Beginning

Learning to Walk with a Bipedal Robot

Bio-inspired Bipedal Robot "Fox":

  • Quasi-passive dynamic walker
  • 4 Degrees of freedom
  • Springs in legs
  • Walking in circle
  • Finite-state-machine controller (from biomechanics)
  • 8 open parameters
  • (Motor lifetime: ~200 trials)

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Learning to Walk in 80 Trials

Learning Curve

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Comparison

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Learned model

Not Symmetrical (about 5° difference). Why?

Because it is walking in a circle!

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Beyond Single Objective

Locomotion as Multi-objective Optimization

Trade-off between Walking Speed and Energy Consumption!

Multi-objective Optimization

  • Most engineering problems are truly multi-objective:
    x^* = \arg\min_{x \in \mathbb{R}^d} \{f_1(x), \ldots, f_n(x)\}
  • Not all objective functions can be optimized at once
  • Solving this optimization means finding the Pareto Front (PF)
  • The identified PF should be:
    • Complete
    • Dense
    • Accurate
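
A small sketch of extracting the empirical Pareto front (the non-dominated set) from a batch of evaluated points, assuming all objectives are minimized; this is only the bookkeeping step, not the model of the front used in the paper below.

```python
import numpy as np

def pareto_mask(F):
    """F: (n_points, n_objectives) array of observed values. True where no other point dominates."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        others = np.delete(F, i, axis=0)
        dominated = np.any(np.all(others <= F[i], axis=1) & np.any(others < F[i], axis=1))
        mask[i] = not dominated
    return mask

F = np.array([[1.0, 5.0], [2.0, 2.0], [3.0, 3.0], [5.0, 1.0]])   # toy: the third point is dominated
print(pareto_mask(F))                                             # -> [ True  True False  True]
```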

Predicting Pareto Front

[Predicted Pareto fronts after 20, 50, and 200 evaluations]

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Predicting Pareto Front

[Results on the MOP2 and ZDT3 benchmarks]

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Predicting Pareto Front (Noisy)

[Results on the MOP2 and ZDT3 benchmarks with observation noise]

Sensitivity Analysis

Sensitivity Analysis (MOP2)

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Sensitivity Analysis (RMTP3)

Learning to Walk with Micro-robots

Micro-robots

Simulated hexapod:

  • 12 Degrees of Freedom (2 per leg)
  • No good physics models at that scale
  • Central Pattern Generators (CPG) as controller

Let's apply all the tools we have so far!

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Hard-coded CPG Gaits

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Single-objective

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Dual Tripod Gait

Multi-objective

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Comparison Gaits

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Discovering New Gaits

Contextual Bayesian Optimization

Standard BO:
x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

Contextual BO:
x^* = \arg\min_{x \in \mathbb{R}^d} f(x, c)

x^*: optimized parameters    f: objective function    x: parameters to optimize    c: context

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911
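
A small sketch of the bookkeeping implied by the contextual formulation: the context (here presumably something like the goal target of a locomotion primitive) is appended to the surrogate inputs during training, but held fixed when choosing the next parameters; the acquisition `acq` is any function over the joint input, and all names are illustrative.

```python
import numpy as np

def augment(X, C):
    """Stack parameters and contexts into joint surrogate inputs [x, c]."""
    return np.hstack([np.atleast_2d(X), np.atleast_2d(C)])

def choose_next(acq, candidate_xs, context):
    """Evaluate the acquisition over candidate parameters with the current context held fixed."""
    joint = augment(candidate_xs, np.tile(context, (len(candidate_xs), 1)))
    scores = [acq(z) for z in joint]
    return candidate_xs[int(np.argmin(scores))]
```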

Learning Locomotion Primitives

  • With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
  • The trick was to treat it as contextual BO at training time, and then convert it to a multi-objective optimization

Combining Primitives for Navigation

(More) Expensive Optimization

Joint Morphology/Controller Optimization

  • In Robotics, there is a tight relationship between morphologies and controllers
  • Design of morphologies is a complex and time-consuming process
  • Can we automate it?
  • Same simulated hexapod as before:
    • Each manufacturing round takes about 1 month in the real world...
    • ...But we can fabricate multiple different morphology configurations at once (up to 5)

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019

Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO)

Two levels of optimization
(instead of a single bigger optimization)

  • Allows weighting the different costs of the two types of parameters
  • Each of the two levels uses information from the other level:
    • The morphology level considers the best policy achieved for each morphology design
    • The controller level uses the morphology as context
  • Batch evaluation to reduce fabrication time

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019
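
A rough, runnable sketch of the two-level loop described above, with uniform random proposals standing in for the BO at both levels and a made-up objective standing in for fabrication and hardware rollouts; in HPC-BBO each level would instead use a GP-based surrogate, with the controller level conditioned on the morphology as context.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_on_robot(morphology, controller):
    """Placeholder for a real (expensive, stochastic) rollout of a controller on a fabricated morphology."""
    return -np.sum((morphology - 0.3) ** 2) - np.sum((controller - morphology.mean()) ** 2)

def hpc_bbo_sketch(n_rounds=3, batch_size=5, n_controller_trials=10, d_m=2, d_c=3):
    morph_history = []                                           # (morphology, best return achieved with it)
    for _ in range(n_rounds):
        batch = rng.uniform(0, 1, size=(batch_size, d_m))        # morphology level: propose a batch to fabricate
        for m in batch:
            best = -np.inf
            for _ in range(n_controller_trials):                 # controller level: optimize with m as context
                theta = rng.uniform(0, 1, size=d_c)
                best = max(best, evaluate_on_robot(m, theta))
            morph_history.append((m, best))                      # morphology level sees the best policy per design
    return max(morph_history, key=lambda t: t[1])
```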

Results

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019

Top 4 Morphologies

  • Exchanging the morphology severely degrades the controller performance.
  • This evidence supports the hypothesis that morphology and controller need to be tightly coupled

Linear Embeddings for High Dimensional BO

High-dimensional BO with Linear Embeddings

Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas.
Bayesian optimization in a billion dimensions via random embeddings.
Journal of Artificial Intelligence Research, 55:361–387, 2016

Very neat Idea!

But several wrong assumptions...

A Few fixes

 Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.

 Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization

 Advances in Neural Information Processing Systems (NeurIPS), 2020

  • Linear projections do not preserve product kernels.
    • Mahalanobis Kernel
  • Most points in the embedding map to the facets of the projection
    • Constrain the embedding optimization to points within the bounds
  • Linear embeddings can have a low probability of containing an optimum.
    • Unit hypersphere sampling for the projection
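
A sketch of the linear-embedding idea with two of the fixes above folded in; the dimensions, bounds, and the way the projection is sampled are illustrative assumptions rather than the exact construction in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 4                                    # ambient and embedding dimensionality (illustrative)
A = rng.standard_normal((D, d))
A /= np.linalg.norm(A, axis=0)                   # normalize columns: directions sampled on the unit hypersphere

def embed_up(z, box=1.0):
    """Map a low-dimensional candidate z into the ambient box [-box, box]^D."""
    x = A @ z
    # Rather than clipping (which piles candidates onto the facets of the box), the fix above constrains
    # the acquisition optimization over z so that A @ z already lies inside the bounds.
    return np.clip(x, -box, box)

# The surrogate is fit on pairs (z_i, f(embed_up(z_i))).  The Mahalanobis-kernel fix roughly amounts to
# measuring distances between projected-up points, k(z, z') ~ k_SE(A z, A z'), instead of assuming a
# product (ARD) kernel directly in the embedding.
```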

Results

 Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.

 Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization

 Advances in Neural Information Processing Systems (NeurIPS), 2020

Overview

  • Bayesian Optimization & Applications
  • Reinforcement Learning
  • Large-scale Autonomous Data Collection

Reinforcement Learning (RL)

Reinforcement Learning Approaches

Model-free:

  • Local convergence guaranteed*

  • Simple to implement

  • Computationally light

  • Does not generalize

  • Data-inefficient

Model-based:

  • No convergence guarantees

  • Challenging to learn model

  • Computationally intensive

  • Data-efficient

  • Generalize to new tasks

Evidence from neuroscience that humans use both approaches! [Daw et al. 2010]

Model-based Reinforcement Learning

Probabilistic Ensembles with Trajectory Sampling (PETS)

Chua, K.; Calandra, R.; McAllister, R. & Levine, S.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Advances in Neural Information Processing Systems (NIPS), 2018, 4754-4765

PETS - Experimental Results

Chua, K.; Calandra, R.; McAllister, R. & Levine, S.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Advances in Neural Information Processing Systems (NIPS), 2018, 4754-4765

Chua, K.; Calandra, R.; McAllister, R. & Levine, S.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Advances in Neural Information Processing Systems (NIPS), 2018, 4754-4765

Learning to Fly a Quadcopter

Lambert, N.O.; Drew, D.S.; Yaconelli, J; Calandra, R.; Levine, S.; & Pister, K.S.J.
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
IEEE Robotics and Automation Letters (RA-L), 2019, 4, 4224-4230

On-line Adaptation to Different Payloads

Belkhale, S.; Li, R.; Kahn, G.; McAllister, R.; Calandra, R. & Levine, S.
Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads

IEEE Robotics and Automation Letters (RA-L), 2021, 6, 1471-1478

Lambeta, M.; Chou, P.-W.; Tian, S.; Yang, B.; Maloon, B.; Most, V. R.; Stroud, D.; Santos, R.; Byagowi, A.; Kammerer, G.; Jayaraman, D. & Calandra, R.
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation
IEEE Robotics and Automation Letters (RA-L), 2020, 5, 3838-3845

Model-based Reinforcement Learning

Lambeta, M.; Chou, P.-W.; Tian, S.; Yang, B.; Maloon, B.; Most, V. R.; Stroud, D.; Santos, R.; Byagowi, A.; Kammerer, G.; Jayaraman, D. & Calandra, R.
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation
IEEE Robotics and Automation Letters (RA-L), 2020, 5, 3838-3845

Understand and Overcome the Limitations of MBRL

  • Can we avoid the multiplicative error of recursive one-step predictions?

Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

IEEE Conference on Decision and Control (CDC), 2021, [available online: https://arxiv.org/abs/2012.09156]

(YES)

  • Can we dynamically tune the hyperparameters?

Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F. & Calandra, R.
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

(YES)

  • Are accurate models a necessary condition for good control performance?
  • Are accurate models a sufficient condition for good control performance?

Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173

(NO)

(NO)

Lambert, N.; Amos, B.; Yadan, O. & Calandra, R.
Objective Mismatch in Model-based Reinforcement Learning
Learning for Dynamics and Control (L4DC), 2020, 761-770

1-Step Ahead Models and their Propagation

s_{t+1} = f_\theta(s_t, a_t)
s_{t+h} = f_\theta(\ldots f_\theta(f_\theta(s_t, a_t), a_{t+1}) \ldots, a_{t+h})
s_{t+h} = f_\theta(\ldots f_\theta(f_\theta(s_t, a_t) + \epsilon, a_{t+1}) + \epsilon \ldots, a_{t+h}) + \epsilon

Multiplicative Error -- Doomed to accumulate
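
A tiny sketch of the recursive propagation in the equations above, to make the accumulation explicit; `f_theta` stands for any learned one-step model.

```python
import numpy as np

def rollout_one_step_model(f_theta, s0, actions):
    """Predict s_{t+1}, ..., s_{t+h} by feeding each prediction back into the model: any per-step error
    epsilon enters all subsequent inputs, so it compounds with the horizon."""
    s, trajectory = np.asarray(s0), []
    for a in actions:
        s = f_theta(s, a)
        trajectory.append(s)
    return np.array(trajectory)
```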

Trajectory Prediction

Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

IEEE Conference on Decision and Control (CDC), 2021, [available online: https://arxiv.org/abs/2012.09156]

Trajectory Prediction

s_{t+h} = f_\theta(\ldots f_\theta(f_\theta(s_t, a_t), a_{t+1}) \ldots, a_{t+h}) \qquad \text{(recursive one-step model)}
s_{t+h} = f_\theta(s_t, a_t, a_{t+1}, \ldots, a_{t+h}) \qquad \text{(direct prediction from the action sequence)}
s_{t+h} = f_\theta(s_t, \theta_{\pi})\,, \quad a_t = \pi_{\theta_\pi}(s_t) \qquad \text{(actions generated by a parametrized policy)}
\text{if } \dim(\theta_{\pi}) \ll \dim(a_t, \ldots, a_{t+h}) \text{ we win}
s_{t+h} = f_\theta(s_t, h, \theta_{\pi}) \qquad \text{(condition also on the horizon } h\text{)}
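
For contrast, a sketch of the two prediction interfaces: the recursive one-step rollout versus a model queried directly with (s_t, h, theta_pi), which is what yields the O(1)-per-query cost and continuous-time evaluation listed under the advantages below; both `f_step` and `f_traj` are placeholders for learned models.

```python
def predict_recursive(f_step, s0, actions):
    """h model evaluations; errors compound across steps."""
    s = s0
    for a in actions:
        s = f_step(s, a)
    return s

def predict_long_horizon(f_traj, s0, h, theta_pi):
    """A single model evaluation for any horizon h (h need not even be an integer)."""
    return f_traj(s0, h, theta_pi)
```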

Advantages

  • Better accuracy for long horizons
  • Calibrated uncertainty over the whole trajectory
  • Better data efficiency
  • Faster computation/propagation for long horizons (from O(t) to O(1) for any given t)
  • Continuous time

Overview

  • Bayesian Optimization & Applications
  • Reinforcement Learning
  • Large-scale Autonomous Data Collection

Large-scale Autonomous Data Collection

  • Scale of Data is crucial for large Deep Learning models
  • How do we fully automate experiments?
    How do we take the human out of the loop?
  • Careful experimental design is often necessary

 

Visuo-tactile Learned Model

Calandra, R.; Owens, A.; Jayaraman, D.; Yuan, W.; Lin, J.; Malik, J.; Adelson, E. H. & Levine, S.
More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 3300-3307

Self-supervised Data Collection

  • Setting:
    • 7-DOF Sawyer arm
    • Weiss WSG-50 Parallel gripper
    • one GelSight on each finger
    • Two RGB-D cameras in front and on top

 

  • (Almost) fully autonomous data collection:
    • Estimate the object position using depth, and perform a random grasp of the object
    • Labels are automatically generated by checking for the presence of contacts after each attempted lift

Examples of Training Objects

Collected 6,450 grasps on over 60 training objects in ~2 weeks.

Calandra, R.; Owens, A.; Jayaraman, D.; Yuan, W.; Lin, J.; Malik, J.; Adelson, E. H. & Levine, S.
More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 3300-3307

Grasp Success on Unseen Objects

83.8% grasp success on 22 unseen objects
(using only vision yields 56.6% success rate)

A Few Lessons Learned (Across Multiple Projects)

  • Full automation is paramount
  • Think carefully about the experimental setup
  • Iterate the setup
  • Collect as much data as possible
  • Verify early that data are consistent

To Conclude

Human Collaborators

LASR Lab

Collaborators

Funding

Overview

  • BO is a powerful tool; we showed a few examples:
    • Learning to walk with the bipedal robot "Fox"
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
  • Reinforcement Learning
  • Automatic data collection is highly desirable

Thank you!

Additional Slides
