AI for Decision Making
Under Uncertainty

Roberto Calandra

Perspectives in Data-Driven Materials Design Summer School - 28 August 2025

Learning, Adaptive Systems, and Robotics (LASR) Lab

  • Decision-Making
  • Optimization
  • Reinforcement Learning
  • Modeling dynamical systems
  • Control and planning
  • Dexterous manipulation
  • Locomotion
  • Hardware design
  • Touch Processing
  • Applications

Learning, Adaptive Systems, and Robotics (LASR) Lab

Machine Learning

Robotics

Touch
Sensing

First Experience with ML for Materials

  • Helsinki 2011
  • Summer job in Aki Vehtari's Lab
  • Gaussian Processes to predict continuous cooling transformation (CCT) diagrams
  • Unfortunately, not working very well
    (too little data)

Overview

  • Bayesian Optimization & Applications
  • Reinforcement Learning
  • Large-scale Autonomous Data Collection


Goals of the talk

  • Explain some of the challenges in Robotics
  • Present multiple successful applications of BO in Robotics:
    • Learning to walk with a bipedal robot
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
    • High-dimensional BO with linear embeddings (done right)
  • Argue why BO is a powerful tool for Robotics

Why Learning?

Engineering still relies heavily on human expertise!

On one hand, it is often infeasible to hand-design complex systems

  • Human design is time-consuming and relies on prior expertise
  • Real-world experiments are expensive and stochastic

 

On the other hand, there is mistrust of automatic design

  • Not verifiable
  • Often finds qualitatively different solutions
  • (Maybe a bit of human presumption)

Black-box Optimization

x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

x^*: optimized parameters    f: objective function    x: parameters to optimize

A Taxonomy of Objective Functions

  • Single minimum (e.g., convex functions)  vs.  multiple minima (a.k.a. global optimization)
  • First-order (we can measure gradients)  vs.  zero-order (no gradients available)
  • Noise-less (repeating the evaluation yields the same result)  vs.  stochastic (repeating the evaluation yields different results)
  • Cheap evaluation (virtually infinite number of evaluations allowed)  vs.  expensive evaluation (limited to tens or hundreds of evaluations)

The left-hand cases are nice and easy to solve (e.g., with gradient descent); the right-hand cases are difficult to optimize. Here we want to use BO!

Some Applications of BO

  • Learn to Walk with robots
  • Learn to Fly with drones
  • Control of Ferrofluid Droplets
  • Design of micro-robots
  • Optimization of Bioreactors
  • Optimization of processes at Facebook
  • Optimization of Ball-bearings
  • ...

How does Bayesian
Optimization work?

Intuition Behind Bayesian Optimization

  • Many optimizers capture only local information about the objective function:
    x_{t+1} = g(x_t, f(x_t))\,, \quad \text{e.g., gradient descent:} \quad x_{t+1} = x_t + \gamma\, g(\nabla f(x_t))
  • Can we instead use all the information (i.e., the evaluations) collected so far to make a more informed decision, hence improving data-efficiency?
    D = \{x_i, f(x_i)\}\,, \; i = 1 \ldots t \qquad x_{t+1} = g(D)
  • How to do this in practice? We can create a surrogate model \tilde{f}(x)|_{D} \sim f(x) and optimize that instead:
    x^* = \arg\min \tilde{f}(x)

Bayesian Optimization

  • Learn response surface \tilde{f}(x)
  • Based on the response surface, select next parameters x_{t+1} to evaluate
  • Evaluate x_{t+1} on the objective function
  • Repeat until stop criteria are met

[credit: Marc Deisenroth]
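
A minimal sketch of this loop in Python, with the surrogate fitting and acquisition optimization passed in as callables (placeholders for the components described on the following slides); names and defaults are illustrative, not from any specific implementation.

```python
import numpy as np

def bayesian_optimization(f, bounds, fit_surrogate, propose_next, n_init=5, n_iter=30, seed=0):
    """Generic BO loop: fit response surface, pick x_{t+1}, evaluate, repeat (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = list(rng.uniform(lo, hi, size=(n_init, len(lo))))      # initial random design
    y = [f(x) for x in X]
    for _ in range(n_iter):
        surrogate = fit_surrogate(np.array(X), np.array(y))    # 1. learn response surface f~(x)
        x_next = propose_next(surrogate, bounds)                # 2. optimize the acquisition for x_{t+1}
        y.append(f(x_next))                                     # 3. evaluate x_{t+1} on the true objective
        X.append(x_next)                                        # 4. repeat until the budget is exhausted
    return X[int(np.argmin(y))]
```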


Response Surface

Large variety of models used throughout the literature:

  • Polynomial functions
  • Random forests
  • Bayesian neural networks
  • Gaussian processes
  • ...

The most commonly used (at the moment)

The surrogate model (a.k.a. response surface) needs to accurately approximate (and generalize) the underlying function based on the available data

D = \{X, y\}\,, \quad X = \{x_i\}\,, \quad y = \{f(x_i)\}
y \approx \tilde{f}(X) \qquad \longrightarrow \qquad f(x) \approx \tilde{f}(x)

Gaussian Processes

Additional reading:

Rasmussen, C. E. & Williams, C. K. I.
Gaussian Processes for Machine Learning
The MIT Press, 2006

  • Flexible Bayesian regression method
  • Distribution over functions: \tilde{f} \sim \text{GP}(m_f, k_f)
  • Probabilistic model: y = \tilde{f}(X) + \epsilon\,, \quad \epsilon \sim N(0, \sigma_n^2)
  • The posterior predictive distribution for an arbitrary input x^* is computed as:
    p(\tilde{f}(x^*)|D, x^*) \sim N(\mu, \sigma^2)\,,
    \mu|_{x^*} = k(X, x^*)^T k(X, X)^{-1} y\,,
    \sigma^2|_{x^*} = k(x^*, x^*) - k(X, x^*)^T k(X, X)^{-1} k(X, x^*)
  • Mean of a GP = kernel ridge regression
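
A minimal numpy sketch of the posterior predictive equations above, using the squared-exponential kernel introduced on the next slides; the toy data at the end are purely illustrative.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """k(x, x') = sigma_f^2 exp(-(x - x')^2 / (2 l^2)) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, x_star, noise_var=1e-2, **kern):
    """Posterior mean and variance at test points x_star, following the equations above."""
    K = sq_exp_kernel(X, X, **kern) + noise_var * np.eye(len(X))
    k_s = sq_exp_kernel(X, x_star, **kern)
    mu = k_s.T @ np.linalg.solve(K, y)                                   # k(X,x*)^T k(X,X)^-1 y
    var = sq_exp_kernel(x_star, x_star, **kern).diagonal() \
          - np.einsum('ij,ij->j', k_s, np.linalg.solve(K, k_s))          # k(x*,x*) - k(X,x*)^T K^-1 k(X,x*)
    return mu, var

X_train, y_train = np.array([0.0, 1.0, 2.5]), np.array([0.1, 0.9, -0.3])  # toy data
mu, var = gp_posterior(X_train, y_train, np.linspace(0.0, 3.0, 7))
```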

Intuition of Gaussian Processes

Covariance Functions and GP Training

k(x_i, x_j) = \sigma_f^2 \exp\left( -\frac{(x_i - x_j)^2}{2 l^2} \right) + \delta_{ij}\, \sigma_n^2

Squared exponential kernel; \sigma_f, l, and \sigma_n are the parameters of the GP
(often referred to as hyperparameters)

Multiple ways to optimize the hyperparameters

  • MAP estimate (by optimizing marginal likelihood)
  • Numerical integration (proper Bayesian way, but often more complicated)

Additional reading:

Rasmussen, C. E. & Williams, C. K. I.
Gaussian Processes for Machine Learning
The MIT Press, 2006
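
A sketch of the MAP/marginal-likelihood route above for the squared-exponential kernel; scipy's default quasi-Newton optimizer stands in for whatever optimizer one prefers, and the toy data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative log marginal likelihood of a GP with squared-exponential kernel (1-D inputs)."""
    ls, sf2, sn2 = np.exp(log_params)                    # lengthscale l, signal variance, noise variance
    d = X[:, None] - X[None, :]
    K = sf2 * np.exp(-0.5 * (d / ls) ** 2) + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

X = np.linspace(0.0, 3.0, 10)
y = np.sin(X) + 0.1 * np.random.randn(10)                # toy observations
res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y))
lengthscale, signal_var, noise_var = np.exp(res.x)
```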

Why Gaussian Processes?

Pro:

  • Mathematically well-understood
  • Calibrated uncertainties
  • Possibility of specifying priors (e.g., of the underlying function)
  • Easy to enforce Lipschitzian smoothness (by choosing appropriate kernel)
  • Good modeling capabilities in low-data regime

Cons:

  • Difficult to scale to high-dimensional input space
  • Computationally expensive (O(N^3) in the number of data points)
  • Quality of the model depends on the use of an appropriate kernel

Bayesian Optimization

  • Learn response surface \tilde{f}(x)
  • Based on the response surface, select next parameters x_{t+1} to evaluate
  • Evaluate x_{t+1} on the objective function
  • Repeat until stop criteria are met

[credit: Marc Deisenroth]

Acquisition Function

  • How do we select the next parameters to evaluate? By minimizing an acquisition function \alpha(\cdot):
    x^* = \arg\min_x \alpha(\tilde{f}(x), D)
  • Intuition: a good acquisition function \alpha(\cdot) needs to strike a smart balance between exploration and exploitation
    • Too much exploration, and we will keep trying parameters that are unlikely to perform well
    • Too much exploitation, and we might get stuck in a local minimum
    • In either extreme, performance will suffer
  • The tradeoff between exploration and exploitation traditionally* happens through the mean and variance of the response surface:
    y = \alpha(\mu(x^*), \sigma(x^*))

* But not always

Acquisition Functions

  • Many acquisition functions in the literature:
    • Probability of improvement [Kushner 1964]
    • Expected improvement [Mockus 1978]
    • Upper confidence bound: y = \mu(x^*) - \beta\, \sigma(x^*)
    • Entropy search
    • Predictive entropy search
    • ...
    • Ensembles of acquisition functions
  • No golden bullet
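
A short sketch of two of the acquisition functions listed above for a minimization problem, written in terms of the surrogate's posterior mean and standard deviation; beta and the EI form follow the standard textbook definitions.

```python
import numpy as np
from scipy.stats import norm

def lcb(mu, sigma, beta=2.0):
    """(Lower) confidence bound: prefer low predicted mean (exploitation) or high uncertainty (exploration)."""
    return mu - beta * sigma

def expected_improvement(mu, sigma, best_y):
    """Expected improvement over the best value observed so far (for minimization)."""
    sigma = np.maximum(sigma, 1e-12)                     # avoid division by zero
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```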

Optimizing the Acquisition Function

x^* = \arg\min_{x \in \mathbb{R}^d} \alpha(\tilde{f}(x), D)
  • Optimizing the acquisition function is by itself a challenging optimization problem
  • What have we gained by converting the original optimization to this?
    • No longer stochastic
    • No longer zero-order
      (We can usually compute gradients and Hessian of the acquisition function)
    • Not expensive to compute
      (Does not require real-world evaluations. Although it might potentially be computationally intensive)
  • In theory, any global optimizer can be used to optimize the acquisition function
  • In practice, one often uses a global optimizer (e.g., CMA-ES or DIRECT) followed by a first-order optimizer (e.g., gradient descent)
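
A sketch of the two-stage recipe above: a cheap global pass over the acquisition (plain random search standing in for CMA-ES or DIRECT), followed by local gradient-based refinement with scipy; all names and defaults are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def optimize_acquisition(acquisition, lo, hi, n_random=2000, n_restarts=5, seed=0):
    """acquisition: callable mapping a parameter vector to a scalar (lower is better)."""
    rng = np.random.default_rng(seed)
    cand = rng.uniform(lo, hi, size=(n_random, len(lo)))                  # global stage
    vals = np.array([acquisition(x) for x in cand])
    starts = cand[np.argsort(vals)[:n_restarts]]                          # best candidates as restart points
    results = [minimize(acquisition, x0, bounds=list(zip(lo, hi)))        # local stage (L-BFGS-B)
               for x0 in starts]
    return min(results, key=lambda r: r.fun).x
```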

Recap

Bayesian Optimization for Policy Search

Learning a controller is equivalent to optimizing the parameters of the controller:

\theta^* = \arg\max_\theta\, R[\pi(\theta)]\,, \qquad a_t = \pi(s_t, \theta)

where \pi is the policy (i.e., a parametrized controller), a_t the action executed, s_t the current state, and \theta the parameters of the policy.

The resulting objective is:
  • Zero-order
  • Stochastic
  • Expensive to evaluate
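
To make the formulation concrete, a toy sketch of an episode return R[pi(theta)] that BO would treat as a zero-order, stochastic, expensive objective; the double-integrator dynamics and the linear policy are invented for illustration and are unrelated to the robots in this talk.

```python
import numpy as np

rng = np.random.default_rng(0)
A_dyn = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator dynamics
B_dyn = np.array([0.0, 0.1])

def rollout_return(theta, horizon=50, noise=0.01):
    """Return of one episode with the linear policy a_t = -theta @ s_t; stochastic, no gradients exposed."""
    s = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(horizon):
        a = -float(theta @ s)                                    # a_t = pi(s_t, theta)
        s = A_dyn @ s + B_dyn * a + noise * rng.standard_normal(2)
        total -= float(s @ s) + 0.01 * a * a                     # reward = negative quadratic cost
    return total

# BO would now search over theta (here only 2 parameters) using a few tens of such episodes.
```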

The Beginning

Learning to Walk with a Bipedal Robot

Bio-inspired Bipedal Robot "Fox":

  • Quasi-passive dynamic walker
  • 4 Degrees of freedom
  • Springs in legs
  • Walking in circle
  • Finite-state-machine controller (from biomechanics)
  • 8 open parameters
  • (Motor lifetime: ~200 trials)

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Learning to Walk in 80 Trials

Learning Curve

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Comparison

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Learned model

Not Symmetrical (about 5° difference). Why?

Because it is walking in a circle!

Calandra, R.; Seyfarth, A.; Peters, J. & Deisenroth, M. P.
Bayesian Optimization for Learning Gaits under Uncertainty
Annals of Mathematics and Artificial Intelligence (AMAI), 2015, 76, 5-23

Beyond Single Objective

Locomotion as Multi-objective Optimization

Trade-off between Walking Speed and Energy Consumption!

Multi-objective Optimization

  • Most engineering problems are truly multi-objective:
    x^* = \arg\min_{x \in \mathbb{R}^d} \{f_1(x), \ldots, f_n(x)\}
  • Not all objective functions can be optimized at once
  • Solving this optimization means finding the Pareto Front (PF)
  • The identified PF should be:
    • Complete
    • Dense
    • Accurate
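
A small sketch of extracting the empirical Pareto front (the non-dominated set) from a batch of evaluated points, assuming all objectives are minimized; this is only the bookkeeping step, not the model of the front used in the paper below.

```python
import numpy as np

def pareto_mask(F):
    """F: (n_points, n_objectives) array of observed values. True where no other point dominates."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        others = np.delete(F, i, axis=0)
        dominated = np.any(np.all(others <= F[i], axis=1) & np.any(others < F[i], axis=1))
        mask[i] = not dominated
    return mask

F = np.array([[1.0, 5.0], [2.0, 2.0], [3.0, 3.0], [5.0, 1.0]])   # toy: the third point is dominated
print(pareto_mask(F))                                             # -> [ True  True False  True]
```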

Predicting Pareto Front

[Predicted Pareto fronts after 20, 50, and 200 evaluations]

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Predicting Pareto Front

[Results on the MOP2 and ZDT3 benchmarks]

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Predicting Pareto Front (Noisy)

[Results on the MOP2 and ZDT3 benchmarks with observation noise]

Sensitivity Analysis

Sensitivity Analysis (MOP2)

Calandra, R.; Peters, J. & Deisenroth, M. P.
Pareto Front Modeling for Sensitivity Analysis in Multi-Objective Bayesian Optimization
NIPS Workshop on Bayesian Optimization (BayesOpt), 2014

Sensitivity Analysis (RMTP3)

Learning to Walk with Micro-robots

Micro-robots

Simulated hexapod:

  • 12 Degrees of Freedom (2 per leg)
  • No good physics models at that scale
  • Central Pattern Generators (CPG) as controller

Let's apply all the tools we have so far!

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Hard-coded CPG Gaits

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Single-objective

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Dual Tripod Gait

Multi-objective

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Comparison Gaits

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911

Discovering New Gaits

Contextual Bayesian Optimization

Standard BO:
x^* = \arg\min_{x \in \mathbb{R}^d} f(x)

Contextual BO:
x^* = \arg\min_{x \in \mathbb{R}^d} f(x, c)

x^*: optimized parameters    f: objective function    x: parameters to optimize    c: context

Yang, B.; Wang, G.; Calandra, R.; Contreras, D.; Levine, S. & Pister, K.
Learning Flexible and Reusable Locomotion Primitives for a Microrobot 
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 1904-1911
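
A small sketch of the bookkeeping implied by the contextual formulation: the context (here presumably something like the goal target of a locomotion primitive) is appended to the surrogate inputs during training, but held fixed when choosing the next parameters; the acquisition `acq` is any function over the joint input, and all names are illustrative.

```python
import numpy as np

def augment(X, C):
    """Stack parameters and contexts into joint surrogate inputs [x, c]."""
    return np.hstack([np.atleast_2d(X), np.atleast_2d(C)])

def choose_next(acq, candidate_xs, context):
    """Evaluate the acquisition over candidate parameters with the current context held fixed."""
    joint = augment(candidate_xs, np.tile(context, (len(candidate_xs), 1)))
    scores = [acq(z) for z in joint]
    return candidate_xs[int(np.argmin(scores))]
```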

Learning Locomotion Primitives

  • With 50 trials for each of the 5 goal targets, we can learn a fairly accurate model
  • The trick was to treat it as contextual BO at training time, and then convert it to a multi-objective optimization

Combining Primitives for Navigation

(More) Expensive Optimization

Joint Morphology/Controller Optimization

  • In Robotics, there is a tight relationship between morphologies and controllers
  • Design of morphologies is a complex and time-consuming process
  • Can we automate it?
  • Same simulated hexapod as before:
    • Each manufacturing round takes about 1 month in the real world...
    • ...But we can fabricate multiple different morphology configurations at once (up to 5)

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019

Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO)

Two levels of optimization
(instead of a single bigger optimization)

  • Allows weighting the different costs of the two types of parameters
  • Each of the two levels uses information from the other level:
    • The morphology level considers the best policy achieved for each morphology design
    • The controller level uses the morphology as context
  • Batch evaluation to reduce fabrication time

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019
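
A rough, runnable sketch of the two-level loop described above, with uniform random proposals standing in for the BO at both levels and a made-up objective standing in for fabrication and hardware rollouts; in HPC-BBO each level would instead use a GP-based surrogate, with the controller level conditioned on the morphology as context.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_on_robot(morphology, controller):
    """Placeholder for a real (expensive, stochastic) rollout of a controller on a fabricated morphology."""
    return -np.sum((morphology - 0.3) ** 2) - np.sum((controller - morphology.mean()) ** 2)

def hpc_bbo_sketch(n_rounds=3, batch_size=5, n_controller_trials=10, d_m=2, d_c=3):
    morph_history = []                                           # (morphology, best return achieved with it)
    for _ in range(n_rounds):
        batch = rng.uniform(0, 1, size=(batch_size, d_m))        # morphology level: propose a batch to fabricate
        for m in batch:
            best = -np.inf
            for _ in range(n_controller_trials):                 # controller level: optimize with m as context
                theta = rng.uniform(0, 1, size=d_c)
                best = max(best, evaluate_on_robot(m, theta))
            morph_history.append((m, best))                      # morphology level sees the best policy per design
    return max(morph_history, key=lambda t: t[1])
```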

Results

Liao, T.; Wang, G.; Yang, B.; Lee, R.; Pister, K.; Levine, S. & Calandra, R.
Data-efficient Learning of Morphology and Controller for a Microrobot
IEEE International Conference on Robotics and Automation (ICRA), 2019

Top 4 Morphologies

  • Exchanging the morphology severely degrades the controller performance.
  • This evidence supports the hypothesis that morphology and controller need to be tightly coupled

Linear Embeddings for High Dimensional BO

High-dimensional BO with Linear Embeddings

Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. de Freitas.
Bayesian optimization in a billion dimensions via random embeddings.
Journal of Artificial Intelligence Research, 55:361–387, 2016

Very neat Idea!

But several wrong assumptions...

A Few fixes

 Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.

 Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization

 Advances in Neural Information Processing Systems (NeurIPS), 2020

  • Linear projections do not preserve product kernels.
    • Mahalanobis Kernel
  • Most points in the embedding map to the facets of the projection
    • Constrain the embedding optimization to points within the bounds
  • Linear embeddings can have a low probability of containing an optimum.
    • Unit hypersphere sampling for the projection
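
A sketch of the linear-embedding idea with two of the fixes above folded in; the dimensions, bounds, and the way the projection is sampled are illustrative assumptions rather than the exact construction in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 4                                    # ambient and embedding dimensionality (illustrative)
A = rng.standard_normal((D, d))
A /= np.linalg.norm(A, axis=0)                   # normalize columns: directions sampled on the unit hypersphere

def embed_up(z, box=1.0):
    """Map a low-dimensional candidate z into the ambient box [-box, box]^D."""
    x = A @ z
    # Rather than clipping (which piles candidates onto the facets of the box), the fix above constrains
    # the acquisition optimization over z so that A @ z already lies inside the bounds.
    return np.clip(x, -box, box)

# The surrogate is fit on pairs (z_i, f(embed_up(z_i))).  The Mahalanobis-kernel fix roughly amounts to
# measuring distances between projected-up points, k(z, z') ~ k_SE(A z, A z'), instead of assuming a
# product (ARD) kernel directly in the embedding.
```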

Results

 Letham, B.; Calandra, R.; Rai, A. & Bakshy, E.

 Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization

 Advances in Neural Information Processing Systems (NeurIPS), 2020

Overview

  • Bayesian Optimization & Applications
  • Reinforcement Learning
  • Large-scale Autonomous Data Collection

Reinforcement Learning (RL)

Reinforcement Learning Approaches

Model-free:

  • Local convergence guaranteed*

  • Simple to implement

  • Computationally light

  • Does not generalize

  • Data-inefficient

Model-based:

  • No convergence guarantees

  • Challenging to learn model

  • Computationally intensive

  • Data-efficient

  • Generalize to new tasks

Evidence from neuroscience that humans use both approaches! [Daw et al. 2010]

Model-based Reinforcement Learning

Probabilistic Ensembles with Trajectory Sampling (PETS)

Chua, K.; Calandra, R.; McAllister, R. & Levine, S.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Advances in Neural Information Processing Systems (NIPS), 2018, 4754-4765

PETS - Experimental Results

Chua, K.; Calandra, R.; McAllister, R. & Levine, S.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Advances in Neural Information Processing Systems (NIPS), 2018, 4754-4765

Chua, K.; Calandra, R.; McAllister, R. & Levine, S.
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Advances in Neural Information Processing Systems (NIPS), 2018, 4754-4765

Learning to Fly a Quadcopter

Lambert, N.O.; Drew, D.S.; Yaconelli, J; Calandra, R.; Levine, S.; & Pister, K.S.J.
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
IEEE Robotics and Automation Letters (RA-L), 2019, 4, 4224-4230

On-line Adaptation to Different Payloads

Belkhale, S.; Li, R.; Kahn, G.; McAllister, R.; Calandra, R. & Levine, S.
Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads

IEEE Robotics and Automation Letters (RA-L), 2021, 6, 1471-1478

Lambeta, M.; Chou, P.-W.; Tian, S.; Yang, B.; Maloon, B.; Most, V. R.; Stroud, D.; Santos, R.; Byagowi, A.; Kammerer, G.; Jayaraman, D. & Calandra, R.
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation
IEEE Robotics and Automation Letters (RA-L), 2020, 5, 3838-3845

Model-based Reinforcement Learning

Lambeta, M.; Chou, P.-W.; Tian, S.; Yang, B.; Maloon, B.; Most, V. R.; Stroud, D.; Santos, R.; Byagowi, A.; Kammerer, G.; Jayaraman, D. & Calandra, R.
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation
IEEE Robotics and Automation Letters (RA-L), 2020, 5, 3838-3845

Understand and Overcome the Limitations of MBRL

  • Can we avoid the multiplicative error of recursive one-step predictions?

Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

IEEE Conference on Decision and Control (CDC), 2021, [available online: https://arxiv.org/abs/2012.09156]

(YES)

  • Can we dynamically tune the hyperparameters?

Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F. & Calandra, R.
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

(YES)

  • Are accurate models a necessary condition for good control performance?
  • Are accurate models a sufficient condition for good control performance?

Bansal, S.; Calandra, R.; Xiao, T.; Levine, S. & Tomlin, C. J.
Goal-Driven Dynamics Learning via Bayesian Optimization
IEEE Conference on Decision and Control (CDC), 2017, 5168-5173

(NO)

(NO)

Lambert, N.; Amos, B.; Yadan, O. & Calandra, R.
Objective Mismatch in Model-based Reinforcement Learning
Learning for Dynamics and Control (L4DC), 2020, 761-770

1-Step Ahead Models and their Propagation

s_{t+1} = f_\theta(s_t, a_t)
s_{t+h} = f_\theta(\ldots f_\theta(f_\theta(s_t, a_t), a_{t+1}) \ldots, a_{t+h})
s_{t+h} = f_\theta(\ldots f_\theta(f_\theta(s_t, a_t) + \epsilon, a_{t+1}) + \epsilon \ldots, a_{t+h}) + \epsilon

Multiplicative Error -- Doomed to accumulate
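
A tiny sketch of the recursive propagation in the equations above, to make the accumulation explicit; `f_theta` stands for any learned one-step model.

```python
import numpy as np

def rollout_one_step_model(f_theta, s0, actions):
    """Predict s_{t+1}, ..., s_{t+h} by feeding each prediction back into the model: any per-step error
    epsilon enters all subsequent inputs, so it compounds with the horizon."""
    s, trajectory = np.asarray(s0), []
    for a in actions:
        s = f_theta(s, a)
        trajectory.append(s)
    return np.array(trajectory)
```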

Trajectory Prediction

Lambert, N.; Wilcox, A.; Zhang, H.; Pister, K. S. J. & Calandra, R.
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

IEEE Conference on Decision and Control (CDC), 2021, [available online: https://arxiv.org/abs/2012.09156]

Trajectory Prediction

s_{t+h} = f_\theta(\ldots f_\theta(f_\theta(s_t, a_t), a_{t+1}) \ldots, a_{t+h}) \qquad \text{(recursive one-step model)}
s_{t+h} = f_\theta(s_t, a_t, a_{t+1}, \ldots, a_{t+h}) \qquad \text{(direct prediction from the action sequence)}
s_{t+h} = f_\theta(s_t, \theta_{\pi})\,, \quad a_t = \pi_{\theta_\pi}(s_t) \qquad \text{(actions generated by a parametrized policy)}
\text{if } \dim(\theta_{\pi}) \ll \dim(a_t, \ldots, a_{t+h}) \text{ we win}
s_{t+h} = f_\theta(s_t, h, \theta_{\pi}) \qquad \text{(condition also on the horizon } h\text{)}
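
For contrast, a sketch of the two prediction interfaces: the recursive one-step rollout versus a model queried directly with (s_t, h, theta_pi), which is what yields the O(1)-per-query cost and continuous-time evaluation listed under the advantages below; both `f_step` and `f_traj` are placeholders for learned models.

```python
def predict_recursive(f_step, s0, actions):
    """h model evaluations; errors compound across steps."""
    s = s0
    for a in actions:
        s = f_step(s, a)
    return s

def predict_long_horizon(f_traj, s0, h, theta_pi):
    """A single model evaluation for any horizon h (h need not even be an integer)."""
    return f_traj(s0, h, theta_pi)
```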

Advantages

  • Better accuracy for long horizons
  • Calibrated uncertainty over the whole trajectory
  • Better data efficiency
  • Faster computation/propagation for long horizons (from O(t) to O(1) for any given t)
  • Continuous time

Overview

  • Bayesian Optimization & Applications
  • Reinforcement Learning
  • Large-scale Autonomous Data Collection

Large-scale Autonomous Data Collection

  • Scale of Data is crucial for large Deep Learning models
  • How do we fully automate experiments?
    How do we take the human out of the loop?
  • Careful experimental design is often necessary

 

Visuo-tactile Learned Model

Calandra, R.; Owens, A.; Jayaraman, D.; Yuan, W.; Lin, J.; Malik, J.; Adelson, E. H. & Levine, S.
More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 3300-3307

Self-supervised Data Collection

  • Setting:
    • 7-DOF Sawyer arm
    • Weiss WSG-50 Parallel gripper
    • one GelSight on each finger
    • Two RGB-D cameras in front and on top

 

  • (Almost) fully autonomous data collection:
    • Estimate the object position using depth, and perform a random grasp of the object
    • Labels are automatically generated by checking for the presence of contacts after each attempted lift

Examples of Training Objects

Collected 6,450 grasps on over 60 training objects in ~2 weeks.

Calandra, R.; Owens, A.; Jayaraman, D.; Yuan, W.; Lin, J.; Malik, J.; Adelson, E. H. & Levine, S.
More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
IEEE Robotics and Automation Letters (RA-L), 2018, 3, 3300-3307

Grasp Success on Unseen Objects

83.8% grasp success on 22 unseen objects
(using only vision yields 56.6% success rate)

A Few Lessons Learned (Across Multiple Projects)

  • Full automation is paramount
  • Think carefully about the experimental setup
  • Iterate the setup
  • Collect as much data as possible
  • Verify early that data are consistent

To Conclude

Human Collaborators

LASR Lab

Collaborators

Funding

Overview

  • BO is a powerful tool; we showed a few examples:
    • Learning to walk with the bipedal robot "Fox"
    • Multi-objective BO for navigation with micro-robots
    • Hierarchical BO for joint morphology/controller optimization
  • Reinforcement Learning
  • Automatic data collection is highly desirable

Thank you!

Additional Slides
