Joint perception and control: Feedback from pixels

Russ Tedrake and Sadra Sadraddini

(+ joint work w/ Guy and Hadas)

Image credit: Andy Biewener (Harvard)

When I started this MURI...

Good algorithms for feedback motion planning (developed largely in previous MURI)

 

Biggest limitation: assumed full-state feedback.

Vision-based control (Feedback from pixels)

Vision has become a primary sensor...

But the complexity of perception breaks our tools...

[Block diagram: plant (state \(x\)) → sensors → measurements \(y\) → perception/estimation → estimate \(\hat{x}\) → planning & control → input \(u\) → plant]

  • Sensors include cameras ⇒ sensor model is a photo-realistic rendering engine
  • Perception components (especially) include deep neural networks
  • Plant model has to capture distributions over natural scenes (lighting conditions)

Joint perception and control

[Block diagram: measurements \(y\) → perception/estimation → belief \(p(x \mid \text{history})\) → planning & control → input \(u\)]

But is Deep RL the only viable approach?

Estimating the full state (or belief state) is unreasonable and unnecessary...

Output feedback, aka "pixels-to-torques": map measurements \(y\) directly to inputs \(u\), vs. estimating the state first.

What we've learned: control theory still works

  • Dynamic output-feedback design (generalizes LQG)
    • Can still enforce safety constraints
    • No local minima
  • Pixel-space (RGB-D) is terrible; avoid it
  • Task-relevant models (in progress)

 

  • WIP w/ Guy and Hadas: better noise models for perception \(\Rightarrow\) tighter bounds for control/reachability analysis

Basic formulation

N.B. Ben Recht on feedback from pixels: "\(y = g(x)\); we assume \(g\) is invertible." In my view, that's unreasonable.

\[ \min \sum_n \ell(y_n, u_n), \quad \text{subject to } x_{n+1} = f(x_n, u_n, w_n), \; y_n = g(x_n, v_n) \]
  • This is a POMDP: general solution mostly intractable
  • LQG is a special case we can solve, when \(\ell\) is quadratic, \(f\) and \(g\) are linear, and \(w\) and \(v\) are Gaussian. Optimal solution decomposes into:
    • Kalman filter to estimate full state
    • LQR for full-state feedback
  • LQG solution typically computed via Riccati equations (one for LQR, one for the Kalman filter)
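As a concrete sketch of the LQG separation structure, the following solves both Riccati equations for a hypothetical double-integrator example (the system, costs, and noise levels are made up for illustration, not from the talk):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical double-integrator example (not from the talk):
# x[n+1] = A x[n] + B u[n] + w[n],  y[n] = C x[n] + v[n].
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
C = np.array([[1.0, 0.0]])                 # position measurement only
Q, R = np.eye(2), np.eye(1)                # quadratic cost weights
W, V = 0.01 * np.eye(2), 0.1 * np.eye(1)   # process / measurement noise covariances

# Control half: LQR gain from one Riccati equation (u = -K x_hat).
S = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)

# Estimation half: steady-state Kalman gain from the dual Riccati equation.
P = solve_discrete_are(A.T, C.T, W, V)
L = P @ C.T @ np.linalg.inv(C @ P @ C.T + V)

def lqg_step(x_hat, y, u):
    """One predict/correct step of the steady-state Kalman filter."""
    x_pred = A @ x_hat + B @ u
    return x_pred + L @ (y - C @ x_pred)
```

The optimal policy then acts on the estimate: \(u = -K \hat{x}\), with \(\hat{x}\) maintained by `lqg_step`.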

Generalizations of LQG

There are more general approaches to solving the LQG problem:

  • Generalized Riccati equations (e.g. from \(H_2\) and \(H_\infty\) design)
  • Convex re-parameterizations (e.g. the LMI formulation of Scherer et al.)

 

  1. Disturbance-based feedback parameterizations (Youla parameterization, System Level Synthesis)
    • Sadra showed disturbance-based feedback for ARX models (or Markov parameters) with robustness guarantees
  2. Gradient-descent for dynamic output feedback has no local minima (new results w/ Jack Umenberger + Tobia Marcucci + Pablo Parrilo)

 

 

These solve joint perception + control (not only state estimation \(\Rightarrow\) control)

Generalizations of LQG

Key idea: Don't solve the full POMDP; instead, search over a restricted class of policies, e.g.

\[ x_c[n+1] = A\, x_c[n] + B\, y[n], \qquad u[n] = C\, x_c[n] + D\, y[n] \]
  • Analogous to deep RL (e.g. with a recurrent network policy).
  • In the linear+robust case we have mature solutions.
  • Many nonlinear extensions from state-feedback now apply.
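A minimal rollout of this policy class, with an illustrative scalar plant and hand-tuned controller matrices (all numbers are made up, not from the talk): the controller is itself a linear system whose internal state \(x_c\) is driven by the measurements alone.

```python
import numpy as np

# Illustrative scalar plant: x[n+1] = a x[n] + b u[n], measurement y = c x + noise.
a, b, c = 1.05, 1.0, 1.0                    # open-loop unstable plant
Ac, Bc, Cc, Dc = 0.5, 1.0, -0.3, -0.8       # hand-tuned controller matrices

rng = np.random.default_rng(0)
x, x_c = 1.0, 0.0                           # plant and controller states
traj = []
for n in range(50):
    y = c * x + 0.01 * rng.standard_normal()    # noisy measurement
    u = Cc * x_c + Dc * y                       # controller output
    x_c = Ac * x_c + Bc * y                     # controller state update
    x = a * x + b * u                           # plant update
    traj.append(x)
```

Note the controller never forms an explicit state estimate; stabilization comes directly from the \(y \mapsto u\) dynamics.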

Linear approach for "pixels-to-torques"


  • Very high dimensional outputs
  • Data-driven with just two least-squares programs (model identification + output-feedback LQR)!
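A toy sketch of the first least-squares program, fitting a linear model directly in observation space; the system matrices and data here are synthetic stand-ins for keypoint observations, and the second program would design output-feedback LQR on the identified model.

```python
import numpy as np

# Synthetic data from a made-up linear system y[n+1] = A y[n] + B u[n] + noise.
rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.2], [0.0, 0.95]])
B_true = np.array([[0.0], [0.1]])

N = 200
Y = np.zeros((N + 1, 2))
U = rng.standard_normal((N, 1))             # random excitation inputs
for n in range(N):
    Y[n + 1] = A_true @ Y[n] + B_true @ U[n] + 1e-3 * rng.standard_normal(2)

# Least squares #1: fit [A; B] from the stacked regressors [y[n], u[n]].
Z = np.hstack([Y[:-1], U])                  # shape (N, 3)
Theta, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)
A_hat, B_hat = Theta[:2].T, Theta[2:].T
# Least squares #2 (not shown) would design output-feedback LQR on (A_hat, B_hat).
```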

[Figure: feedback gain at time 2]

https://keypointnet.github.io/

https://nanonets.com/blog/human-pose-estimation-2d-guide/

Ex: Cart-pole balancing (from keypoints)

Dense Object Nets

Core technology: dense correspondences

(built on Schmidt, Newcombe, Fox, RA-L 2017)

Peter R. Florence*, Lucas Manuelli*, and Russ Tedrake. Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation. CoRL, 2018.

Dense Object Nets

dense 3D reconstruction

+ pixelwise contrastive loss

Learn descriptor keypoint dynamics + trajectory MPC

New results suggest models can be very simple (e.g. < 10 ReLUs)


Reachable sets w/ complex sensor models

Sadra, Guy, Hadas, Russ

Thrun, Burgard, Fox, Probabilistic Robotics

 

Key idea: Piecewise sensor models + integral quadratic constraints (IQCs) to bound the number of switches

What we've learned: control theory still works

  • Dynamic output-feedback design (generalizes LQG)
    • Can still enforce safety constraints
    • No local minima
  • Pixel-space (RGB-D) is terrible; avoid it
  • Task-relevant models

 

  • WIP w/ Guy and Hadas: better noise models for perception \(\Rightarrow\) tighter bounds for control/reachability analysis

Bonus: Continuous control with symbols / contact

  • Vertices \(V\)
  • (Directed) edges \(E\)

 

  • For each \(i \in V:\)
    • Compact convex set \(X_i \subset \mathbb{R}^d\)
    • A point \(x_i \in X_i \) 

Down to the essence

  • Edge length given by a convex function \[ \ell(x_i, x_j) \]
  • Shortest path, \(P:\) \[ \min_P \min_{(x_i)_{i \in P}} \sum_{(i,j) \in P} \ell(x_i,x_j).\]

 

  • Can also add constraints on \(x_i, x_j\).
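For a fixed path \(P\), the inner minimization over the points \(x_i\) is a convex program. A toy instance with made-up boxes, using squared Euclidean edge lengths so the objective is smooth:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up instance: a fixed path through three axis-aligned boxes in R^2.
boxes = [((0, 1), (0, 1)),      # X_1
         ((2, 3), (2, 3)),      # X_2
         ((4, 5), (0, 1))]      # X_3

def path_cost(z):
    pts = z.reshape(3, 2)       # one point per box
    return sum(np.sum((pts[i + 1] - pts[i]) ** 2) for i in range(2))

bounds = [iv for box in boxes for iv in box]    # box constraints, per coordinate
z0 = np.array([0.5, 0.5, 2.5, 2.5, 4.5, 0.5])  # start at the box centers
res = minimize(path_cost, z0, bounds=bounds)    # convex, so the minimum is global
pts = res.x.reshape(3, 2)
```

The outer minimization over paths \(P\) is the combinatorial part that the convex relaxation addresses.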

The convex relaxation of this problem is tight!


PERISCOPE MURI Review — short results talk
