Joint perception and control: Feedback from pixels

Russ Tedrake and Sadra Sadraddini

(+ joint work w/ Guy and Hadas)

Image credit: Andy Biewener (Harvard)

When I started this MURI...

Good algorithms for feedback motion planning (developed largely in previous MURI)

 

Biggest limitation: assumed full-state feedback.

Vision-based control (Feedback from pixels)

Vision has become a primary sensor...

But the complexity of perception breaks our tools...

[Block diagram: plant (state \(x\)) → sensors → measurements \(y\) → perception/estimation → estimate \(\hat{x}\) → planning & control → input \(u\) → plant]

  • Sensors include cameras ⇒ sensor model is a photo-realistic rendering engine
  • Perception components (especially) include deep neural networks
  • Plant model has to capture distributions over natural scenes (lighting conditions)

Joint perception and control

[Block diagram: measurements \(y\) → perception/estimation → belief \(p(x \mid \text{history})\) → planning & control → input \(u\)]

But is Deep RL the only viable approach?

Estimating the full state (or belief state) is unreasonable and unnecessary...

Output feedback, aka "pixels-to-torques": map measurements \(y\) directly to inputs \(u\), vs. estimating the state first.

What we've learned: control theory still works

  • Dynamic output-feedback design (generalizes LQG)
    • Can still enforce safety constraints
    • No local minima
  • Pixel-space (RGB-D) is terrible; avoid it
  • Task-relevant models (in progress)

 

  • WIP w/ Guy and Hadas: better noise models for perception \(\Rightarrow\) tighter bounds for control/reachability analysis

Basic formulation

N.B. Ben Recht on feedback from pixels: "\(y = g(x)\); we assume \(g\) is invertible." In my view, that's unreasonable.

\[ \min \sum_n \ell(y_n, u_n), \quad \text{subject to } x_{n+1} = f(x_n, u_n, w_n), \; y_n = g(x_n, v_n) \]
  • This is a POMDP: general solution mostly intractable
  • LQG is a special case we can solve, when \(\ell\) is quadratic, \(f\) and \(g\) are linear, and \(w\) and \(v\) are Gaussian. Optimal solution decomposes into:
    • Kalman filter to estimate full state
    • LQR for full-state feedback
  • LQG solution typically computed via Riccati equations (one for LQR, one for the Kalman filter)
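As a concrete sketch of the LQG separation structure, the following solves both Riccati equations for a hypothetical double-integrator example (the system, costs, and noise levels are made up for illustration, not from the talk):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical double-integrator example (not from the talk):
# x[n+1] = A x[n] + B u[n] + w[n],  y[n] = C x[n] + v[n].
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
C = np.array([[1.0, 0.0]])                 # position measurement only
Q, R = np.eye(2), np.eye(1)                # quadratic cost weights
W, V = 0.01 * np.eye(2), 0.1 * np.eye(1)   # process / measurement noise covariances

# Control half: LQR gain from one Riccati equation (u = -K x_hat).
S = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)

# Estimation half: steady-state Kalman gain from the dual Riccati equation.
P = solve_discrete_are(A.T, C.T, W, V)
L = P @ C.T @ np.linalg.inv(C @ P @ C.T + V)

def lqg_step(x_hat, y, u):
    """One predict/correct step of the steady-state Kalman filter."""
    x_pred = A @ x_hat + B @ u
    return x_pred + L @ (y - C @ x_pred)
```

The optimal policy then acts on the estimate: \(u = -K \hat{x}\), with \(\hat{x}\) maintained by `lqg_step`.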

Generalizations of LQG

There are more general approaches to solving the LQG problem:

  • Generalized Riccati equations (e.g. from \(H_2\) and \(H_\infty\) design)
  • Convex re-parameterizations (e.g. the LMI formulation of Scherer et al.)

 

  1. Disturbance-based feedback parameterizations (Youla parameterization, System Level Synthesis)
    • Sadra showed disturbance-based feedback for ARX models (or Markov parameters) with robustness guarantees
  2. Gradient-descent for dynamic output feedback has no local minima (new results w/ Jack Umenberger + Tobia Marcucci + Pablo Parrilo)

 

 

These solve joint perception + control (not only state estimation \(\Rightarrow\) control)

Generalizations of LQG

Key idea: Don't solve the full POMDP; instead, search over a restricted class of policies, e.g.

\[ x_c[n+1] = A\, x_c[n] + B\, y[n], \qquad u[n] = C\, x_c[n] + D\, y[n] \]
  • Analogous to deep RL (e.g. with a recurrent network policy).
  • In the linear+robust case we have mature solutions.
  • Many nonlinear extensions from state-feedback now apply.
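A minimal rollout of this policy class, with an illustrative scalar plant and hand-tuned controller matrices (all numbers are made up, not from the talk): the controller is itself a linear system whose internal state \(x_c\) is driven by the measurements alone.

```python
import numpy as np

# Illustrative scalar plant: x[n+1] = a x[n] + b u[n], measurement y = c x + noise.
a, b, c = 1.05, 1.0, 1.0                    # open-loop unstable plant
Ac, Bc, Cc, Dc = 0.5, 1.0, -0.3, -0.8       # hand-tuned controller matrices

rng = np.random.default_rng(0)
x, x_c = 1.0, 0.0                           # plant and controller states
traj = []
for n in range(50):
    y = c * x + 0.01 * rng.standard_normal()    # noisy measurement
    u = Cc * x_c + Dc * y                       # controller output
    x_c = Ac * x_c + Bc * y                     # controller state update
    x = a * x + b * u                           # plant update
    traj.append(x)
```

Note the controller never forms an explicit state estimate; stabilization comes directly from the \(y \mapsto u\) dynamics.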

Linear approach for "pixels-to-torques"


  • Very high dimensional outputs
  • Data-driven with just two least-squares programs (model identification + output-feedback LQR)!
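A toy sketch of the first least-squares program, fitting a linear model directly in observation space; the system matrices and data here are synthetic stand-ins for keypoint observations, and the second program would design output-feedback LQR on the identified model.

```python
import numpy as np

# Synthetic data from a made-up linear system y[n+1] = A y[n] + B u[n] + noise.
rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.2], [0.0, 0.95]])
B_true = np.array([[0.0], [0.1]])

N = 200
Y = np.zeros((N + 1, 2))
U = rng.standard_normal((N, 1))             # random excitation inputs
for n in range(N):
    Y[n + 1] = A_true @ Y[n] + B_true @ U[n] + 1e-3 * rng.standard_normal(2)

# Least squares #1: fit [A; B] from the stacked regressors [y[n], u[n]].
Z = np.hstack([Y[:-1], U])                  # shape (N, 3)
Theta, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)
A_hat, B_hat = Theta[:2].T, Theta[2:].T
# Least squares #2 (not shown) would design output-feedback LQR on (A_hat, B_hat).
```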

[Figure: feedback gain at time 2]

https://keypointnet.github.io/

https://nanonets.com/blog/human-pose-estimation-2d-guide/

Ex: Cart-pole balancing (from keypoints)

Dense Object Nets

Core technology: dense correspondences

(built on Schmidt, Newcombe, Fox, RA-L 2017)

Peter R. Florence*, Lucas Manuelli*, and Russ Tedrake. Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation. CoRL, 2018.

Dense Object Nets

dense 3D reconstruction

+ pixelwise contrastive loss

Learn descriptor keypoint dynamics + trajectory MPC

New results suggest models can be very simple (e.g. < 10 ReLUs)


Reachable sets w/ complex sensor models

Sadra, Guy, Hadas, Russ

Thrun, Burgard, Fox, Probabilistic Robotics

 

Key idea: Piecewise sensor models + integral quadratic constraints (IQCs) to bound the number of switches

What we've learned: control theory still works

  • Dynamic output-feedback design (generalizes LQG)
    • Can still enforce safety constraints
    • No local minima
  • Pixel-space (RGB-D) is terrible; avoid it
  • Task-relevant models

 

  • WIP w/ Guy and Hadas: better noise models for perception \(\Rightarrow\) tighter bounds for control/reachability analysis

Bonus: Continuous control with symbols / contact

  • Vertices \(V\)
  • (Directed) edges \(E\)

 

  • For each \(i \in V:\)
    • Compact convex set \(X_i \subset \mathbb{R}^d\)
    • A point \(x_i \in X_i \) 

Down to the essence

  • Edge length given by a convex function \[ \ell(x_i, x_j) \]
  • Shortest path, \(P:\) \[ \min_P \min_{(x_i)_{i \in P}} \sum_{(i,j) \in P} \ell(x_i,x_j).\]

 

  • Can also add constraints on \(x_i, x_j\).
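For a fixed path \(P\), the inner minimization over the points \(x_i\) is a convex program. A toy instance with made-up boxes, using squared Euclidean edge lengths so the objective is smooth:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up instance: a fixed path through three axis-aligned boxes in R^2.
boxes = [((0, 1), (0, 1)),      # X_1
         ((2, 3), (2, 3)),      # X_2
         ((4, 5), (0, 1))]      # X_3

def path_cost(z):
    pts = z.reshape(3, 2)       # one point per box
    return sum(np.sum((pts[i + 1] - pts[i]) ** 2) for i in range(2))

bounds = [iv for box in boxes for iv in box]    # box constraints, per coordinate
z0 = np.array([0.5, 0.5, 2.5, 2.5, 4.5, 0.5])  # start at the box centers
res = minimize(path_cost, z0, bounds=bounds)    # convex, so the minimum is global
pts = res.x.reshape(3, 2)
```

The outer minimization over paths \(P\) is the combinatorial part that the convex relaxation addresses.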

The convex relaxation of this problem is tight!


PERISCOPE MURI Review — short results talk
