Deep vs Physics?

Russ Tedrake

Galileo, Kepler, Newton, Coulomb, Hooke...

were data scientists.

They fit very simple models to very noisy data.

They gave us rich classes of parametric models that we could fit to new data.

What if Newton had deep learning...?

Galileo's notes on projectile motion

What is a (dynamic) model?

System

..., u_{-1}, u_0, u_1, ...
..., y_{-1}, y_0, y_1, ...

State-space

Auto-regressive (e.g., ARMAX)

input

output

p(\theta, x_0, w_0, w_1, ...)
x_{n+1} = f(n, x_n, u_n, w_n, \theta) \\ \quad y_n = g(n, x_n, u_n, w_n, \theta)
y_{n+1} = f(n, u_n, u_{n-1}, ..., \\ \qquad \qquad y_n, y_{n-1}, ..., \\ \qquad \qquad w_n, w_{n-1}, ..., \theta)

state

noise/disturbances

parameters
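As a sketch (not from the slides), the two model classes above can be written directly as code. The linear \(f\), \(g\), and AR(1) update here are hypothetical stand-ins chosen so the two forms coincide:

```python
def f_state(n, x, u, w, theta):
    # Hypothetical linear state-space update: x_{n+1} = a*x_n + b*u_n + w_n
    a, b = theta
    return a * x + b * u + w

def g_state(n, x, u, w, theta):
    # Output map: here the output is simply the state
    return x

def simulate_state_space(x0, us, ws, theta):
    """Roll the state-space model forward, returning outputs y_0 .. y_{N-1}."""
    x, ys = x0, []
    for n, (u, w) in enumerate(zip(us, ws)):
        ys.append(g_state(n, x, u, w, theta))
        x = f_state(n, x, u, w, theta)
    return ys

def predict_autoregressive(us, ys, ws, theta):
    """AR(1) analogue: y_{n+1} = a*y_n + b*u_n + w_n, with no latent state."""
    a, b = theta
    return a * ys[-1] + b * us[-1] + ws[-1]
```

For this linear, fully observed example the autoregressive prediction from past outputs matches the state-space rollout; in general the state-space form carries a latent \(x_n\) that the AR form must recover from output history.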

What is a (dynamic) model?

System

..., u_{-1}, u_0, u_1, ...
..., y_{-1}, y_0, y_1, ...

Auto-regressive (e.g., ARMAX)

Lagrangian mechanics,

Recurrent neural networks (e.g. LSTM), ...

Feed-forward networks (e.g., \(y_n\) = image)

input

output

State-space

x_{n+1} = f(n, x_n, u_n, w_n, \theta) \\ \quad y_n = g(n, x_n, u_n, w_n, \theta)
y_{n+1} = f(n, u_n, u_{n-1}, ..., \\ \qquad \qquad y_n, y_{n-1}, ..., \\ \qquad \qquad w_n, w_{n-1}, ..., \theta)

Models come in many forms

..., u_{-1}, u_0, u_1, ...
..., y_{-1}, y_0, y_1, ...

input

cost-to-go

Q-functions and value functions are models, too.  They predict just a single output: the cost-to-go.

\[ Q^{\pi}(n, x_n,u_n, \theta) \]
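A minimal sketch of this idea, assuming a deterministic rollout; the dynamics `f`, stage `cost`, and policy `pi` are hypothetical placeholders, not anything specified in the slides:

```python
def q_pi(n, x, u, theta, f, cost, pi, horizon=50):
    """Cost-to-go model Q^pi(n, x_n, u_n, theta): apply u once, then follow
    the policy pi for `horizon` steps, summing the stage cost (w = 0,
    i.e. an optimistic, noise-free rollout)."""
    total = cost(x, u)
    x = f(n, x, u, 0.0, theta)
    for k in range(n + 1, n + horizon):
        uk = pi(x)
        total += cost(x, uk)
        x = f(k, x, uk, 0.0, theta)
    return total
```

The point is only that \(Q^{\pi}\) is itself a model in the sense above: it maps \((n, x_n, u_n, \theta)\) to a single predicted output.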

Model "class" vs model "instance"

  • \(f\) and \(g\) describe the model class.
  • \(f\) and \(g\) together with a particular \(\theta\) describe a model instance.

 

Today's discussion is about model class:

Should we prefer writing \(f\) and \(g\) using physics or deep networks?

Maybe not so different from

  • should we use ReLU or \(\tanh\)?
  • should we use LSTMs or Transformers?
x_{n+1} = f(n, x_n, u_n, w_n, \theta) \\ \quad y_n = g(n, x_n, u_n, w_n, \theta)

Aside: "Differentiable Physics"

Our physics models are (and have always been) differentiable.

 

You don't need neural networks.  Just the chain rule.
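A tiny illustration of the point: pushing the chain rule by hand through an Euler simulation of a damped point mass gives exact parameter gradients, with no neural network in sight. The model and its damping parameter \(\theta\) are illustrative choices:

```python
def simulate_and_grad(x0, v0, theta, steps=100, dt=0.01):
    """Euler-integrate a damped point mass (v' = -theta*v, x' = v) while
    propagating dx/dtheta and dv/dtheta via the chain rule."""
    x, v = x0, v0
    dx, dv = 0.0, 0.0          # sensitivities of x, v w.r.t. theta
    for _ in range(steps):
        x_next = x + dt * v
        dx_next = dx + dt * dv                       # d/dtheta of the x update
        v_next = v * (1.0 - dt * theta)
        dv_next = dv * (1.0 - dt * theta) - dt * v   # d/dtheta of the v update
        x, v, dx, dv = x_next, v_next, dx_next, dv_next
    return x, dx
```

The returned `dx` agrees with a finite-difference check: the simulation was differentiable all along.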

What makes a model useful?

"All models are wrong, but some are useful" -- George Box

Use case: Simulation

e.g., for

  • generating synthetic training data
  • Monte-Carlo testing
  • model-based development

 

What makes a model useful?

  • Reasonable: \( y_{sim}(u) \in \{ y_{real}(u) \} \)
  • Coverage: \( \forall y_{real}(u), \exists y_{sim}(u) \)
  • Accuracy: \( p(y_{real} | u) \approx p(y_{sim} | u) \)
  • Corollary:  Reliable system identification (data \( \Rightarrow \theta \))
  • Generalizable, efficient, repeatable, interpretable/debuggable, ... 
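The system-identification corollary can be made concrete with a minimal ARX sketch: fitting \(\theta = (a, b)\) in \(y_{n+1} = a\, y_n + b\, u_n\) by least squares from noisy input/output data. The model form and noise level are assumptions for illustration:

```python
import numpy as np

def identify(us, ys):
    """Least-squares fit of theta = (a, b) in y_{n+1} = a*y_n + b*u_n
    from paired input/output sequences of equal length."""
    A = np.column_stack([ys[:-1], us[:-1]])
    theta, *_ = np.linalg.lstsq(A, ys[1:], rcond=None)
    return theta
```

Fitting a very simple model to very noisy data: with enough data, \(\theta\) is recovered reliably, much as the slide's "data \(\Rightarrow \theta\)" suggests.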

Use case: Online Decision Making (Planning/Control)

What makes a model useful?

  • Reasonable, Accurate, Generalizable, ...
  • Efficient / compact
  • Observable
  • Task-relevant

State-space models tend to be more efficient/compact, but require state estimation.

  • Ex: chopping onions.
    • Lagrangian state not observable.
    • Task relevant?
  • Doesn't imply "no mechanics"
x_{n+1} = f(n, x_n, u_n, w_n, \theta) \\ \quad y_n = g(n, x_n, u_n, w_n, \theta)
y_{n+1} = f(n, u_n, u_{n-1}, ..., \\ \qquad \qquad y_n, y_{n-1}, ..., \\ \qquad \qquad w_n, w_{n-1}, ..., \theta)

vs.

Auto-regressive

State-space

Occam's razor and predicting what won't happen

Perhaps the biggest philosophical difference between traditional physics models and "universal approximators".

  • \( f,g\) are not arbitrary.  Mechanics gives us constraints:
    • Conservation of mass
    • Conservation of energy
    • Maximum dissipation
    • ...
  • Arguably these constraints give our models their structure
    • Control affine
    • Inertial matrix is positive definite
    • Inverse dynamics have "branch-induced sparsity"
  • Without structure, maybe we can only ever do stochastic gradient descent...?

The failings of our physics-based models are mostly due to the unreasonable burden of estimating the "Lagrangian state" and parameters.

For e.g. onions, laundry, peanut butter, ...

The failings of our deep models are mostly due to our inability to do efficient/reliable planning, control design, and analysis.

I want the next Newton to come along and work on onions, laundry, peanut butter, ...

Potentially useful vocabulary

x_{n+1} = f(n, x_n, u_n, w_n, \theta) \\ \quad y_n = g(n, x_n, u_n, w_n, \theta)
  • simulation is solving for \(x\) given \(x_0, u\), and \(\theta\).
  • planning is searching for \(u, x\) given \(x_0, \theta\).  Can be optimistic (\(w=0\)) or stochastic/robust.
  • state estimation is searching for \(x, w\) given \(u, y\), and \(\theta\).
  • system identification is searching for \(\theta, w\) given \(u\) and \(y\).
  • stability analysis is e.g. finding \(X\) for which \( x_0 \in X \Rightarrow \lim_{n\rightarrow\infty} x_n = 0 \).
  • verification / falsification asks whether \(\exists w\) such that \(\exists n,\, x_n \in\) failure set.
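To make the last item concrete, here is a sketch of falsification by random search over disturbance sequences. The dynamics, disturbance bound, and failure set are all illustrative assumptions; real falsification tools use smarter search:

```python
import random

def falsify(f, x0, us, theta, failure, trials=1000, w_mag=1.0):
    """Random-search falsification: look for a disturbance sequence w with
    |w_n| <= w_mag such that some state x_n lands in the failure set."""
    horizon = len(us)
    for _ in range(trials):
        ws = [random.uniform(-w_mag, w_mag) for _ in range(horizon)]
        x = x0
        for n in range(horizon):
            x = f(n, x, us[n], ws[n], theta)
            if failure(x):
                return ws          # counterexample found
    return None                    # none found (not a proof of safety)
```

Note the asymmetry in the vocabulary: falsification can return a witness \(w\), but failing to find one is not verification.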

IFRR Deep Physics

By russtedrake


For participation in the panel of the IFRR Colloquium on the Roles of Physics-Based Models and Data-Driven Learning in Robotics http://ifrr.org/physics-based-data-driven-robotics
