CS 4/5789: Introduction to Reinforcement Learning

Lecture 9: Local Approximations for Control

Prof. Sarah Dean

MW 2:55-4:10pm
255 Olin Hall

Reminders

  • Homework
    • PSet 3 due Wednesday
    • PA 2 released tonight, due 3/6
  • First exam is Monday 3/4 during lecture
    • Post on Ed if conflicts (makeup is 3/1)
    • TA-led review session in class on Wed 2/28
  • February break Mon/Tues 2/26-7: no lecture, office hours

Agenda

1. Recap & Example

2. Calculus Review

3. Multivariate Approximations

4. Preview: Nonlinear Control

Recap: Optimal Control

  • Continuous \(\mathcal S = \mathbb R^{n_s}\) and \(\mathcal A = \mathbb R^{n_a}\)
  • Cost to be minimized \(c=(c_0,\dots, c_{H-1}, c_H)\)
  • Deterministic transitions described by dynamics function $$s_{t+1} = f(s_t, a_t)$$
  • Finite horizon \(H\)

\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)

minimize over \(\pi\):   \(\displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)

s.t.   \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)

DP Algorithm: \(V^\star_{H}(s)=c_H(s)\) and then $$V_{t}^\star(s) =\min_a  c(s,a)+V^\star_{t+1}(f(s,a))$$

Example

  • Setting: hovering UAV over a target
  • Action: thrust right/left
    • imperfect: attenuated at high thrusts and velocities
  • The dynamics:
    • \(\mathsf{position}_{t+1} = \mathsf{position}_{t}+ \mathsf{velocity}_{t}\)
    • \(\mathsf{velocity}_{t+1}=\mathsf{velocity}_{t} + e^{- (\mathsf{velocity}_t^2+a_t^2)} a_t\)
  • When velocity/thrust is:
    • small, then \(\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t} +a_t \)
    • large, then \(\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t} \)
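
The two regimes above can be checked numerically. This is a minimal sketch; the function name `uav_step` and the test values are illustrative choices, not part of the lecture.

```python
import math

def uav_step(pos, vel, a):
    """One step of the UAV dynamics: returns (next position, next velocity)."""
    attenuation = math.exp(-(vel ** 2 + a ** 2))
    return pos + vel, vel + attenuation * a

# Small velocity and thrust: the velocity changes by almost exactly a.
_, v_small = uav_step(0.0, 0.01, 0.01)

# Large thrust: the attenuation is so strong that the velocity barely changes.
_, v_large = uav_step(0.0, 0.0, 3.0)

print(v_small, v_large)
```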


Example

  • Setting: hovering UAV over a target
  • Action: thrust right/left
    • imperfect: attenuated at high thrusts and velocities
  • Goal: stay near target position \(0\)
    • Field of view is limited
    • Thus cost is $$c(s,a) =(1-e^{-\mathsf{pos}^2}) +\lambda a^2$$


Difficulty of Nonlinear DP

  • Recall DP algorithm: \(V^\star_{H}(s)=c_H(s)\) and then $$V_{t}^\star(s) =\min_a  c(s,a)+V^\star_{t+1}(f(s,a))$$
    • policy determined by argmin
  • For UAV example, quickly get expressions like $$(1-e^{-(\mathsf{pos}+2 \mathsf{vel} + e^{- (\mathsf{vel}^2+a^2)} a)^2}) $$

Agenda

1. Recap & Example

2. Calculus Review

3. Multivariate Approximations

4. Preview: Nonlinear Control

Approximating Derivatives

  • Recall the definition of a derivative for a scalar function \(g\) $$g'(x) =\lim_{\delta\to 0} \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$

  • Rather than compute exact derivatives, we will use a general purpose computational approximation
  • The finite difference approximation of a derivative for some small \(\delta\neq 0\) is $$g'(x) \approx \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$
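
A minimal sketch of the central finite difference, assuming \(g\) is an ordinary Python function of one float. The name `central_difference`, the step size \(10^{-5}\), and the test function \(\sin\) are illustrative choices.

```python
import math

def central_difference(g, x, delta=1e-5):
    """Approximate g'(x) by (g(x + delta) - g(x - delta)) / (2 * delta)."""
    return (g(x + delta) - g(x - delta)) / (2 * delta)

# Compare against a derivative we know exactly: d/dx sin(x) = cos(x).
approx = central_difference(math.sin, 1.0)
exact = math.cos(1.0)
print(approx, exact)
```

The step cannot be made arbitrarily small in floating point: below some size, rounding error in the subtraction dominates the approximation error.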

Approximations Using Derivatives

  • Can we approximate complicated functions with something simpler (e.g. linear or quadratic)?
  • For a differentiable function \(g:\mathbb R\to\mathbb R\)
    • Recall Taylor Expansion $$ g(x) = g(x_0) +g'(x_0)(x-x_0)+\frac{1}{2}g''(x_0)(x-x_0)^2 + ... $$
    • When \(x\) is close to \(x_0\), the higher order terms become vanishingly small: writing \(\epsilon = x-x_0\), we have \(\epsilon^p\to 0\) as \(p\to\infty\) whenever \( |\epsilon|<1\)

Approximations Using Derivatives

  • Can we approximate complicated functions with something simpler (e.g. linear or quadratic)?
  • For a differentiable function \(g:\mathbb R\to\mathbb R\)
  • A first-order approximation around \(x_0\) is $$g(x) \approx g(x_0) +g'(x_0)(x-x_0)$$
  • A second-order approximation around \(x_0\) is $$g(x) \approx g(x_0) +g'(x_0)(x-x_0)+\frac{1}{2}g''(x_0)(x-x_0)^2$$
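
A minimal sketch comparing the two approximations, using \(g(x)=e^x\) around \(x_0=0\) as an illustrative example (there \(g(x_0)=g'(x_0)=g''(x_0)=1\)). The helper names are hypothetical.

```python
import math

def taylor_first_order(g0, dg0, x0):
    """Return x -> g(x0) + g'(x0) (x - x0)."""
    return lambda x: g0 + dg0 * (x - x0)

def taylor_second_order(g0, dg0, d2g0, x0):
    """Return x -> g(x0) + g'(x0) (x - x0) + 0.5 g''(x0) (x - x0)^2."""
    return lambda x: g0 + dg0 * (x - x0) + 0.5 * d2g0 * (x - x0) ** 2

# Illustrative choice: g(x) = exp(x) around x0 = 0.
x0 = 0.0
linear = taylor_first_order(1.0, 1.0, x0)
quadratic = taylor_second_order(1.0, 1.0, 1.0, x0)

x = 0.1
err_linear = abs(math.exp(x) - linear(x))
err_quadratic = abs(math.exp(x) - quadratic(x))
print(err_linear, err_quadratic)
```

Near \(x_0\) the second-order error is smaller than the first-order error, as the Taylor expansion suggests.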
Multi-variate Functions

  • For a function of \(n\) variables \(g:\mathbb R^n\to \mathbb R\), the partial derivative: $$\frac{\partial g (x)}{\partial x_i} = \lim_{\delta\to0} \frac{g(x_1,\dots,x_i+\delta,\dots,x_n)-g(x_1,\dots,x_i-\delta,\dots,x_n)}{2\delta} $$
    • equivalently in vector notation, with \(e_i\) the \(i\)th standard basis vector, $$\frac{\partial g (x)}{\partial x_i} = \lim_{\delta\to0} \frac{g(x+\delta e_i)-g(x-\delta e_i)}{2\delta} $$
  • The gradient \(\nabla g(x)\in\mathbb R^n\) is a vector containing all partials
  • Second derivatives for all pairs \(i\) and \(j\) are contained in the Hessian $$\nabla^2 g(x)\in\mathbb R^{n\times n},\quad \nabla^2 g(x)_{ij} =\frac{\partial^2 g (x)}{\partial x_i \partial x_j}$$

PollEv

Vector-valued Functions

  • For a function of \(n\) variables and \(m\) dimensions \(g:\mathbb R^n\to \mathbb R^m\), there are \(n\times m\) partial derivatives (each input and output)
  • The Jacobian \(\nabla g(x)\in\mathbb R^{n\times m}\) generalizes the gradient and contains \(\frac{\partial g_j(x)}{\partial x_i}\) in row \(i\) and column \(j\)


Finite Difference Approximation

  • For scalar function $$g'(x) \approx \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$
  • For multivariate $$  \frac{\partial g (x)}{\partial x_i} \approx \frac{g(x+\delta e_i)-g(x-\delta e_i)}{2\delta}$$ where \(e_i\) is a standard basis vector
  • For second derivatives, repeat: first take a finite difference of the first partial derivatives

$$\frac{\partial^2 g (x)}{\partial x_i \partial x_j} \approx  \frac{1}{2\delta}\Big[ \frac{\partial g (x+\delta e_i)}{\partial x_j} - \frac{\partial g (x -\delta e_i)}{\partial x_j} \Big]$$

  • Then replace each first partial derivative with its own finite difference

$$\frac{\partial^2 g (x)}{\partial x_i \partial x_j} \approx  \frac{1}{2\delta}\Big[ \frac{g(x+\delta e_j+\delta e_i)- g(x-\delta e_j+\delta e_i)}{2\delta} \\-  \frac{g(x+\delta e_j -\delta e_i)-g(x-\delta e_j -\delta e_i)}{2\delta} \Big]$$
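
The finite difference formulas for the gradient and Hessian can be sketched with NumPy as follows. The function names, step sizes, and the test function \(g(x)=x_1^2x_2+\sin(x_2)\) are assumptions made for illustration.

```python
import numpy as np

def fd_gradient(g, x, delta=1e-5):
    """Central-difference gradient of a scalar function g at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros(x.size)
    for i in range(x.size):
        e_i = np.zeros(x.size)
        e_i[i] = 1.0
        grad[i] = (g(x + delta * e_i) - g(x - delta * e_i)) / (2 * delta)
    return grad

def fd_hessian(g, x, delta=1e-4):
    """Hessian by differencing the finite-difference gradient along each e_i."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        hess[i] = (fd_gradient(g, x + delta * e_i, delta)
                   - fd_gradient(g, x - delta * e_i, delta)) / (2 * delta)
    return hess

# Illustrative test function with known derivatives.
def g(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

x = np.array([1.0, 2.0])
grad = fd_gradient(g, x)   # exact: [2 x1 x2, x1^2 + cos(x2)]
hess = fd_hessian(g, x)    # exact: [[2 x2, 2 x1], [2 x1, -sin(x2)]]
print(grad)
print(hess)
```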

Agenda

1. Recap & Example

2. Calculus Review

3. Multivariate Approximations

4. Preview: Nonlinear Control

Linear Approximation

  • Linear, also called first-order, approximation $$ g(x) \approx g(x_0) + g'(x_0)(x-x_0) $$
  • For the dynamics  function \(f:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R^{n_s}\) $$ f(s,a) \approx f(s_0, a_0) + \nabla_s f(s_0, a_0)^\top (s-s_0) + \nabla_a f(s_0, a_0)^\top (a-a_0) $$
  • Jacobians \( \nabla_s f(s, a) \in\mathbb R^{n_s\times n_s}\) and \( \nabla_a f(s, a) \in\mathbb R^{n_a\times n_s}\) contain \( \frac{\partial f_j (s,a)}{\partial s_i}\) and \( \frac{\partial f_j (s,a)}{\partial a_i}\), respectively, in row \(i\) and column \(j\)
  • Row \(i\) represents the effect of the \(i\)th dimension of the current state/action; column \(j\) represents the effect on \(f_j\), i.e. the \(j\)th dimension of the next state

Example

  • Setting: hovering UAV over a target
    • state \(s = [\mathsf{pos}, \mathsf{vel}]\)
  • The dynamics: $$ f(s_t, a_t) = \begin{bmatrix} f_1(s_t,a_t)\\f_2(s_t,a_t)\end{bmatrix} = \begin{bmatrix} \mathsf{pos}_{t}+ \mathsf{vel}_{t}\\ \mathsf{vel}_{t} + e^{- (\mathsf{vel}_t^2+a_t^2)} a_t \end{bmatrix} $$
  • $$\nabla_s f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial \mathsf{pos}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{pos}} \\  \frac{\partial f_1 (s,a)}{\partial \mathsf{vel}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{vel}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\  1 & 1-2a\mathsf{vel}e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix} $$
  • \(\nabla_a f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial a} & \frac{\partial f_2 (s,a)}{\partial a} \end{bmatrix} =\begin{bmatrix} 0 & (1-2a^2) e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix}\)
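
As a sanity check, the analytic Jacobians of the UAV dynamics can be compared against central finite differences. This is a sketch that follows the slide's convention (row \(i\) is the input dimension, column \(j\) is the output dimension); the function names and the evaluation point are arbitrary illustrative choices.

```python
import numpy as np

def f(s, a):
    """UAV dynamics: s = [pos, vel], scalar action a."""
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel ** 2 + a ** 2)) * a])

def jacobian_s(s, a):
    """Analytic Jacobian with respect to the state."""
    _, vel = s
    att = np.exp(-(vel ** 2 + a ** 2))
    return np.array([[1.0, 0.0],
                     [1.0, 1.0 - 2.0 * a * vel * att]])

def jacobian_a(s, a):
    """Analytic Jacobian with respect to the action."""
    _, vel = s
    att = np.exp(-(vel ** 2 + a ** 2))
    return np.array([[0.0, (1.0 - 2.0 * a ** 2) * att]])

def fd_jacobian_s(s, a, delta=1e-6):
    """Row i holds the central difference of f along state direction e_i."""
    rows = []
    for i in range(len(s)):
        step = np.zeros(len(s))
        step[i] = delta
        rows.append((f(s + step, a) - f(s - step, a)) / (2 * delta))
    return np.array(rows)

def fd_jacobian_a(s, a, delta=1e-6):
    """Single row: central difference of f along the scalar action."""
    return ((f(s, a + delta) - f(s, a - delta)) / (2 * delta)).reshape(1, -1)

s0, a0 = np.array([0.3, -0.2]), 0.5
print(np.abs(fd_jacobian_s(s0, a0) - jacobian_s(s0, a0)).max())
print(np.abs(fd_jacobian_a(s0, a0) - jacobian_a(s0, a0)).max())
```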

Quadratic Approximation

  • Second-order approximation $$ g(x) \approx g(x_0) + g'(x_0)(x-x_0) + \frac{1}{2} g''(x_0)(x-x_0)^2$$
  • For cost function \(c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R\) $$ c(s,a) \approx c(s_0, a_0) + \nabla_s c(s_0, a_0)^\top (s-s_0) + \nabla_a c(s_0, a_0)^\top (a-a_0) + \\ \frac{1}{2} (s-s_0) ^\top \nabla^2_s c(s_0, a_0)(s-s_0)  + \frac{1}{2} (a-a_0) ^\top \nabla^2_a c(s_0, a_0)(a-a_0) \\+ (a-a_0) ^\top \nabla_{as}^2 c(s_0, a_0)(s-s_0) $$
  • Gradients \( \nabla_s c(s, a) \in\mathbb R^{n_s}\) and \( \nabla_a c(s, a) \in\mathbb R^{n_a}\)
  • Hessians \( \nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}\), \( \nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}\), and \( \nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}\) contain second derivatives

Quadratic Approximation

  • For cost function \(c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R\)
    • Gradients \( \nabla_s c(s, a) \in\mathbb R^{n_s}\) and \( \nabla_a c(s, a) \in\mathbb R^{n_a}\)
      • entry \(i\), i.e. \( \frac{\partial c (s,a)}{\partial s_i}\) or \( \frac{\partial c (s,a)}{\partial a_i}\), represents the effect of the \(i\)th dimension of the current state/action

Quadratic Approximation

  • For multi-variate function \(c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R\)
    • Gradients \( \nabla_s c(s, a) \in\mathbb R^{n_s}\) and \( \nabla_a c(s, a) \in\mathbb R^{n_a}\)
    • Hessians \( \nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}\), \( \nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}\), \( \nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}\)
      • entry \((i,j)\) is \( \frac{\partial^2 c (s,a)}{\partial s_i\partial s_j}\), \( \frac{\partial^2c(s,a)}{\partial a_i\partial a_j}\), and \( \frac{\partial^2 c (s,a)}{\partial a_i \partial s_j}\), respectively
      • \( \nabla_s^2 c\) and \( \nabla_a^2 c\) are symmetric

Example

  • Setting: hovering UAV over a target
    • state \(s = [\mathsf{pos}, \mathsf{vel}]\)
  • The cost: $$c(s,a) = (1-e^{-\mathsf{pos}^2}) +\lambda a^2$$
  • \(\nabla_s c(s,a)= \begin{bmatrix} 2\mathsf{pos}\cdot e^{-\mathsf{pos}^2} \\ 0 \end{bmatrix} \)
  • \(\nabla_s^2 c(s,a)= \begin{bmatrix} 2(1-2\mathsf{pos}^2) e^{-\mathsf{pos}^2} & 0\\ 0& 0 \end{bmatrix} \)
  • \(\nabla_a c(s,a)= 2\lambda a\) and \(\nabla_a^2 c(s,a)= 2\lambda\)
  • \(\nabla_{as}^2 c(s,a)=0\)
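
The cost derivatives above can be checked the same way. A minimal sketch: the value \(\lambda=0.5\), the helper names, and the evaluation point are assumptions for illustration only. The Hessian check differences the analytic gradient rather than the cost itself.

```python
import numpy as np

LAM = 0.5  # assumed value of lambda, for illustration only

def cost(s, a):
    return (1.0 - np.exp(-s[0] ** 2)) + LAM * a ** 2

def grad_s(s, a):
    """Analytic gradient of the cost with respect to the state."""
    pos = s[0]
    return np.array([2.0 * pos * np.exp(-pos ** 2), 0.0])

def hess_s(s, a):
    """Analytic Hessian of the cost with respect to the state."""
    pos = s[0]
    return np.array([[2.0 * (1.0 - 2.0 * pos ** 2) * np.exp(-pos ** 2), 0.0],
                     [0.0, 0.0]])

def fd_grad_s(s, a, delta=1e-6):
    """Central-difference gradient of the cost with respect to the state."""
    grad = np.zeros(2)
    for i in range(2):
        step = np.zeros(2)
        step[i] = delta
        grad[i] = (cost(s + step, a) - cost(s - step, a)) / (2 * delta)
    return grad

def fd_hess_s(s, a, delta=1e-6):
    """Central differences of the analytic gradient, one row per direction."""
    rows = []
    for i in range(2):
        step = np.zeros(2)
        step[i] = delta
        rows.append((grad_s(s + step, a) - grad_s(s - step, a)) / (2 * delta))
    return np.array(rows)

s0, a0 = np.array([0.7, -0.4]), 0.3
fd_grad_a = (cost(s0, a0 + 1e-6) - cost(s0, a0 - 1e-6)) / 2e-6  # exact: 2 * LAM * a0
print(fd_grad_s(s0, a0), grad_s(s0, a0))
print(fd_hess_s(s0, a0), hess_s(s0, a0))
```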


Agenda

1. Recap & Example

2. Calculus Review

3. Multivariate Approximations

4. Preview: Nonlinear Control

Preview: Nonlinear Control

minimize over \(\pi\):   \(\displaystyle\sum_{t=0}^{H-1} c(s_t, a_t)\)

s.t.   \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)

  • Procedure
    1. Approximate dynamics & costs
      • First/second order approximation
      • Finite differencing
    2. Policy via LQR

Example

  • Setting: hovering UAV over a target
    • state \(s = [\mathsf{pos}, \mathsf{vel}]\)
  • Linearizing around \((0,0)\)
  • \(f(0,0) = 0\)
  • \(\nabla_s f(0,0) = \begin{bmatrix} 1 & 0 \\  1 & 1-2\cdot 0\cdot e^{-0} \end{bmatrix} \)
  • \(\nabla_a f(0,0) =\begin{bmatrix} 0 & (1-0) e^{-0} \end{bmatrix}\)
  • \(s_{t+1}=f(s_t, a_t) \approx \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t\)
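
To see where the linear model is trustworthy, one can compare it with the true dynamics near and far from the origin. This is a sketch; the test points are arbitrary illustrative choices.

```python
import numpy as np

# Linearization around (0, 0): A and B are the transposed Jacobians.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([0.0, 1.0])

def f(s, a):
    """True UAV dynamics: s = [pos, vel], scalar action a."""
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel ** 2 + a ** 2)) * a])

def f_linear(s, a):
    """Linear approximation s_{t+1} = A s_t + B a_t."""
    return A @ s + B * a

# Near the origin the two agree closely.
s_near, a_near = np.array([0.05, 0.02]), 0.03
err_near = np.linalg.norm(f(s_near, a_near) - f_linear(s_near, a_near))

# At high velocity and thrust the attenuation makes the linear model poor.
s_far, a_far = np.array([0.05, 1.0]), 1.0
err_far = np.linalg.norm(f(s_far, a_far) - f_linear(s_far, a_far))

print(err_near, err_far)
```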


Example

  • Setting: hovering UAV over a target
    • state \(s = [\mathsf{pos}, \mathsf{vel}]\)
  • Quadratic approximation around \((0,0)\)
    • \(c(0,0) = 0\)
    • \(\nabla_s c(0,0)= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \)
    • \(\nabla_s^2 c(0,0)= \begin{bmatrix} 2 & 0\\ 0& 0 \end{bmatrix} \)
    • \(\nabla_a c(0,0)= 0\) and \(\nabla_a^2 c(0,0)= 2\lambda\)
    • \(\nabla_{as}^2 c(0,0)=0\)
  • \(c(s,a)\approx \mathsf{pos}^2 + \lambda a^2\)
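
A short numerical comparison of the true cost and its quadratic approximation. This is a sketch that assumes \(\lambda = 0.5\) for illustration; the test positions are arbitrary.

```python
import math

LAM = 0.5  # assumed value of lambda, for illustration only

def cost(pos, a):
    """True cost: saturates at 1 in the position term (limited field of view)."""
    return (1.0 - math.exp(-pos ** 2)) + LAM * a ** 2

def cost_quadratic(pos, a):
    """Second-order approximation around (0, 0)."""
    return pos ** 2 + LAM * a ** 2

# Close to the target the two nearly coincide.
gap_near = abs(cost(0.1, 0.0) - cost_quadratic(0.1, 0.0))

# Far from the target the true cost saturates while the quadratic keeps growing.
gap_far = abs(cost(2.0, 0.0) - cost_quadratic(2.0, 0.0))

print(gap_near, gap_far)
```

The action term \(\lambda a^2\) is already quadratic, so the approximation is exact in \(a\).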


Example

  • Setting: hovering UAV over a target
  • Action: imperfect thrust right/left
  • LQR\(\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)\)

\(\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s = \gamma^\mathsf{pos}_t \mathsf{pos} + \gamma^\mathsf{vel}_t \mathsf{vel} \)

[Figure: gains \(\gamma^\mathsf{pos}_t\) and \(\gamma^\mathsf{vel}_t\) plotted against \(t\) up to \(H\); vertical axis tick at \(-1\)]
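
The time-varying gains can be computed with a backward Riccati recursion. This is a minimal sketch under stated assumptions: stage cost \(s^\top Q s + a^\top R a\), terminal cost \(s^\top Q s\), and a horizon \(H=20\) chosen only for illustration. The function name `lqr_gains` is hypothetical, and the conventions may differ from the LQR definition used in the course.

```python
import numpy as np

def lqr_gains(A, B, Q, R, H):
    """Finite-horizon LQR: returns gains K_0..K_{H-1} with a_t = K_t s_t."""
    P = Q.copy()  # assumed terminal cost s^T Q s
    gains = [None] * H
    for t in reversed(range(H)):
        BtP = B.T @ P
        # Minimizer of the quadratic cost-to-go at time t.
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        gains[t] = K
        # Riccati update for the cost-to-go matrix (uses the old P on the right).
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
    return gains

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])

gains = lqr_gains(A, B, Q, R, H=20)
gamma_pos = [K[0, 0] for K in gains]
gamma_vel = [K[0, 1] for K in gains]
print(gamma_pos[0], gamma_vel[0])
```

Under these assumptions the last gain is zero, because an action at time \(H-1\) only changes the velocity and the terminal cost penalizes position alone.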

Example

  • Setting: hovering UAV over a target
  • Action: imperfect thrust right/left
  • Local control \(\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s \)
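
The local controller can be tried on the true nonlinear dynamics. This is a sketch under the same assumptions as before (stage cost \(s^\top Q s + a^\top R a\), terminal cost \(s^\top Q s\)); the horizon \(H=50\) and the initial state are illustrative choices.

```python
import numpy as np

def f(s, a):
    """True UAV dynamics: s = [pos, vel], scalar action a."""
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel ** 2 + a ** 2)) * a])

def lqr_gains(A, B, Q, R, H):
    """Backward Riccati recursion for gains K_t with a_t = K_t s_t."""
    P = Q.copy()  # assumed terminal cost s^T Q s
    gains = [None] * H
    for t in reversed(range(H)):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        gains[t] = K
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
    return gains

# Local (linear/quadratic) model around the origin.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])

H = 50
gains = lqr_gains(A, B, Q, R, H)

# Roll out the linear policy on the nonlinear dynamics.
s = np.array([0.5, 0.0])
trajectory = [s]
for t in range(H):
    a = (gains[t] @ s).item()
    s = f(s, a)
    trajectory.append(s)

print(trajectory[0], trajectory[-1])
```

Starting close to the target, the policy steers the UAV back toward position \(0\). Far from the origin the local model is less accurate, so this behavior should not be expected in general.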

Recap

  • PSet due Wednesday

 

  • Calculus Review
  • Multivariate Approximations

 

  • Next lecture: Nonlinear Control

Sp24 CS 4/5789: Lecture 9

By Sarah Dean
