CS 4/5789: Introduction to Reinforcement Learning

Lecture 9: Local Approximations for Control

Prof. Sarah Dean

MW 2:55-4:10pm
255 Olin Hall

Reminders

Homework
- PSet 3 due Wednesday
- PA 2 released tonight, due 3/6
First exam is Monday 3/4 during lecture
- Post on Ed if conflicts (makeup is 3/1)
- TA-led review session in class on Wed 2/28
February break Mon/Tues 2/26-27: no lecture, office hours

Agenda

1. Recap & Example

2. Calculus Review

3. Multivariate Approximations

4. Preview: Nonlinear Control

Recap: Optimal Control

Continuous $\mathcal S = \mathbb R^{n_s}$ and $\mathcal A = \mathbb R^{n_a}$
Cost to be minimized $c=(c_0,\dots, c_{H-1}, c_H)$
Deterministic transitions described by dynamics function $$s_{t+1} = f(s_t, a_t)$$
Finite horizon $H$

$\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}$

$$\underset{\pi}{\text{minimize}} ~~\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H) \quad \text{s.t.}~~ s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$

DP Algorithm: $V^\star_{H}(s)=c_H(s)$ and then $$V_{t}^\star(s) =\min_a c(s,a)+V^\star_{t+1}(f(s,a))$$

Example

Setting: hovering UAV over a target
Action: thrust right/left
- imperfect: attenuated at high thrusts and velocities
The dynamics:
- $\mathsf{position}_{t+1} = \mathsf{position}_{t}+ \mathsf{velocity}_{t}$
- $\mathsf{velocity}_{t+1}=\mathsf{velocity}_{t} + e^{- (\mathsf{velocity}_t^2+a_t^2)} a_t$
When velocity/thrust is:
- small, then $\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t} +a_t $
- large, then $\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t} $
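
The two regimes above can be checked numerically. The following is a minimal sketch, assuming the state is stored as a NumPy array `[pos, vel]` and the thrust is a scalar; the function name `uav_dynamics` and the test values are illustrative choices, not part of the lecture.

```python
import numpy as np

def uav_dynamics(s, a):
    """One step of the UAV dynamics; s = [pos, vel], a = scalar thrust."""
    pos, vel = s
    # Attenuation factor: close to 1 near zero, close to 0 for large vel or thrust.
    attenuation = np.exp(-(vel**2 + a**2))
    return np.array([pos + vel, vel + attenuation * a])

# Small velocity and thrust: the thrust passes through almost unchanged.
s_small = uav_dynamics(np.array([0.0, 0.01]), 0.01)

# Large thrust: the attenuation is tiny, so the velocity barely changes.
s_large = uav_dynamics(np.array([0.0, 0.0]), 5.0)

print(s_small[1], s_large[1])
```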


Example

Setting: hovering UAV over a target
Action: thrust right/left
- imperfect: attenuated at high thrusts and velocities
Goal: stay near target position $0$
- Field of view is limited
- Thus cost is $$c(s,a) =(1-e^{-\mathsf{pos}^2}) +\lambda a^2$$
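
A short sketch of this cost, to make the limited field of view concrete: the position term saturates at 1 once the UAV is far from the target. The value `lam=0.5` for $\lambda$ and the name `uav_cost` are assumptions made for illustration.

```python
import numpy as np

def uav_cost(s, a, lam=0.5):
    """Cost c(s, a) = (1 - exp(-pos^2)) + lam * a^2, with s = [pos, vel]."""
    pos, _vel = s
    return (1.0 - np.exp(-pos**2)) + lam * a**2

at_target = uav_cost(np.array([0.0, 0.0]), 0.0)   # no position cost at the target
far_away = uav_cost(np.array([10.0, 0.0]), 0.0)   # position term saturates near 1
thrust_only = uav_cost(np.array([0.0, 0.0]), 2.0) # lam * a^2 with lam = 0.5
```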


Difficulty of Nonlinear DP

Recall DP algorithm: $V^\star_{H}(s)=c_H(s)$ and then $$V_{t}^\star(s) =\min_a c(s,a)+V^\star_{t+1}(f(s,a))$$
- policy determined by argmin
For UAV example, quickly get expressions like $$(1-e^{-(\mathsf{pos}+2 \mathsf{vel} + e^{- (\mathsf{vel}^2+a^2)} a)^2}) $$


Approximating Derivatives

Recall the definition of a derivative for a scalar function $g$: $$g'(x) =\lim_{\delta\to 0} \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$
Rather than compute exact derivatives, we will use a general-purpose computational approximation
The finite difference approximation of a derivative for some small $\delta\neq 0$ is $$g'(x) \approx \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$
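
As a minimal sketch of the finite difference formula, the snippet below approximates the derivative of $\sin$ at $x=1$ and compares it with the exact value $\cos(1)$. The step size `delta=1e-5` and the function name `central_difference` are illustrative choices.

```python
import math

def central_difference(g, x, delta=1e-5):
    """Approximate g'(x) by (g(x + delta) - g(x - delta)) / (2 * delta)."""
    return (g(x + delta) - g(x - delta)) / (2.0 * delta)

approx = central_difference(math.sin, 1.0)
exact = math.cos(1.0)
print(approx, exact)
```

In floating point, a very small `delta` amplifies round-off error while a large one increases truncation error, so the step size is a trade-off.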

Approximations Using Derivatives

Can we approximate complicated functions with something simpler (e.g. linear or quadratic)?
For a differentiable function $g:\mathbb R\to\mathbb R$
- Recall Taylor Expansion $$ g(x) = g(x_0) +g'(x_0)(x-x_0)+\frac{1}{2}g''(x_0)(x-x_0)^2 + ... $$
- When $x$ is close to $x_0$, the higher order terms become vanishingly small: $\epsilon^p\to 0$ as $p\to\infty$ for small $ |\epsilon|<1$

Approximations Using Derivatives

Can we approximate complicated functions with something simpler (e.g. linear or quadratic)?
For a differentiable function $g:\mathbb R\to\mathbb R$
A first-order approximation around $x_0$ is $$g(x) \approx g(x_0) +g'(x_0)(x-x_0)$$
A second-order approximation around $x_0$ is $$g(x) \approx g(x_0) +g'(x_0)(x-x_0)+\frac{1}{2}g''(x_0)(x-x_0)^2$$
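
To see how the two approximations compare, here is a small sketch using $g(x) = 1-e^{-x^2}$ (the position term of the UAV cost) expanded around $x_0 = 0.5$. The expansion point and evaluation point are arbitrary illustrative values.

```python
import math

def g(x):
    return 1.0 - math.exp(-x**2)

def g_prime(x):
    return 2.0 * x * math.exp(-x**2)

def g_double_prime(x):
    return 2.0 * (1.0 - 2.0 * x**2) * math.exp(-x**2)

def first_order(x, x0):
    """g(x0) + g'(x0)(x - x0)"""
    return g(x0) + g_prime(x0) * (x - x0)

def second_order(x, x0):
    """First-order approximation plus (1/2) g''(x0)(x - x0)^2"""
    return first_order(x, x0) + 0.5 * g_double_prime(x0) * (x - x0) ** 2

x0, x = 0.5, 0.6
err_first = abs(g(x) - first_order(x, x0))
err_second = abs(g(x) - second_order(x, x0))
print(err_first, err_second)
```

For this choice of points the second-order error is smaller than the first-order error, as expected when $x$ is close to $x_0$.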

Multi-variate Functions

For a function of $n$ variables $g:\mathbb R^n\to \mathbb R$, the partial derivative is $$\frac{\partial g (x)}{\partial x_i} = \lim_{\delta\to0} \frac{g(x_1,\dots,x_i+\delta,\dots,x_n)-g(x_1,\dots,x_i-\delta,\dots,x_n)}{2\delta} $$
- equivalently in vector notation $$\frac{\partial g (x)}{\partial x_i} = \lim_{\delta\to0} \frac{g(x+\delta e_i)-g(x-\delta e_i)}{2\delta} $$ where $e_i$ is the $i$th standard basis vector
The gradient $\nabla g(x)\in\mathbb R^n$ is a vector containing all partials
Second derivatives for all pairs $i$ and $j$ are contained in the Hessian $$\nabla^2 g(x)\in\mathbb R^{n\times n},\quad \nabla^2 g(x)_{ij} =\frac{\partial^2 g (x)}{\partial x_i \partial x_j}$$

Vector-valued Functions

For a function with $n$ inputs and $m$ outputs, $g:\mathbb R^n\to \mathbb R^m$, there are $n\times m$ partial derivatives (one for each pair of input and output)
The Jacobian $\nabla g(x)\in\mathbb R^{n\times m}$ generalizes the gradient and contains $\frac{\partial g_j(x)}{\partial x_i}$ in row $i$ and column $j$
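
The row/column convention is easy to mix up, so here is a hedged sketch that builds the $n\times m$ Jacobian by finite differences, one input direction (row) at a time. The function `g_example` and the helper name `fd_jacobian` are made up for illustration.

```python
import numpy as np

def fd_jacobian(g, x, delta=1e-5):
    """Finite-difference Jacobian with entry [i, j] = d g_j / d x_i (shape n x m)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = np.asarray(g(x)).size
    jac = np.zeros((n, m))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        # Perturbing input i gives row i: the effect on every output j.
        jac[i, :] = (g(x + delta * e_i) - g(x - delta * e_i)) / (2.0 * delta)
    return jac

def g_example(x):
    # Hypothetical map from R^2 to R^3.
    return np.array([x[0] * x[1], x[0] + x[1], x[1] ** 2])

J = fd_jacobian(g_example, [2.0, 3.0])
print(J.shape)
```

Note that many libraries use the transposed ($m\times n$) convention, so a transpose may be needed when comparing against them.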


Finite Difference Approximation

For scalar function $$g'(x) \approx \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$
For multivariate $$ \frac{\partial g (x)}{\partial x_i} \approx \frac{g(x+\delta e_i)-g(x-\delta e_i)}{2\delta}$$ where $e_i$ is a standard basis vector
For second derivatives, repeat: first difference the partial derivatives,

$$\frac{\partial^2 g (x)}{\partial x_i \partial x_j} \approx \frac{1}{2\delta}\Big[ \frac{\partial g (x+\delta e_i)}{\partial x_j} - \frac{\partial g (x -\delta e_i)}{\partial x_j} \Big]$$

then approximate each partial derivative by its own finite difference:

$$\begin{aligned}\frac{\partial^2 g (x)}{\partial x_i \partial x_j} \approx \frac{1}{2\delta}\Big[ &\frac{g(x+\delta e_j+\delta e_i)- g(x-\delta e_j+\delta e_i)}{2\delta} \\ &- \frac{g(x+\delta e_j -\delta e_i)-g(x-\delta e_j -\delta e_i)}{2\delta} \Big]\end{aligned}$$
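
The "repeat" step can be sketched directly in code: a finite-difference gradient, and a Hessian obtained by differencing that gradient. The test function and step size below are illustrative assumptions.

```python
import numpy as np

def fd_gradient(g, x, delta=1e-4):
    """Central-difference gradient of a scalar function g at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros(x.size)
    for i in range(x.size):
        e_i = np.zeros(x.size)
        e_i[i] = 1.0
        grad[i] = (g(x + delta * e_i) - g(x - delta * e_i)) / (2.0 * delta)
    return grad

def fd_hessian(g, x, delta=1e-4):
    """Hessian by repeating the finite difference on the gradient."""
    x = np.asarray(x, dtype=float)
    hess = np.zeros((x.size, x.size))
    for i in range(x.size):
        e_i = np.zeros(x.size)
        e_i[i] = 1.0
        # Row i: difference of the partials at x + delta e_i and x - delta e_i.
        hess[i, :] = (fd_gradient(g, x + delta * e_i, delta)
                      - fd_gradient(g, x - delta * e_i, delta)) / (2.0 * delta)
    return hess

def g_example(x):
    # Hypothetical test function with known derivatives.
    return x[0] ** 2 * x[1] + np.sin(x[1])

x_test = np.array([1.0, 2.0])
grad = fd_gradient(g_example, x_test)
hess = fd_hessian(g_example, x_test)
```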


Linear Approximation

Linear, also called first-order, approximation $$ g(x) \approx g(x_0) + g'(x_0)(x-x_0) $$
For the dynamics function $f:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R^{n_s}$ $$ f(s,a) \approx f(s_0, a_0) + \nabla_s f(s_0, a_0)^\top (s-s_0) + \nabla_a f(s_0, a_0)^\top (a-a_0) $$
Jacobians $ \nabla_s f(s, a) \in\mathbb R^{n_s\times n_s}$ and $ \nabla_a f(s, a) \in\mathbb R^{n_a\times n_s}$ contain $\frac{\partial f_j (s,a)}{\partial s_i}$ and $\frac{\partial f_j (s,a)}{\partial a_i}$, respectively, in row $i$ and column $j$:
- row $i$ represents effects of $i$th dimension of current state/action
- col $j$ represents effects on $f_j$, i.e. $j$th dimension of next state

Example

Setting: hovering UAV over a target
- state $s = [\mathsf{pos}, \mathsf{vel}]$
The dynamics: $$ f(s_t, a_t) = \begin{bmatrix} f_1(s_t,a_t)\\f_2(s_t,a_t)\end{bmatrix} = \begin{bmatrix} \mathsf{pos}_{t}+ \mathsf{vel}_{t}\\ \mathsf{vel}_{t} + e^{- (\mathsf{vel}_t^2+a_t^2)} a_t \end{bmatrix} $$
$$\nabla_s f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial \mathsf{pos}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{pos}} \\ \frac{\partial f_1 (s,a)}{\partial \mathsf{vel}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{vel}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1-2a\mathsf{vel}e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix} $$
$$\nabla_a f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial a} & \frac{\partial f_2 (s,a)}{\partial a} \end{bmatrix} =\begin{bmatrix} 0 & (1-2a^2) e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix}$$
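
One way to sanity-check the Jacobians of the UAV dynamics is to compare the closed-form expressions against finite differences at an arbitrary point. This is only a sketch: the test point and helper names are illustrative, and the row/column layout follows the convention used in this lecture (row = input, column = output).

```python
import numpy as np

def f(s, a):
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel**2 + a**2)) * a])

def grad_s_f(s, a):
    """Closed-form Jacobian w.r.t. the state; entry [i, j] = d f_j / d s_i."""
    _pos, vel = s
    damp = np.exp(-(vel**2 + a**2))
    return np.array([[1.0, 0.0],
                     [1.0, 1.0 - 2.0 * a * vel * damp]])

def grad_a_f(s, a):
    """Closed-form Jacobian w.r.t. the action; shape (1, 2)."""
    _pos, vel = s
    damp = np.exp(-(vel**2 + a**2))
    return np.array([[0.0, (1.0 - 2.0 * a**2) * damp]])

def fd_grad_s_f(s, a, delta=1e-6):
    rows = []
    for i in range(2):
        e_i = np.zeros(2)
        e_i[i] = 1.0
        rows.append((f(s + delta * e_i, a) - f(s - delta * e_i, a)) / (2.0 * delta))
    return np.array(rows)

def fd_grad_a_f(s, a, delta=1e-6):
    return ((f(s, a + delta) - f(s, a - delta)) / (2.0 * delta)).reshape(1, 2)

s_test, a_test = np.array([0.3, -0.4]), 0.7
max_err_s = np.max(np.abs(grad_s_f(s_test, a_test) - fd_grad_s_f(s_test, a_test)))
max_err_a = np.max(np.abs(grad_a_f(s_test, a_test) - fd_grad_a_f(s_test, a_test)))
```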

Quadratic Approximation

Second-order approximation $$ g(x) \approx g(x_0) + g'(x_0)(x-x_0) + \frac{1}{2} g''(x_0)(x-x_0)^2$$
For cost function $c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R$ $$\begin{aligned} c(s,a) \approx{}& c(s_0, a_0) + \nabla_s c(s_0, a_0)^\top (s-s_0) + \nabla_a c(s_0, a_0)^\top (a-a_0) \\ &+ \frac{1}{2} (s-s_0) ^\top \nabla^2_s c(s_0, a_0)(s-s_0) + \frac{1}{2} (a-a_0) ^\top \nabla^2_a c(s_0, a_0)(a-a_0) \\ &+ (a-a_0) ^\top \nabla_{as}^2 c(s_0, a_0)(s-s_0) \end{aligned}$$
Gradients $ \nabla_s c(s, a) \in\mathbb R^{n_s}$ and $ \nabla_a c(s, a) \in\mathbb R^{n_a}$
Hessians $ \nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}$, $ \nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}$, and $ \nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}$ contain second derivatives

Quadratic Approximation

For cost function $c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R$
- Gradients $ \nabla_s c(s, a) \in\mathbb R^{n_s}$ and $ \nabla_a c(s, a) \in\mathbb R^{n_a}$
  - entry $i$ represents effect of $i$th dimension of current state/action: $\frac{\partial c (s,a)}{\partial s_i}$ and $\frac{\partial c (s,a)}{\partial a_i}$

Quadratic Approximation

For multi-variate function $c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R$
- Gradients $ \nabla_s c(s, a) \in\mathbb R^{n_s}$ and $ \nabla_a c(s, a) \in\mathbb R^{n_a}$
- Hessians $ \nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}$, $ \nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}$, $ \nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}$
  - entry in row $i$ and column $j$ is $\frac{\partial^2 c (s,a)}{\partial s_i\partial s_j}$, $\frac{\partial^2c(s,a)}{\partial a_i\partial a_j}$, and $\frac{\partial^2 c (s,a)}{\partial a_i \partial s_j}$, respectively
  - $\nabla_s^2 c$ and $\nabla_a^2 c$ are symmetric

Example

Setting: hovering UAV over a target
- state $s = [\mathsf{pos}, \mathsf{vel}]$
The cost: $$c(s,a) = (1-e^{-\mathsf{pos}^2}) +\lambda a^2$$
$\nabla_s c(s,a)= \begin{bmatrix} 2\mathsf{pos}\cdot e^{-\mathsf{pos}^2} \\ 0 \end{bmatrix} $
$\nabla_s^2 c(s,a)= \begin{bmatrix} 2(1-2\mathsf{pos}^2) e^{-\mathsf{pos}^2} & 0\\ 0& 0 \end{bmatrix} $
$\nabla_a c(s,a)= 2\lambda a$ and $\nabla_a^2 c(s,a)= 2\lambda$
$\nabla_{as}^2 c(s,a)=0$
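
Plugging these derivatives into the quadratic approximation gives a function that can be compared against the true cost. A minimal sketch, assuming $\lambda = 0.5$ and an arbitrary expansion point; the names `cost` and `quadratic_cost` are illustrative.

```python
import numpy as np

LAM = 0.5  # assumed value of lambda, for illustration only

def cost(s, a):
    return (1.0 - np.exp(-s[0] ** 2)) + LAM * a**2

def quadratic_cost(s, a, s0, a0):
    """Second-order approximation of the cost around (s0, a0)."""
    pos0 = s0[0]
    damp = np.exp(-pos0**2)
    grad_s = np.array([2.0 * pos0 * damp, 0.0])
    hess_s = np.array([[2.0 * (1.0 - 2.0 * pos0**2) * damp, 0.0],
                       [0.0, 0.0]])
    grad_a = 2.0 * LAM * a0
    hess_a = 2.0 * LAM
    ds, da = s - s0, a - a0
    # The cross term vanishes because the mixed Hessian is zero for this cost.
    return (cost(s0, a0) + grad_s @ ds + grad_a * da
            + 0.5 * ds @ hess_s @ ds + 0.5 * hess_a * da**2)

s0, a0 = np.array([0.5, 0.0]), 0.2
s_near, a_near = np.array([0.55, 0.1]), 0.3
err_near = abs(cost(s_near, a_near) - quadratic_cost(s_near, a_near, s0, a0))
```

Because the action term $\lambda a^2$ is already quadratic, all of the approximation error here comes from the position term.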



Preview: Nonlinear Control

$$\underset{\pi}{\text{minimize}} ~~\sum_{t=0}^{H-1} c(s_t, a_t) \quad \text{s.t.}~~ s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$

Procedure
1. Approximate dynamics & costs
  - First/second order approximation
  - Finite differencing
2. Policy via LQR

Example

Setting: hovering UAV over a target
- state $s = [\mathsf{pos}, \mathsf{vel}]$
Linearizing around $(0,0)$
$f(0,0) = 0$
$\nabla_s f(0,0) = \begin{bmatrix} 1 & 0 \\ 1 & 1-2\cdot 0\cdot e^{-0} \end{bmatrix} $
$\nabla_a f(0,0) =\begin{bmatrix} 0 & (1-0) e^{-0} \end{bmatrix}$
$s_{t+1}=f(s_t, a_t) \approx \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$
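
The matrices in the linearized dynamics are the transposes of the Jacobians at the origin, since the first-order approximation uses $\nabla_s f^\top$ and $\nabla_a f^\top$. The sketch below builds them and compares the linear model with the true dynamics near and far from $(0,0)$; the test points are arbitrary illustrative values.

```python
import numpy as np

def f(s, a):
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel**2 + a**2)) * a])

# Jacobians at (s, a) = (0, 0), in the row = input, column = output layout.
grad_s_f0 = np.array([[1.0, 0.0],
                      [1.0, 1.0]])
grad_a_f0 = np.array([[0.0, 1.0]])

A = grad_s_f0.T  # [[1, 1], [0, 1]]
B = grad_a_f0.T  # [[0], [1]]

def f_linear(s, a):
    """Linear model s' = A s + B a (f(0, 0) = 0, so there is no constant term)."""
    return A @ s + B[:, 0] * a

s_near, a_near = np.array([0.05, -0.02]), 0.03
s_far, a_far = np.array([0.05, 1.5]), 1.0
err_near = np.linalg.norm(f(s_near, a_near) - f_linear(s_near, a_near))
err_far = np.linalg.norm(f(s_far, a_far) - f_linear(s_far, a_far))
print(err_near, err_far)
```

The error is small near the origin and large at high velocity and thrust, which is why the approximation is called local.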


Example

Setting: hovering UAV over a target
- state $s = [\mathsf{pos}, \mathsf{vel}]$
Quadratic approximation of the cost around $(0,0)$
- $c(0,0) = 0$
- $\nabla_s c(0,0)= \begin{bmatrix} 0 \\ 0 \end{bmatrix} $ and $\nabla_s^2 c(0,0)= \begin{bmatrix} 2 & 0\\ 0& 0 \end{bmatrix} $
- $\nabla_a c(0,0)= 0$ and $\nabla_a^2 c(0,0)= 2\lambda$
- $\nabla_{as}^2 c(0,0)=0$
$c(s,a)\approx \mathsf{pos}^2 + \lambda a^2$


Example

Setting: hovering UAV over a target
Action: imperfect thrust right/left
LQR$\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)$

$\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s = \gamma^\mathsf{pos}_t \mathsf{pos} + \gamma^\mathsf{vel}_t \mathsf{vel} $
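
The gains $\gamma^\mathsf{pos}_t$ and $\gamma^\mathsf{vel}_t$ can be computed with the standard finite-horizon Riccati recursion. This is a sketch under stated assumptions: the arguments of LQR are read as $(A, B, Q, R)$ with stage cost $s^\top Q s + R a^2$, the terminal cost is taken to be $s^\top Q s$, and the horizon `H = 10` is an arbitrary choice. The function name `lqr_gains` is illustrative.

```python
import numpy as np

def lqr_gains(A, B, Q, R, H):
    """Backward Riccati recursion; returns gains K_t with pi_t(s) = K_t s."""
    P = Q.copy()  # assumed terminal cost matrix
    gains = [None] * H
    for t in reversed(range(H)):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        gains[t] = K
    return gains

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])
H = 10  # assumed horizon

gains = lqr_gains(A, B, Q, R, H)
gamma_pos = [K[0, 0] for K in gains]
gamma_vel = [K[0, 1] for K in gains]
print(gamma_pos[0], gamma_vel[0])
```

Under these assumptions the final gain is zero, because the last action cannot affect the position that the terminal cost penalizes.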

Figure: gains $\gamma^\mathsf{pos}$ and $\gamma^\mathsf{vel}$ plotted against $t$ up to $H$; simulations

Example

Setting: hovering UAV over a target
Action: imperfect thrust right/left
Local control $\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s $
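
A rough simulation sketch of this local controller applied to the true nonlinear dynamics. It repeats the Riccati recursion so the block is self-contained; the terminal cost, horizon `H = 20`, and initial state `[0.5, 0.0]` are assumptions chosen for illustration.

```python
import numpy as np

def f(s, a):
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel**2 + a**2)) * a])

def lqr_gains(A, B, Q, R, H):
    P = Q.copy()  # assumed terminal cost matrix
    gains = [None] * H
    for t in reversed(range(H)):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        gains[t] = K
    return gains

def simulate(s_init, gains):
    """Roll out the nonlinear dynamics under the time-varying linear policy."""
    s = np.asarray(s_init, dtype=float)
    trajectory = [s]
    for K in gains:
        a = float((K @ s)[0])  # pi_t(s) = gamma_pos * pos + gamma_vel * vel
        s = f(s, a)
        trajectory.append(s)
    return np.array(trajectory)

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])

gains = lqr_gains(A, B, Q, R, H=20)
trajectory = simulate([0.5, 0.0], gains)
print(trajectory[-1])
```

Starting close to the target, the controller designed from the approximation drives the position toward zero. Starting farther away, the approximation is less accurate and the behavior may differ.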

Figure: simulations

Recap

PSet 3 due Wednesday

Calculus Review
Multivariate Approximations

Next lecture: Nonlinear Control

Sp24 CS 4/5789: Lecture 9

By Sarah Dean

