Prof. Sarah Dean

MW 2:45-4pm
255 Olin Hall

## Reminders

• Homework this week
  • PSet due Wednesday
  • PA due 3/1
• My office hours:
  • Tuesdays 10:30-11:30am in Gates 416A (cancelled 2/28 for February break)
  • Wednesdays 4-4:50pm in Olin 255 (right after lecture)

## Agenda

1. Recap: Control & LQR

2. Optimal LQR Policy

3. Nonlinear Approximation

4. Local Linear Control

## Recap: Optimal Control

• Continuous $$\mathcal S = \mathbb R^{n_s}$$ and $$\mathcal A = \mathbb R^{n_a}$$
• Cost to be minimized $$c=(c_0,\dots, c_{H-1}, c_H)$$
• Deterministic transitions described by dynamics function $$s_{t+1} = f(s_t, a_t)$$
• Finite horizon $$H$$

$$\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}$$

$$\min_{\pi}\quad \displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H) \qquad \text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$

## Recap: DP for OC

The general-purpose dynamic programming algorithm, specialized to optimal control:

• Initialize $$V^\star_H(s) = c_H(s)$$
• For $$t=H-1, H-2, ..., 0$$:
• $$Q_t^\star(s,a) = c_t(s,a)+V^\star_{t+1}(f(s,a))$$
• $$\pi_t^\star(s) = \arg\min_a Q_t^\star(s,a)$$
• $$V^\star_{t}(s)=Q_t^\star(s,\pi_t^\star(s) )$$
• Return $$\pi^\star = (\pi^\star_0,\dots ,\pi^\star_{H-1})$$
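A toy sketch of this loop, with hypothetical scalar dynamics $$s_{t+1}=s_t+a_t$$ and cost $$s^2+a^2$$, using a coarse grid in place of the continuous state and action spaces (the real algorithm is over continuous spaces; the grid is only for illustration):

```python
import numpy as np

# Hypothetical 1-D problem: s' = s + a, c(s,a) = s^2 + a^2, c_H(s) = s^2
H = 5
states = np.linspace(-2.0, 2.0, 81)
actions = np.linspace(-1.0, 1.0, 41)

def f(s, a):          # dynamics
    return s + a

def c(s, a):          # stage cost
    return s**2 + a**2

V = states**2         # V*_H(s) = c_H(s)
policy = []
for t in range(H - 1, -1, -1):
    # Q_t(s,a) = c(s,a) + V*_{t+1}(f(s,a)), with the next state snapped to the grid
    S, A = np.meshgrid(states, actions, indexing="ij")
    idx = np.abs(states[None, None, :] - f(S, A)[:, :, None]).argmin(axis=2)
    Q = c(S, A) + V[idx]
    best = Q.argmin(axis=1)                  # pi*_t(s) = argmin_a Q_t(s,a)
    policy.insert(0, actions[best])
    V = Q[np.arange(len(states)), best]      # V*_t(s) = Q_t(s, pi*_t(s))

# at s = 1 the optimal first action pushes back toward the origin
s0_idx = np.abs(states - 1.0).argmin()
print(policy[0][s0_idx] < 0)   # True
```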

## Recap: LQR

Special case of optimal control problem with

• Quadratic cost $$c_t(s,a) = s^\top Qs+ a^\top Ra,\quad c_H(s) = s^\top Qs$$ where $$Q$$ is symmetric and positive semi-definite and $$R$$ is symmetric and positive definite
• Linear dynamics $$s_{t+1} = As_t+ Ba_t$$

$$\min_{\pi}\quad \displaystyle\sum_{t=0}^{H-1} \left(s_t^\top Qs_t +a_t^\top Ra_t\right)+s_H^\top Q s_H \qquad \text{s.t.}\quad s_{t+1}=As_t+B a_t, ~~a_t=\pi_t(s_t)$$

Important background:

1. A matrix is symmetric if $$M=M^\top$$
2. A symmetric matrix is positive semi-definite (PSD) if all its eigenvalues are greater than or equal to 0
3. A symmetric matrix is positive definite if all its eigenvalues are strictly greater than 0
4. All positive definite matrices are invertible

## Resources

Linear algebra and probability background*

*these references are not necessarily an exact match to the course and they are not required

## Recall: Example

$$\min_{a_0, a_1}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_1 + s_2^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_2+\lambda a_{0}^2+\lambda a_1^2$$

$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad \quad s_{2} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{1} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{1}$$

• Setting: hovering UAV over a target
• Action: thrust right/left
• State: distance from target, velocity
• LQR$$\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\lambda,H=2\right)$$


$$\min_{a_1}\quad (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 + \lambda a_1^2 \quad\implies\quad a_1^\star=0$$

## Recall: Example

$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 +\lambda a_{0}^2$$

$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}$$


$$a_1^\star=0$$

$$a_0^\star=-\frac{1}{1+\lambda}(\mathsf{pos}_0-x+2\,\mathsf{vel}_0)$$

• Setting: hovering UAV over a target
• Action: thrust right/left
• State: distance from target, velocity
• LQR$$\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\lambda,H=2\right)$$

$$a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda}$$

$$\min_{a_0}\quad s_0^\top \begin{bmatrix}3&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + (1 +\lambda)a_0^2$$

## Agenda

1. Recap: Control & LQR

2. Optimal LQR Policy

3. Nonlinear Approximation

4. Local Linear Control

## LQR via DP

• $$V_H^\star(s) = s^\top Q s$$
• $$t=H-1$$: $$\quad \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top Q (As+Ba)$$
• $$\quad \min_{a} s^\top (Q+A^\top QA) s+a^\top (R+B^\top QB) a+2s^\top A^\top Q Ba$$
• General minimization: $$\arg\min_a c + a^\top M a + 2m^\top a$$
• $$2Ma_\star + 2m = 0 \implies a_\star = -M^{-1} m$$
• $$\pi_{H-1}^\star(s)=-(R+B^\top QB)^{-1}B^\top QAs$$
• minimum is $$c-m^\top M^{-1} m$$
• $$V_{H-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s$$

DP: $$V_t^\star (s) = \min_{a} c(s, a)+V_{t+1}^\star (f(s,a))$$


Important background:

1. The gradient of a function $$f:\mathbb R^d \to\mathbb R$$  is the vector $$\nabla f(x) = \begin{bmatrix}\frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_d}\end{bmatrix}$$
2. If $$f$$ has a minimum at $$x_\star$$ then $$\nabla f(x_\star) = 0$$
3. The gradients of quadratic and linear functions are $$\nabla \left[x^\top Mx\right]=Mx+M^\top x,\quad \nabla \left[m^\top x\right] = m$$

## LQR via DP

• $$V_H^\star(s) = s^\top Q s$$
• $$t=H-1$$: $$\quad \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top Q (As+Ba)$$
• $$\pi_{H-1}^\star(s)=-(R+B^\top QB)^{-1}B^\top QAs$$
• $$V_{H-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s$$

Theorem:  For $$t=0,\dots ,H-1$$, the optimal value function is quadratic and the optimal policy is linear: $$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$

where the matrices are defined as $$P_{H} = Q$$ and

• $$P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$
• $$K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$
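This recursion is straightforward to implement; below is a minimal numpy sketch, checked against the UAV example from the recap (with $$\lambda = 1$$, so $$a_0^\star = -\begin{bmatrix}0.5&1\end{bmatrix}s_0$$):

```python
import numpy as np

def lqr(A, B, Q, R, H):
    """Backward Riccati recursion: P_H = Q, then P_t, K_t as in the theorem."""
    P = Q.copy()
    Ks = []
    for _ in range(H):
        G = R + B.T @ P @ B
        K = -np.linalg.solve(G, B.T @ P @ A)   # K_t = -(R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ A + A.T @ P @ B @ K  # equivalent form of the P_t update
        Ks.insert(0, K)
    return Ks, P

# UAV example: double-integrator dynamics, cost on position and thrust
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[1.0]])   # lambda = 1

Ks, P0 = lqr(A, B, Q, R, H=2)
print(Ks[0])   # [[-0.5 -1. ]], matching a_0* = -[1 2]s_0/(1+lambda)
print(Ks[1])   # [[0. 0.]],     matching a_1* = 0
```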

## LQR Proof

• Base case: $$V_H^\star(s) = s^\top Q s$$
• Inductive step: Assume that $$V^\star_{t+1} (s) = s^\top P_{t+1} s$$.
• DP at $$t$$: $$V_t^\star(s)= \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top P_{t+1} (As+Ba)$$
• $$\quad \min_{a} s^\top (Q+A^\top P_{t+1}A) s+a^\top (R+B^\top P_{t+1} B) a+2s^\top A^\top P_{t+1} Ba$$
• General minimization: $$\arg\min_a c + a^\top M a + 2m^\top a$$ gives $$a_\star = -M^{-1} m$$ and minimum is $$c-m^\top M^{-1} m$$
• $$\pi_{t}^\star(s)=-(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}As$$
• $$V_{t}^\star(s) = s^\top (Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A) s$$

Theorem:  $$V^\star_t (s) = s^\top P_t s$$ and $$\pi_t^\star(s) = K_t s$$ where $$P_{H} = Q$$,
$$P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$
$$K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$

## Example

• Setting: hovering UAV over a target
• Action: thrust right/left
• State: distance from target, velocity
• LQR$$\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)$$


$$\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s = \gamma^\mathsf{pos}_t (\mathsf{pos} - x) + \gamma^\mathsf{vel}_t \mathsf{vel}$$

(Figure: the gain entries $$\gamma^\mathsf{pos}_t$$ and $$\gamma^\mathsf{vel}_t$$ plotted against $$t$$ from $$0$$ to $$H$$.)

## LQR Extensions

• The same dynamic programming method extends in a straightforward manner when:
1. Dynamics and costs are time varying
2. Affine term in the dynamics, cross terms in the costs
• General form:  $$f_t(s_t,a_t) = A_ts_t + B_t a_t +c_t$$ and $$c_t(s,a) = s^\top Q_ts+a^\top R_ta+a^\top M_ts + q_t^\top s + r_t^\top a+ v_t$$
• General solution: $$\pi^\star_t(s) = K_t s+ k_t$$ where $$\{K_t,k_t\}_{t=0}^{H-1} = \mathsf{LQR}(\{A_t,B_t,c_t, Q_t, R_t, M_t, q_t, r_t, v_t\}_{t=0}^{H-1})$$
• Many applications can be reformulated this way:
• e.g. trajectory tracking $$c_t(s,a) = \|s-\bar s_t\|_2^2 + \|a\|_2^2$$ for given $$\bar s_t$$
• Nonlinear dynamics and costs (Programming Assignment 2)
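For instance, expanding the tracking cost shows how it fits the general form above:

```latex
\|s-\bar s_t\|_2^2 + \|a\|_2^2
  = s^\top I s - 2\bar s_t^\top s + \bar s_t^\top \bar s_t + a^\top I a
```

so in the general notation, $$Q_t = I$$, $$R_t = I$$, $$M_t = 0$$, $$q_t = -2\bar s_t$$, $$r_t = 0$$, and $$v_t = \bar s_t^\top \bar s_t$$.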

## Agenda

1. Recap: Control & LQR

2. Optimal LQR Policy

3. Nonlinear Approximation

4. Local Linear Control

## Example

• Setting: hovering UAV over a target
• Action: thrust right/left
• imperfect: attenuated at high thrusts and velocities
• The dynamics:
• $$\mathsf{position}_{t+1} = \mathsf{position}_{t}+ \mathsf{velocity}_{t}$$
• $$\mathsf{velocity}_{t+1}=\mathsf{velocity}_{t} + e^{- (\mathsf{velocity}_t^2+a_t^2)} a_t$$
• When velocity/thrust is:
• small, then $$\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t} +a_t$$
• large, then $$\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t}$$
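A quick numerical check of the attenuated thrust term $$e^{-(\mathsf{vel}_t^2+a_t^2)} a_t$$ in the two regimes (values rounded):

```python
import math

# attenuated thrust term from the dynamics above
def thrust_gain(v, a):
    return math.exp(-(v**2 + a**2)) * a

print(thrust_gain(0.0, 0.1))   # ~0.099: small inputs, nearly full thrust
print(thrust_gain(3.0, 3.0))   # ~4.6e-8: large inputs, almost no effect
```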


## Example

• Setting: hovering UAV over a target
• Action: thrust right/left
• imperfect: attenuated at high thrusts and velocities
• Goal: stay near target position $$0$$
• Field of view is limited
• Thus cost is $$c(s,a) =(1-e^{-\mathsf{pos}^2}) +\lambda a^2$$


## Low-Order Approximation

• How to find simpler (e.g. linear or quadratic) approximations?
• For a nonlinear differentiable function $$g:\mathbb R\to\mathbb R$$
• Recall Taylor Expansion $$g(x) = g(x_0) +g'(x_0)(x-x_0)+\frac{1}{2}g''(x_0)(x-x_0)^2 + ...$$
• When $$x$$ is close to $$x_0$$, the higher order terms become vanishingly small: $$\epsilon^p\to 0$$ as $$p\to\infty$$ for $$|\epsilon|<1$$

## Linear Approximation

• Linear, also called first-order, approximation $$g(x) \approx g(x_0) + g'(x_0)(x-x_0)$$
• For vector-valued multi-variate function $$f:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R^{n_s}$$ $$f(s,a) \approx f(s_0, a_0) + \nabla_s f(s_0, a_0)^\top (s-s_0) + \nabla_a f(s_0, a_0)^\top (a-a_0)$$
• Jacobians $$\nabla_s f(s, a) \in\mathbb R^{n_s\times n_s}$$ and $$\nabla_a f(s, a) \in\mathbb R^{n_a\times n_s}$$ contain the partial derivatives $$\frac{\partial f_j (s,a)}{\partial s_i}$$ and $$\frac{\partial f_j (s,a)}{\partial a_i}$$ in entry $$(i,j)$$:
• row $$i$$ represents effects of the $$i$$th dimension of the current state/action, col $$j$$ represents effects on $$f_j$$, i.e. the $$j$$th dimension of the next state

## Example

• Setting: hovering UAV over a target
• state $$s = [\mathsf{pos}, \mathsf{vel}]$$
• The dynamics: $$f(s_t, a_t) = \begin{bmatrix} \mathsf{pos}_{t}+ \mathsf{vel}_{t}\\ \mathsf{vel}_{t} + e^{- (\mathsf{vel}_t^2+a_t^2)} a_t \end{bmatrix}=\begin{bmatrix} f_1(s_t,a_t)\\f_2(s_t,a_t)\end{bmatrix}$$
• $$\nabla_s f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial \mathsf{pos}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{pos}} \\ \frac{\partial f_1 (s,a)}{\partial \mathsf{vel}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{vel}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1-2a\,\mathsf{vel}\, e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix}$$
• $$\nabla_a f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial a} & \frac{\partial f_2 (s,a)}{\partial a} \end{bmatrix}=\begin{bmatrix} 0 & (1-2a^2) e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix}$$

## Second-Order Approximation

• Second-order approximation $$g(x) \approx g(x_0) + g'(x_0)(x-x_0) + \frac{1}{2} g''(x_0)(x-x_0)^2$$
• For multi-variate function $$c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R$$ $$c(s,a) \approx c(s_0, a_0) + \nabla_s c(s_0, a_0)^\top (s-s_0) + \nabla_a c(s_0, a_0)^\top (a-a_0) + \\ \frac{1}{2} (s-s_0) ^\top \nabla^2_s c(s_0, a_0)(s-s_0) + \frac{1}{2} (a-a_0) ^\top \nabla^2_a c(s_0, a_0)(a-a_0) \\+ (a-a_0) ^\top \nabla_{as}^2 c(s_0, a_0)(s-s_0)$$
• Gradients $$\nabla_s c(s, a) \in\mathbb R^{n_s}$$ and $$\nabla_a c(s, a) \in\mathbb R^{n_a}$$
• Hessians $$\nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}$$, $$\nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}$$, and $$\nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}$$ contain second derivatives

• For multi-variate function $$c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R$$
• Gradients $$\nabla_s c(s, a) \in\mathbb R^{n_s}$$ and $$\nabla_a c(s, a) \in\mathbb R^{n_a}$$
• entry $$i$$, i.e. $$\frac{\partial c (s,a)}{\partial s_i}$$ or $$\frac{\partial c (s,a)}{\partial a_i}$$, represents the effect of the $$i$$th dimension of the current state/action

• For multi-variate function $$c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R$$
• Gradients $$\nabla_s c(s, a) \in\mathbb R^{n_s}$$ and $$\nabla_a c(s, a) \in\mathbb R^{n_a}$$
• Hessians $$\nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}$$, $$\nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}$$, and $$\nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}$$ contain the second derivatives $$\frac{\partial^2 c (s,a)}{\partial s_i\partial s_j}$$, $$\frac{\partial^2 c(s,a)}{\partial a_i\partial a_j}$$, and $$\frac{\partial^2 c (s,a)}{\partial a_i \partial s_j}$$ in entry $$(i,j)$$
• $$\nabla_s^2 c$$ and $$\nabla_a^2 c$$ are symmetric

## Example

• Setting: hovering UAV over a target
• state $$s = [\mathsf{pos}, \mathsf{vel}]$$
• The cost: $$c(s,a) = (1-e^{-\mathsf{pos}^2}) +\lambda a^2$$
• $$\nabla_s c(s,a)= \begin{bmatrix} 2\mathsf{pos}\cdot e^{-\mathsf{pos}^2} \\ 0 \end{bmatrix}$$
• $$\nabla_s^2 c(s,a)= \begin{bmatrix} 2(1-2\mathsf{pos}^2) e^{-\mathsf{pos}^2} & 0\\ 0& 0 \end{bmatrix}$$
• $$\nabla_a c(s,a)= 2\lambda a$$ and $$\nabla_a^2 c(s,a)= 2\lambda$$
• $$\nabla_{as}^2 c(s,a)=0$$


## Finite Difference Approximation

• For a scalar function, $$g'(x) \approx \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$
• For a multivariate function, $$\frac{\partial f_j (s,a)}{\partial s_i} \approx \frac{f_j(s+\delta e_i,a)-f_j(s-\delta e_i,a)}{2\delta}$$ where $$e_i$$ is a standard basis vector
• For second derivatives, apply the rule twice:

$$\frac{\partial^2 c (s,a)}{\partial a_i \partial s_j} \approx \frac{1}{2\delta}\Big[ \frac{\partial c (s,a +\delta e_i)}{\partial s_j} - \frac{\partial c (s,a -\delta e_i)}{\partial s_j} \Big] \approx \frac{1}{2\delta}\Big[ \frac{c(s+\delta e_j,a +\delta e_i)- c(s-\delta e_j,a +\delta e_i)}{2\delta} - \frac{c(s+\delta e_j,a -\delta e_i)-c(s-\delta e_j,a -\delta e_i)}{2\delta} \Big]$$
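A sketch of central differencing applied to the UAV dynamics from this lecture; at $$(0,0)$$ the result should match the analytic Jacobian computed earlier:

```python
import numpy as np

# UAV dynamics: state s = [pos, vel], scalar action a
def f(s, a):
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel**2 + a**2)) * a])

def jac_s(f, s, a, delta=1e-5):
    """Central-difference Jacobian: row i holds df_j/ds_i for all j."""
    n = len(s)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = delta
        J[i] = (f(s + e, a) - f(s - e, a)) / (2 * delta)
    return J

s0, a0 = np.zeros(2), 0.0
print(jac_s(f, s0, a0))   # matches the analytic grad_s f(0,0) = [[1, 0], [1, 1]]
```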

## Agenda

1. Recap: Control & LQR

2. Optimal LQR Policy

3. Nonlinear Approximation

4. Local Linear Control

## Local Control

• Local control around $$(s_\star,a_\star)$$
• e.g. Cartpole (PA2)
• $$s = \begin{bmatrix} \theta\\ \omega \\ x \\ v \end{bmatrix}$$ and $$a = f$$
• goal: balance $$s_\star = 0$$ and $$a_\star = 0$$

• Applicable when costs $$c$$ are smallest at $$(s_\star,a_\star)$$ and initial state is close to $$s_\star$$

(Figure: cartpole with angle $$\theta$$, angular velocity $$\omega$$, cart position $$x$$, velocity $$v$$, applied force $$f$$, and gravity.)

• Assumptions:
1. Black-box access to $$f$$ and $$c$$
• i.e. can query at any $$(s,a)$$ and observe outputs $$s'$$ and $$c$$ where $$s'=f(s,a)$$ and $$c=c(s,a)$$
2. $$f$$ is differentiable and $$c$$ is twice differentiable
• i.e. Jacobians and Hessians are well defined

$$\min_{\pi}\quad \displaystyle\sum_{t=0}^{H-1} c(s_t, a_t) \qquad \text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$

## Local Control

• Procedure
1. Approximate dynamics & costs
• First/second order approximation
• Finite differencing
2. Policy via LQR

$$\min_{\pi}\quad \displaystyle\sum_{t=0}^{H-1} c(s_t, a_t) \qquad \text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$

## Linearized Dynamics

• Linearization of dynamics around $$(s_0,a_0)$$
• $$f(s,a) \approx f(s_0, a_0) + \nabla_s f(s_0, a_0)^\top (s-s_0) + \nabla_a f(s_0, a_0)^\top (a-a_0)$$
• $$=A_0s+B_0a+c_0$$
• where the matrices depend on $$(s_0,a_0)$$:
• $$A_0 = \nabla_s f(s_0, a_0)^\top$$
• $$B_0 = \nabla_a f(s_0, a_0)^\top$$
• $$c_0 = f(s_0, a_0) - \nabla_s f(s_0, a_0)^\top s_0 - \nabla_a f(s_0, a_0)^\top a_0$$
• Black box access: use finite differencing to compute

## Example

• Setting: hovering UAV over a target
• state $$s = [\mathsf{pos}, \mathsf{vel}]$$
• Linearizing around $$(0,0)$$
• $$f(0,0) = 0$$
• $$\nabla_s f(0,0) = \begin{bmatrix} 1 & 0 \\ 1 & 1-2\cdot 0\cdot e^{-0} \end{bmatrix}$$
• $$\nabla_a f(0,0) =\begin{bmatrix} 0 & (1-0) e^{-0} \end{bmatrix}$$
• $$s_{t+1}=f(s_t, a_t) \approx \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
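Assembling $$A_0, B_0, c_0$$ for this example (a sketch; the Jacobians are hard-coded from the analytic computation above):

```python
import numpy as np

# UAV dynamics: state s = [pos, vel], scalar action a
def f(s, a):
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel**2 + a**2)) * a])

# Linearization at (s_0, a_0) = (0, 0), transposed Jacobians from the example
A0 = np.array([[1.0, 1.0], [0.0, 1.0]])   # = grad_s f(0,0)^T
B0 = np.array([[0.0], [1.0]])             # = grad_a f(0,0)^T
c0 = f(np.zeros(2), 0.0) - A0 @ np.zeros(2) - (B0 * 0.0).ravel()   # = 0 here

# the linearization reproduces the true dynamics near the origin
s, a = np.array([0.01, -0.02]), 0.03
print(A0 @ s + (B0 * a).ravel() + c0)   # close to f(s, a)
```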


## Second-Order Approx. Costs

• Approximate costs around $$(s_0,a_0)$$ $$c(s,a) \approx c(s_0, a_0) + \nabla_s c(s_0, a_0)^\top (s-s_0) + \nabla_a c(s_0, a_0)^\top (a-a_0) + \\ \frac{1}{2} (s-s_0) ^\top \nabla^2_s c(s_0, a_0)(s-s_0) + \frac{1}{2} (a-a_0) ^\top \nabla^2_a c(s_0, a_0)(a-a_0) \\+ (a-a_0) ^\top \nabla_{as}^2 c(s_0, a_0)(s-s_0)$$
• $$=s^\top Q_0s+a^\top R_0a+a^\top M_0s + q_0^\top s + r_0^\top a+ v_0$$
• Practical consideration:
• Force $$Q_0,R_0$$ to be positive definite by setting negative eigenvalues to 0 and adding regularization $$\lambda I$$
• Black box access: use finite differencing to compute

For a symmetric matrix $$Q\in\mathbb R^{n\times n}$$ the eigen-decomposition is $$Q = \sum_{i=1}^n v_iv_i^\top \sigma_i$$

To make this PSD, we replace $$Q\leftarrow \sum_{i=1}^n v_iv_i^\top (\max\{0,\sigma_i\} +\lambda)$$
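The eigenvalue surgery can be implemented with a symmetric eigendecomposition; a minimal sketch:

```python
import numpy as np

def make_pd(Q, lam=1e-3):
    """Clip negative eigenvalues to 0 and add lam*I regularization."""
    sig, V = np.linalg.eigh(Q)                 # Q = sum_i sigma_i v_i v_i^T
    return (V * (np.maximum(sig, 0.0) + lam)) @ V.T

Q = np.array([[1.0, 2.0], [2.0, 1.0]])         # eigenvalues 3 and -1
Qpd = make_pd(Q)
print(np.linalg.eigvalsh(Qpd))                 # ~[0.001, 3.001]
```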

## Example

• Setting: hovering UAV over a target
• state $$s = [\mathsf{pos}, \mathsf{vel}]$$
• Approximating the cost to second order around $$(0,0)$$
• $$\nabla_s c(0,0)= \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
• $$\nabla_s^2 c(0,0)= \begin{bmatrix} 2 & 0\\ 0& 0 \end{bmatrix}$$
• $$\nabla_a c(0,0)= 0$$ and $$\nabla_a^2 c(0,0)= 2\lambda$$
• $$\nabla_{as}^2 c(0,0)=0$$
• $$c(s,a)\approx \mathsf{pos}^2 + \lambda a^2$$


## Local Control

1. Approximate dynamics & costs
• Linearize $$f$$ as $$A_0,B_0,c_0$$
• Approx $$c$$ as $$Q_0,R_0,M_0,q_0,r_0,v_0$$
2. LQR policy: $$\pi^\star_t(s) = K_t s+ k_t$$ where $$\{K_t,k_t\}_{t=0}^{H-1} = \mathsf{LQR}(A_0,B_0,c_0, Q_0, R_0, M_0, q_0, r_0, v_0)$$
• works as long as states and actions remain close to $$s_\star$$ and $$a_\star$$

## Local Control

$$\min_{\pi}\quad \displaystyle\sum_{t=0}^{H-1} c(s_t, a_t) \qquad \text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$

## Local Control as Approx DP

• Initialize $$V^\star_H(s) = c_H(s)$$
• For $$t=H-1, H-2, ..., 0$$:
• $$Q_t^\star(s,a) = c(s,a)+V^\star_{t+1}(f(s,a))$$
• $$\pi_t^\star(s) = \arg\min_a Q_t^\star(s,a)$$
• $$V^\star_{t}(s)=Q_t^\star(s,\pi_t^\star(s) )$$
• Return $$\pi^\star = (\pi^\star_0,\dots ,\pi^\star_{H-1})$$

## Recap

• PSet due Wednesday

• Optimal LQR Policy
• Nonlinear Approximation
• Locally Linear Control

• Next lecture: Iterative Nonlinear Control
