CS 4/5789: Introduction to Reinforcement Learning
Lecture 9: LQR & Nonlinear Control
Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
Reminders
- Homework this week
- PSet due Wednesday
- PA due 3/1
- My office hours:
- Tuesdays 10:30-11:30am in Gates 416A
- cancelled 2/28 (February break)
- Wednesdays 4-4:50pm in Olin 255 (right after lecture)
Agenda
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
Recap: Optimal Control
- Continuous \(\mathcal S = \mathbb R^{n_s}\) and \(\mathcal A = \mathbb R^{n_a}\)
- Cost to be minimized \(c=(c_0,\dots, c_{H-1}, c_H)\)
- Deterministic transitions described by dynamics function $$s_{t+1} = f(s_t, a_t)$$
- Finite horizon \(H\)
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
Recap: DP for OC
The general-purpose dynamic programming algorithm, in the context of optimal control:
- Initialize \(V^\star_H(s) = c_H(s)\)
- For \(t=H-1, H-2, ..., 0\):
- \(Q_t^\star(s,a) = c_t(s,a)+V^\star_{t+1}(f(s,a))\)
- \(\pi_t^\star(s) = \arg\min_a Q_t^\star(s,a)\)
- \(V^\star_{t}(s)=Q_t^\star(s,\pi_t^\star(s) )\)
- Return \(\pi^\star = (\pi^\star_0,\dots ,\pi^\star_{H-1})\)
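The recursion above is over continuous states and actions, so it cannot be run exactly; below is a minimal sketch of the same backward pass on a discretized grid. The toy dynamics, cost, grids, and horizon are illustrative assumptions, not the lecture's example:

```python
import numpy as np

# Illustrative 1-D problem: drive the state toward 0 (not the lecture's UAV example).
H = 10
states = np.linspace(-2, 2, 81)          # discretized state grid
actions = np.linspace(-1, 1, 21)         # discretized action grid

f = lambda s, a: np.clip(s + a, -2, 2)   # toy dynamics, clipped to stay on the grid
c = lambda s, a: s**2 + 0.1 * a**2       # toy stage cost
c_H = lambda s: s**2                     # terminal cost

nearest = lambda s: np.abs(states - s).argmin()  # project next state onto the grid

V = [None] * (H + 1)
pi = [None] * H
V[H] = np.array([c_H(s) for s in states])        # V*_H(s) = c_H(s)

for t in range(H - 1, -1, -1):                   # backward in time
    V[t] = np.empty_like(states)
    pi[t] = np.empty_like(states)
    for i, s in enumerate(states):
        # Q*_t(s, a) = c_t(s, a) + V*_{t+1}(f(s, a)) over the action grid
        q = np.array([c(s, a) + V[t + 1][nearest(f(s, a))] for a in actions])
        j = q.argmin()
        pi[t][i] = actions[j]                    # greedy (argmin) policy
        V[t][i] = q[j]

print(pi[0][nearest(1.0)])                       # action taken at s = 1.0, t = 0
```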
Recap: LQR
Special case of optimal control problem with
- Quadratic cost $$c_t(s,a) = s^\top Qs+ a^\top Ra,\quad c_H(s) = s^\top Qs$$ where \(Q\) is symmetric and positive semi-definite and \(R\) is symmetric and positive definite
- Linear dynamics $$s_{t+1} = As_t+ Ba_t$$
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} s_t^\top Qs_t +a_t^\top Ra_t+s_H^\top Q s_H\)
s.t. \(s_{t+1}=As_t+B a_t, ~~a_t=\pi_t(s_t)\)
Important background:
- A matrix is symmetric if \(M=M^\top\)
- A matrix is positive semi-definite (PSD) if all its eigenvalues are greater than or equal to 0
- A matrix is positive definite if all its eigenvalues are strictly greater than 0
- All positive definite matrices are invertible
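A quick numerical way to check these properties on a concrete matrix (a small illustrative snippet; the matrix is arbitrary):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

symmetric = np.allclose(M, M.T)
eigvals = np.linalg.eigvalsh(M)              # eigenvalues of a symmetric matrix
psd = symmetric and np.all(eigvals >= 0)     # positive semi-definite
pd = symmetric and np.all(eigvals > 0)       # positive definite
print(symmetric, psd, pd, eigvals)
```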
Resources
Linear algebra and probability background*
- Interactive Linear Algebra
- especially Ch 6
- Linear Algebra Review and Reference
- Review of Probability Theory
*these references are not necessarily an exact match to the course and they are not required
Recall: Example
$$\min_{a_0, a_1}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_1 + s_2^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_2+\lambda a_{0}^2+\lambda a_1^2 $$
$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad \quad s_{2} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{1} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{1} $$
- Setting: hovering UAV over a target
- Action: thrust right/left
- State: distance from target, velocity
- LQR\(\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\lambda,H=2\right)\)
- Minimizing over \(a_1\) first: $$\min_{a_1}\quad (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 + \lambda a_1^2$$
- The action \(a_1\) only enters through \(\lambda a_1^2\), so \(a_1^\star=0\)
Recall: Example
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 +\lambda a_{0}^2 $$
$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad \qquad\qquad\qquad\qquad$$
- Setting: hovering UAV over a target
- Action: thrust right/left
- State: distance from target, velocity
- LQR\(\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\lambda,H=2\right)\)
- Recall \(a_1^\star=0\)
- Substituting the dynamics for \(s_1\): $$\min_{a_0}\quad s_0^\top \begin{bmatrix}3&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + (1 +\lambda)a_0^2 $$
- which is minimized at \( a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda} = -\frac{1}{1+\lambda}(\mathsf{pos}_0-x+2\mathsf{vel}_0)\)
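A small numerical sanity check of this closed form, comparing it against a brute-force grid search over \((a_0, a_1)\) (the values of \(\lambda\) and \(s_0\) below are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
lam = 0.5                       # illustrative lambda
s0 = np.array([1.0, 0.5])       # illustrative initial state [pos, vel]

def cost(a0, a1):
    s1 = A @ s0 + B.flatten() * a0
    s2 = A @ s1 + B.flatten() * a1
    return s0 @ Q @ s0 + s1 @ Q @ s1 + s2 @ Q @ s2 + lam * (a0**2 + a1**2)

# brute-force search over a grid of (a0, a1) pairs
grid = np.linspace(-3, 3, 241)
best = min((cost(a0, a1), a0, a1) for a0 in grid for a1 in grid)
print("brute force:", best[1], best[2])   # matches up to the 0.025 grid resolution

# closed form from the slides: a0* = -[1 2] s0 / (1 + lambda), a1* = 0
print("closed form:", -(s0[0] + 2 * s0[1]) / (1 + lam), 0.0)
```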
Agenda
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
LQR via DP
- \(V_H^\star(s) = s^\top Q s\)
- \(t=H-1\): \(\quad \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top Q (As+Ba)\)
- \(\quad \min_{a} s^\top (Q+A^\top QA) s+a^\top (R+B^\top QB) a+2s^\top A^\top Q Ba\)
- General minimization: \(\arg\min_a c + a^\top M a + 2m^\top a\) for symmetric \(M \succ 0\)
- Setting the gradient to zero: \(2Ma_\star + 2m = 0 \implies a_\star = -M^{-1} m\), and the minimum is \(c-m^\top M^{-1} m\)
- Here \(M = R+B^\top QB\) is positive definite (hence invertible) and \(m = B^\top QAs\), so \( \pi_{H-1}^\star(s)=-(R+B^\top QB)^{-1}B^\top QAs\)
- \(V_{H-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s\)
DP: \(V_t^\star (s) = \min_{a} c(s, a)+V_{t+1}^\star (f(s,a))\)
Important background:
- The gradient of a function \(f:\mathbb R^d \to\mathbb R\) is the vector $$\nabla f(x) = \begin{bmatrix}\frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_d}\end{bmatrix}$$
- If \(f\) has a minimum at \(x_\star\) then $$\nabla f(x_\star) = 0$$
- The gradients of quadratic and linear functions are $$\nabla \left[x^\top Mx\right]=Mx+M^\top x,\quad \nabla \left[m^\top x\right] = m $$
LQR via DP
- \(V_H^\star(s) = s^\top Q s\)
- \(t=H-1\): \(\quad \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top Q (As+Ba)\)
- \( \pi_{H-1}^\star(s)=-(R+B^\top QB)^{-1}B^\top QAs\)
- \(V_{H-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s\)
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
where the matrices are defined as \(P_{H} = Q\) and
- \(P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
- \(K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
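A minimal numpy sketch of this backward recursion; the function name and interface are my own, not a library API:

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, H):
    """Backward Riccati recursion from the theorem: returns gains K_0..K_{H-1} and
    cost-to-go matrices P_0..P_H, so that pi*_t(s) = K_t s and V*_t(s) = s^T P_t s."""
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q                                              # P_H = Q
    for t in range(H - 1, -1, -1):
        S = R + B.T @ P[t + 1] @ B                        # positive definite, so invertible
        K[t] = -np.linalg.solve(S, B.T @ P[t + 1] @ A)    # K_t
        P[t] = Q + A.T @ P[t + 1] @ A \
               - A.T @ P[t + 1] @ B @ np.linalg.solve(S, B.T @ P[t + 1] @ A)  # P_t
    return K, P

# Sanity check on the UAV example from the recap (lambda = 1/2, H = 2):
A = np.array([[1., 1.], [0., 1.]]); B = np.array([[0.], [1.]])
Q = np.array([[1., 0.], [0., 0.]]); R = np.array([[0.5]])
K, P = lqr_finite_horizon(A, B, Q, R, H=2)
print(K[0])   # expect -[1, 2] / (1 + 0.5) = [-0.667, -1.333]
print(K[1])   # expect [0, 0], matching a_1* = 0
```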
LQR Proof
- Base case: \(V_H^\star(s) = s^\top Q s\)
- Inductive step: Assume that \(V^\star_{t+1} (s) = s^\top P_{t+1} s\).
- DP at \(t\): \(V_t^\star(s)= \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top P_{t+1} (As+Ba)\)
- \(\quad \min_{a} s^\top (Q+A^\top P_{t+1}A) s+a^\top (R+B^\top P_{t+1} B) a+2s^\top A^\top P_{t+1} Ba\)
- General minimization: \(\arg\min_a c + a^\top M a + 2m^\top a\) gives \(a_\star = -M^{-1} m\) and minimum is \(c-m^\top M^{-1} m\)
- \( \pi_{t}^\star(s)=-(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}As\)
- \(V_{t}^\star(s) = s^\top (Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A) s\)
Theorem: \(V^\star_t (s) = s^\top P_t s\) and \(\pi_t^\star(s) = K_t s\) where \(P_{H} = Q\),
\(P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
\(K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
Example
- Setting: hovering UAV over a target
- Action: thrust right/left
- State: distance from target, velocity
- LQR\(\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)\)
- Optimal policy: \(\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s = \gamma^\mathsf{pos}_t (\mathsf{pos} - x) + \gamma^\mathsf{vel}_t \mathsf{vel} \)
- [Plot: the gains \(\gamma^\mathsf{pos}_t\) and \(\gamma^\mathsf{vel}_t\) plotted against \(t\) from \(0\) to \(H\).]
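To reproduce what the plot shows, the same backward recursion can be run for this system over a longer horizon (the choice \(H=20\) below is an arbitrary illustrative value):

```python
import numpy as np

A = np.array([[1., 1.], [0., 1.]])
B = np.array([[0.], [1.]])
Q = np.array([[1., 0.], [0., 0.]])
R = np.array([[0.5]])              # lambda = 1/2
H = 20                             # illustrative horizon

P = Q                              # P_H = Q
gains = [None] * H
for t in range(H - 1, -1, -1):
    S = R + B.T @ P @ B
    K = -np.linalg.solve(S, B.T @ P @ A)       # K_t = [gamma^pos_t, gamma^vel_t]
    gains[t] = K.ravel()
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

for t in (0, 1, H - 2, H - 1):
    # gains approach constant values far from the horizon; K_{H-1} = [0, 0]
    print(t, gains[t])
```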
LQR Extensions
- The same dynamic programming method extends in a straightforward manner when:
- Dynamics and costs are time varying
- Affine term in the dynamics, cross terms in the costs
- General form: \( f_t(s_t,a_t) = A_ts_t + B_t a_t +c_t\) and $$c_t(s,a) = s^\top Q_ts+a^\top R_ta+a^\top M_ts + q_t^\top s + r_t^\top a+ v_t $$
- General solution: \(\pi^\star_t(s) = K_t s+ k_t\) where $$\{K_t,k_t\}_{t=0}^{H-1} = \mathsf{LQR}(\{A_t,B_t,c_t, Q_t, R_t, M_t, q_t, r_t, v_t\}_{t=0}^{H-1}) $$
- Many applications can be reformulated this way:
- e.g. trajectory tracking \(c_t(s,a) = \|s-\bar s_t\|_2^2 + \|a\|_2^2\) for given \(\bar s_t\) (expanded just after this list)
- Nonlinear dynamics and costs (Programming Assignment 2)
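For instance, the trajectory-tracking cost mentioned above fits the general quadratic form by expanding the square and identifying terms with \(Q_t, R_t, M_t, q_t, r_t, v_t\):
$$\|s-\bar s_t\|_2^2 + \|a\|_2^2 = s^\top I s + a^\top I a - 2\bar s_t^\top s + \bar s_t^\top \bar s_t$$
so \(Q_t = I\), \(R_t = I\), \(M_t = 0\), \(q_t = -2\bar s_t\), \(r_t = 0\), and \(v_t = \bar s_t^\top \bar s_t\).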
Agenda
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
Example
- Setting: hovering UAV over a target
- Action: thrust right/left
- imperfect: attenuated at high thrusts and velocities
- The dynamics:
- \(\mathsf{position}_{t+1} = \mathsf{position}_{t}+ \mathsf{velocity}_{t}\)
- \(\mathsf{velocity}_{t+1}=\mathsf{velocity}_{t} + e^{- (\mathsf{velocity}_t^2+a_t^2)} a_t\)
- When velocity/thrust is:
- small, then \(\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t} +a_t \)
- large, then \(\mathsf{velocity}_{t+1}\approx \mathsf{velocity}_{t} \)
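A tiny numeric illustration of this attenuation, as a sketch (the specific state and thrust values below are arbitrary):

```python
import numpy as np

def f(s, a):
    """Nonlinear UAV dynamics from the slide: s = [pos, vel], scalar thrust a."""
    pos, vel = s
    return np.array([pos + vel, vel + np.exp(-(vel**2 + a**2)) * a])

s = np.array([0.0, 0.0])
print(f(s, 0.1))                      # small thrust at rest: velocity changes by ~0.1
print(f(s, 3.0))                      # large thrust: exp(-9) * 3 ~ 0.0004, almost no effect
print(f(np.array([0.0, 3.0]), 0.1))   # high velocity also attenuates the thrust
```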
Example
- Setting: hovering UAV over a target
- Action: thrust right/left
- imperfect: attenuated at high thrusts and velocities
- Goal: stay near target position \(0\)
- Field of view is limited
- Thus cost is $$c(s,a) =(1-e^{-\mathsf{pos}^2}) +\lambda a^2$$
Low-Order Approximation
- How to find simpler (e.g. linear or quadratic) approximations?
- For a nonlinear differentiable function \(g:\mathbb R\to\mathbb R\)
- Recall Taylor Expansion $$ g(x) = g(x_0) +g'(x_0)(x-x_0)+\frac{1}{2}g''(x_0)(x-x_0)^2 + ... $$
- When \(x\) is close to \(x_0\), the higher order terms become vanishingly small: \(\epsilon^p\to 0\) as \(p\to\infty\) for \( |\epsilon|<1\)

Linear Approximation
- Linear, also called first-order, approximation $$ g(x) \approx g(x_0) + g'(x_0)(x-x_0) $$
- For vector-valued multi-variate function \(f:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R^{n_s}\) $$ f(s,a) \approx f(s_0, a_0) + \nabla_s f(s_0, a_0)^\top (s-s_0) + \nabla_a f(s_0, a_0)^\top (a-a_0) $$
- Jacobians \( \nabla_s f(s, a) \in\mathbb R^{n_s\times n_s}\) and \( \nabla_a f(s, a) \in\mathbb R^{n_a\times n_s}\) contain the partial derivatives \( \frac{\partial f_j (s,a)}{\partial s_i}\) and \( \frac{\partial f_j (s,a)}{\partial a_i}\) in entry \((i,j)\):
- row \(i\) represents effects of the \(i\)th dimension of the current state/action; column \(j\) represents effects on \(f_j\), i.e. the \(j\)th dimension of the next state
Example
- Setting: hovering UAV over a target
- state \(s = [\mathsf{pos}, \mathsf{vel}]\)
- The dynamics: $$ f(s_t, a_t) = \begin{bmatrix} \mathsf{pos}_{t}+ \mathsf{vel}_{t}\\ \mathsf{vel}_{t} + e^{- (\mathsf{vel}_t^2+a_t^2)} a_t \end{bmatrix} = \begin{bmatrix} f_1(s_t,a_t)\\ f_2(s_t,a_t)\end{bmatrix} $$
- \(\nabla_s f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial \mathsf{pos}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{pos}} \\ \frac{\partial f_1 (s,a)}{\partial \mathsf{vel}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{vel}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1-2a\,\mathsf{vel}\, e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix} \)
- \(\nabla_a f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial a} & \frac{\partial f_2 (s,a)}{\partial a} \end{bmatrix} =\begin{bmatrix} 0 & (1-2a^2) e^{-(\mathsf{vel}^2+a^2)} \end{bmatrix}\)
Quadratic Approximation
- Second-order approximation $$ g(x) \approx g(x_0) + g'(x_0)(x-x_0) + \frac{1}{2} g''(x_0)(x-x_0)^2$$
- For multi-variate function \(c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R\) $$ c(s,a) \approx c(s_0, a_0) + \nabla_s c(s_0, a_0)^\top (s-s_0) + \nabla_a c(s_0, a_0)^\top (a-a_0) + \\ \frac{1}{2} (s-s_0) ^\top \nabla^2_s c(s_0, a_0)(s-s_0) + \frac{1}{2} (a-a_0) ^\top \nabla^2_a c(s_0, a_0)(a-a_0) \\+ (a-a_0) ^\top \nabla_{as}^2 c(s_0, a_0)(s-s_0) $$
- Gradients \( \nabla_s c(s, a) \in\mathbb R^{n_s}\) and \( \nabla_a c(s, a) \in\mathbb R^{n_a}\)
- Hessians \( \nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}\), \( \nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}\), and \( \nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}\) contain second derivatives
Quadratic Approximation
- For multi-variate function \(c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R\)
- Gradients \( \nabla_s c(s, a) \in\mathbb R^{n_s}\) and \( \nabla_a c(s, a) \in\mathbb R^{n_a}\) contain the partial derivatives \( \frac{\partial c (s,a)}{\partial s_i}\) and \( \frac{\partial c (s,a)}{\partial a_i}\) in entry \(i\)
- entry \(i\) represents the effect of the \(i\)th dimension of the current state/action
Quadratic Approximation
- For multi-variate function \(c:\mathbb R^{n_s}\times \mathbb R^{n_a} \to \mathbb R\)
- Gradients \( \nabla_s c(s, a) \in\mathbb R^{n_s}\) and \( \nabla_a c(s, a) \in\mathbb R^{n_a}\)
- Hessians \( \nabla_s^2 c(s, a) \in\mathbb R^{n_s\times n_s}\), \( \nabla_a^2 c(s, a) \in\mathbb R^{n_a \times n_a}\), \( \nabla_{as}^2 c(s, a) \in\mathbb R^{n_a\times n_s}\) contain the second derivatives \( \frac{\partial^2 c (s,a)}{\partial s_i\partial s_j}\), \( \frac{\partial^2c(s,a)}{\partial a_i\partial a_j}\), and \( \frac{\partial^2 c (s,a)}{\partial a_i \partial s_j}\) in entry \((i,j)\)
- \( \nabla_s^2 c\) and \( \nabla_a^2 c\) are symmetric
Example
- Setting: hovering UAV over a target
- state \(s = [\mathsf{pos}, \mathsf{vel}]\)
- The cost: $$c(s,a) = (1-e^{-\mathsf{pos}^2}) +\lambda a^2$$
- \(\nabla_s c(s,a)= \begin{bmatrix} 2\mathsf{pos}\cdot e^{-\mathsf{pos}^2} \\ 0 \end{bmatrix} \)
- \(\nabla_s^2 c(s,a)= \begin{bmatrix} 2(1-2\mathsf{pos}^2) e^{-\mathsf{pos}^2} & 0\\ 0& 0 \end{bmatrix} \)
- \(\nabla_a c(s,a)= 2\lambda a\) and \(\nabla_a^2 c(s,a)= 2\lambda\)
- \(\nabla_{as}^2 c(s,a)=0\)
Finite Difference Approximation
- For scalar function $$g'(x) \approx \frac{g(x+\delta)-g(x-\delta)}{2\delta}$$
- For multivariate $$ \frac{\partial f_j (s,a)}{\partial s_i} \approx \frac{f_j(s+\delta e_i,a)-f_j(s-\delta e_i,a)}{2\delta}$$ where \(e_i\) is a standard basis vector
- For second derivatives, repeat the rule: $$\frac{\partial^2 c (s,a)}{\partial a_i \partial s_j} \approx \frac{1}{2\delta}\Big[ \frac{\partial c (s,a +\delta e_i)}{\partial s_j} - \frac{\partial c (s,a -\delta e_i)}{\partial s_j} \Big]$$ which expands to $$\frac{\partial^2 c (s,a)}{\partial a_i \partial s_j} \approx \frac{1}{2\delta}\Big[ \frac{c(s+\delta e_j,a +\delta e_i)- c(s-\delta e_j,a +\delta e_i)}{2\delta} - \frac{c(s+\delta e_j,a -\delta e_i)-c(s-\delta e_j,a -\delta e_i)}{2\delta} \Big]$$
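Below is a sketch of these central-difference formulas in numpy; the helper names (`fd_jacobian_s`, `fd_hessian_as`) and the step size \(\delta\) are my own illustrative choices, not a library API:

```python
import numpy as np

def fd_jacobian_s(f, s, a, delta=1e-5):
    """Approximate grad_s f(s, a): entry (i, j) ~ d f_j / d s_i, via central differences."""
    n_s, n_out = len(s), len(f(s, a))
    J = np.zeros((n_s, n_out))
    for i in range(n_s):
        e = np.zeros(n_s); e[i] = delta
        J[i] = (f(s + e, a) - f(s - e, a)) / (2 * delta)
    return J

def fd_hessian_as(c, s, a, delta=1e-4):
    """Approximate grad^2_{as} c(s, a): entry (i, j) ~ d^2 c / (d a_i d s_j), by repeating the rule."""
    def grad_s(a_shift):
        # inner central difference in s, at the shifted action
        return np.array([(c(s + delta * ej, a_shift) - c(s - delta * ej, a_shift)) / (2 * delta)
                         for ej in np.eye(len(s))])
    H = np.zeros((len(a), len(s)))
    for i in range(len(a)):
        ei = np.zeros(len(a)); ei[i] = delta
        H[i] = (grad_s(a + ei) - grad_s(a - ei)) / (2 * delta)   # outer difference in a
    return H

# Check against the analytic Jacobian of the UAV dynamics at s = [0, 0.2], a = [0.3]:
f_uav = lambda s, a: np.array([s[0] + s[1], s[1] + np.exp(-(s[1]**2 + a[0]**2)) * a[0]])
print(fd_jacobian_s(f_uav, np.array([0.0, 0.2]), np.array([0.3])))
# compare to [[1, 0], [1, 1 - 2 a vel e^{-(vel^2+a^2)}]] ~ [[1, 0], [1, 0.895]]
```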
Agenda
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
Local Control
- Local control around \((s_\star,a_\star)\)
- e.g. Cartpole (PA2)
- \(s = \begin{bmatrix} \theta\\ \omega \\ x \\ v \end{bmatrix}\) and \(a = f\)
- goal: balance \(s_\star = 0\) and \(a_\star = 0\)
- Applicable when costs \(c\) are smallest at \((s_\star,a_\star)\) and initial state is close to \(s_\star\)
[Cartpole diagram: pole angle \(\theta\), angular velocity \(\omega\), cart position \(x\), cart velocity \(v\), applied force \(f\), gravity.]
- Assumptions:
- Black-box access to \(f\) and \(c\)
- i.e. can query at any \((s,a)\) and observe outputs \(s'\) and \(c\) where \(s'=f(s,a)\) and \(c=c(s,a)\)
- \(f\) is differentiable and \(c\) is twice differentiable
- i.e. Jacobians and Hessians are well defined
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} c(s_t, a_t)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
Local Control
- Procedure
- Approximate dynamics & costs
- First/second order approximation
- Finite differencing
- Policy via LQR
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} c(s_t, a_t)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
Local Control
Linearized Dynamics
- Linearization of dynamics around \((s_0,a_0)\)
- \( f(s,a) \approx f(s_0, a_0) + \nabla_s f(s_0, a_0)^\top (s-s_0) + \nabla_a f(s_0, a_0)^\top (a-a_0) \)
- \( =A_0s+B_0a+c_0 \)
- where the matrices depend on \((s_0,a_0)\):
- \(A_0 = \nabla_s f(s_0, a_0)^\top \)
- \(B_0 = \nabla_a f(s_0, a_0)^\top \)
- \(c_0 = f(s_0, a_0) - \nabla_s f(s_0, a_0)^\top s_0 - \nabla_a f(s_0, a_0)^\top a_0 \)
- Black box access: use finite differencing to compute
Example
- Setting: hovering UAV over a target
- state \(s = [\mathsf{pos}, \mathsf{vel}]\)
- Linearizing around \((0,0)\)
- \(f(0,0) = 0\)
- \(\nabla_s f(0,0) = \begin{bmatrix} 1 & 0 \\ 1 & 1-2\cdot 0\cdot e^{-0} \end{bmatrix} \)
- \(\nabla_a f(0,0) =\begin{bmatrix} 0 & (1-0) e^{-0} \end{bmatrix}\)
- \(s_{t+1}=f(s_t, a_t) \approx \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t\)
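A short numeric check of this linearization by finite differencing the dynamics at \((0,0)\) (a sketch; the perturbation size \(\delta\) is an arbitrary choice):

```python
import numpy as np

def f(s, a):
    """Nonlinear UAV dynamics: s = [pos, vel], scalar thrust a."""
    return np.array([s[0] + s[1], s[1] + np.exp(-(s[1]**2 + a**2)) * a])

s0, a0 = np.zeros(2), 0.0
delta = 1e-5

# A_0 = grad_s f(0,0)^T and B_0 = grad_a f(0,0)^T by central differences
A0 = np.column_stack([(f(s0 + delta * e, a0) - f(s0 - delta * e, a0)) / (2 * delta)
                      for e in np.eye(2)])
B0 = ((f(s0, a0 + delta) - f(s0, a0 - delta)) / (2 * delta)).reshape(2, 1)
c0 = f(s0, a0) - A0 @ s0 - B0.flatten() * a0

print(np.round(A0, 6))   # expect [[1, 1], [0, 1]]
print(np.round(B0, 6))   # expect [[0], [1]]
print(c0)                # expect [0, 0]
```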
Second-Order Approx. Costs
- Approximate costs around \((s_0,a_0)\) $$ c(s,a) \approx c(s_0, a_0) + \nabla_s c(s_0, a_0)^\top (s-s_0) + \nabla_a c(s_0, a_0)^\top (a-a_0) + \\ \frac{1}{2} (s-s_0) ^\top \nabla^2_s c(s_0, a_0)(s-s_0) + \frac{1}{2} (a-a_0) ^\top \nabla^2_a c(s_0, a_0)(a-a_0) \\+ (a-a_0) ^\top \nabla_{as}^2 c(s_0, a_0)(s-s_0) $$
- \( =s^\top Q_0s+a^\top R_0a+a^\top M_0s + q_0^\top s + r_0^\top a+ v_0\)
- Practical consideration:
- Force \(Q_0,R_0\) to be positive definite by setting negative eigenvalues to 0 and adding regularization \(\lambda I\)
- Black box access: use finite differencing to compute
Practical Consideration
For a symmetric matrix \(Q\in\mathbb R^{n\times n}\) the eigen-decomposition is $$Q = \sum_{i=1}^n v_iv_i^\top \sigma_i $$
To make this positive definite, we replace $$Q\leftarrow \sum_{i=1}^n v_iv_i^\top (\max\{0,\sigma_i\} +\lambda)$$
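A minimal numpy sketch of this projection (the function name and the regularization value \(\lambda\) are illustrative choices):

```python
import numpy as np

def make_positive_definite(Q, lam=1e-3):
    """Clip negative eigenvalues to zero and add lam to every eigenvalue, per the slide."""
    Q = (Q + Q.T) / 2                        # symmetrize first
    sigma, V = np.linalg.eigh(Q)             # eigendecomposition Q = sum_i sigma_i v_i v_i^T
    sigma = np.maximum(sigma, 0.0) + lam     # max{0, sigma_i} + lambda
    return V @ np.diag(sigma) @ V.T

Q0 = np.array([[2.0, 0.0], [0.0, -0.5]])     # indefinite Hessian estimate (illustrative)
print(make_positive_definite(Q0))            # eigenvalues are now strictly positive
```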
Example
- Setting: hovering UAV over a target
- state \(s = [\mathsf{pos}, \mathsf{vel}]\)
- Approximating around \((0,0)\)
- \(\nabla_s c(0,0)= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \)
- \(\nabla_s^2 c(0,0)= \begin{bmatrix} 2 & 0\\ 0& 0 \end{bmatrix} \)
- \(\nabla_a c(0,0)= 0\) and \(\nabla_a^2 c(0,0)= 2\lambda\)
- \(\nabla_{as}^2 c(0,0)=0\)
- \(c(s,a)\approx \mathsf{pos}^2 + \lambda a^2\)
Local Control
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} c(s_t, a_t)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
- Approximate dynamics & costs
- Linearize \(f\) as \(A_0,B_0,c_0\)
- Approx \(c\) as \(Q_0,R_0,M_0,q_0,r_0,v_0\)
- LQR policy: \(\pi^\star_t(s) = K_t s+ k_t\) where $$\{K_t,k_t\}_{t=0}^{H-1} = \mathsf{LQR}(A_0,B_0,c_0, Q_0, R_0, M_0, q_0, r_0, v_0) $$
- works as long as states and actions remain close to \(s_\star\) and \(a_\star\)
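Putting the pieces together, here is a rough end-to-end sketch of local control for the UAV example around \((s_\star, a_\star) = (0,0)\). To stay short it assumes the gradient and cross terms of the cost vanish at \((s_\star,a_\star)\) (true here, since the cost is minimized there and \(f(0,0)=0\)), so only the pure-quadratic LQR recursion is needed; the horizon, step sizes, and initial state are arbitrary illustrative choices, and PA2 treats the general affine case.

```python
import numpy as np

def f(s, a):   # nonlinear UAV dynamics (treated as a black box)
    return np.array([s[0] + s[1], s[1] + np.exp(-(s[1]**2 + a**2)) * a])

def c(s, a):   # nonlinear cost (treated as a black box); lambda = 0.5
    return (1 - np.exp(-s[0]**2)) + 0.5 * a**2

delta, H = 1e-4, 20
s_star, a_star = np.zeros(2), 0.0

# 1. Linearize dynamics by finite differences: A = grad_s f^T, B = grad_a f^T
A = np.column_stack([(f(s_star + delta * e, a_star) - f(s_star - delta * e, a_star)) / (2 * delta)
                     for e in np.eye(2)])
B = ((f(s_star, a_star + delta) - f(s_star, a_star - delta)) / (2 * delta)).reshape(2, 1)

# 2. Quadratic cost from finite-difference Hessians: Q = 0.5 * grad_s^2 c, R = 0.5 * grad_a^2 c
def hess_s(c, s, a):
    n = len(s)
    Hm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = delta * np.eye(n)[i], delta * np.eye(n)[j]
            Hm[i, j] = (c(s + ei + ej, a) - c(s + ei - ej, a)
                        - c(s - ei + ej, a) + c(s - ei - ej, a)) / (4 * delta**2)
    return Hm

Q = 0.5 * hess_s(c, s_star, a_star)
R = 0.5 * np.array([[(c(s_star, a_star + delta) - 2 * c(s_star, a_star)
                      + c(s_star, a_star - delta)) / delta**2]])

# 3. Project onto positive definite matrices
def make_pd(M, lam=1e-6):
    sig, V = np.linalg.eigh((M + M.T) / 2)
    return V @ np.diag(np.maximum(sig, 0) + lam) @ V.T
Q, R = make_pd(Q), make_pd(R)

# 4. LQR backward recursion for the gains K_t
P, K = Q, [None] * H
for t in range(H - 1, -1, -1):
    S = R + B.T @ P @ B
    K[t] = -np.linalg.solve(S, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

# 5. Roll out the local policy a_t = K_t s_t on the true nonlinear system
s = np.array([0.5, 0.0])            # start near the target
for t in range(H):
    a = (K[t] @ s).item()
    s = f(s, a)
print(s)                            # state after H steps; should be near [0, 0]
```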
Local Control as Approx DP
- Initialize \(V^\star_H(s) = c_H(s)\)
- For \(t=H-1, H-2, ..., 0\):
- \(Q_t^\star(s,a) = c(s,a)+V^\star_{t+1}(f(s,a))\)
- \(\pi_t^\star(s) = \arg\min_a Q_t^\star(s,a)\)
- \(V^\star_{t}(s)=Q_t^\star(s,\pi_t^\star(s) )\)
- Return \(\pi^\star = (\pi^\star_0,\dots ,\pi^\star_{H-1})\)
- Local control approximates this recursion: the linear/quadratic approximations of \(f\) and \(c\) stand in for the true dynamics and cost, so each backward step has a closed-form LQR solution.
Recap
- PSet due Wednesday
- Optimal LQR Policy
- Nonlinear Approximation
- Locally Linear Control
- Next lecture: Iterative Nonlinear Control
Sp23 CS 4/5789: Lecture 9
By Sarah Dean