High performance in the real world involves complex dynamics and safety constraints
minimize \(\mathbb{E}[\)cost\((x_0,u_0,x_1...)]\)
s.t. \(x_{t+1} = Ax_t + Bu_t + w_t\)
\(x_t\in\mathcal X,~~u_t\in\mathcal U\) for all \(t\)
and all \(\|w_t\|\leq \sigma_w\)
Initial Estimates
\(\|\widehat A_0 - A\|\leq \epsilon_A\), \(\|\widehat B_0 - B\|\leq \epsilon_B\)
Goal: Analyze learning and performance in the presence of state and input constraints
Robust control
Persistent Excitation
Where \(M\) is the safety margin cost gap of \(\mathbf{K}_*\), as long as \(T\) is large enough,
rel. error of \(\widehat{\mathbf K}\lesssim \frac{\sigma_w C_u}{\sigma_\eta} \sqrt{\frac{n+d}{T}} \|\)CL\((A,B,\mathbf K_*)\|_{H_\infty} (1+M) +M\)
inverse SNR \(=\frac{\text{process noise}}{\text{excitation}}\)
sample complexity
safety margin optimal cost gap
robustness of optimal controller
Ingredients:
For stabilizing control of the form \(u_t = \mathbf{K}(x_t, x_{t-1}...) + \eta_t\) and large enough \(T\), we have w.p. \(1-\delta\)
\(\Big\|\begin{bmatrix} \widehat A - A \\ \widehat B - B\end{bmatrix}\Big\| \lesssim \frac{\sigma_w C_u}{\sigma_\eta} \sqrt{\frac{n+d }{T} \log(1/\delta)} \)
Least squares estimate \((\widehat A, \widehat B) \in \arg\min \sum_{t=0}^T \|Ax_t +B u_t - x_{t+1}\|^2 \)
where \(C_u\) is the gain from disturbance to control input
Assume that the process noise \(w_t\) and excitation \(\eta_t\) are zero mean, independent over time, and have fourth moments bounded by \(\sigma_w\) and \(\sigma_\eta\), respectively
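The estimation step above can be illustrated with a minimal sketch, assuming a static stabilizing gain (a special case of \(\mathbf K(x_t, x_{t-1}...)\)) and uniformly distributed, hence bounded, noise. The matrices are the double integrator used as the example in this talk; the gain `K0`, the noise levels, the horizon, and the name `simulate_and_estimate` are hypothetical choices for illustration only.

```python
import numpy as np

def simulate_and_estimate(A, B, K0, T, sigma_w, sigma_eta, seed=0):
    """Roll out u_t = K0 x_t + eta_t and fit (A, B) by least squares."""
    rng = np.random.default_rng(seed)
    n, d = B.shape
    x = np.zeros(n)
    Z = np.zeros((T, n + d))  # regressors [x_t, u_t]
    Y = np.zeros((T, n))      # targets x_{t+1}
    for t in range(T):
        eta = rng.uniform(-sigma_eta, sigma_eta, size=d)  # excitation
        w = rng.uniform(-sigma_w, sigma_w, size=n)        # bounded process noise
        u = K0 @ x + eta
        x_next = A @ x + B @ u + w
        Z[t] = np.concatenate([x, u])
        Y[t] = x_next
        x = x_next
    # Solve min ||Z theta - Y||_F^2, where theta = [A B]^T
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return theta[:n].T, theta[n:].T

# Double integrator from the talk; K0 is an assumed stabilizing gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K0 = np.array([[-1.0, -1.0]])

A_hat, B_hat = simulate_and_estimate(A, B, K0, T=5000, sigma_w=0.1, sigma_eta=1.0)
err = np.linalg.norm(np.hstack([A_hat - A, B_hat - B]), 2)
print("spectral-norm estimation error:", err)
```

Rerunning with a smaller `sigma_eta` or a shorter horizon `T` should generally increase the error, in line with the \(\frac{\sigma_w}{\sigma_\eta}\sqrt{\frac{n+d}{T}}\) scaling of the bound.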
Instead of reasoning about a controller \(\mathbf{K}\),
plant \((A,B)\)
controller \(\mathbf{K}\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
we reason about the interconnection \(\mathbf\Phi\) directly.
This correspondence holds for all \(\mathbf\Phi\) constrained to lie in an affine space defined by the true dynamics
\(\begin{bmatrix}zI- A&- B\end{bmatrix} \mathbf\Phi = I\)
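To make the affine constraint concrete: writing \(\mathbf\Phi_x = \sum_{k\geq 1}\Phi_x(k)z^{-k}\) and matching powers of \(z\) gives \(\Phi_x(1) = I\) and \(\Phi_x(k+1) = A\Phi_x(k) + B\Phi_u(k)\). The sketch below checks this numerically for a static gain \(u_t = Kx_t\), for which \(\Phi_x(k) = (A+BK)^{k-1}\) and \(\Phi_u(k) = K\Phi_x(k)\). The gain `K` and the helper names are hypothetical; the matrices are the double integrator from the example in this talk.

```python
import numpy as np

def system_response(A, B, K, horizon):
    """Impulse response of the closed loop under u_t = K x_t.

    Returns lists with Phi_x[i] = Phi_x(i+1) and Phi_u[i] = Phi_u(i+1).
    """
    Acl = A + B @ K
    Phi_x = [np.linalg.matrix_power(Acl, k) for k in range(horizon)]
    Phi_u = [K @ P for P in Phi_x]
    return Phi_x, Phi_u

def satisfies_affine_constraint(A, B, Phi_x, Phi_u, tol=1e-9):
    """Time-domain form of [zI - A, -B] Phi = I (finite-horizon check)."""
    n = A.shape[0]
    if not np.allclose(Phi_x[0], np.eye(n), atol=tol):
        return False
    return all(
        np.allclose(Phi_x[k + 1], A @ Phi_x[k] + B @ Phi_u[k], atol=tol)
        for k in range(len(Phi_x) - 1)
    )

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -1.0]])  # assumed stabilizing gain, for illustration

Phi_x, Phi_u = system_response(A, B, K, horizon=20)
achievable = satisfies_affine_constraint(A, B, Phi_x, Phi_u)

# Doubling Phi_u breaks the recursion, so the pair leaves the affine space.
not_achievable = satisfies_affine_constraint(A, B, Phi_x, [2.0 * P for P in Phi_u])
print(achievable, not_achievable)
```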
quadratic cost
polytope constraints
achievable subspace
minimize \(\mathbb{E}[\)cost\((x_0,u_0,x_1...)]\)
s.t. \(x_{t+1} = Ax_t + Bu_t + w_t\)
\(x_t\in\mathcal X,~~u_t\in\mathcal U\) for all \(t\)
and all \(\|w_t\|\leq \sigma_w\)
minimize cost(\(\mathbf{\Phi}\))
s.t. \(\begin{bmatrix}zI- A&- B\end{bmatrix} \mathbf\Phi = I\)
\( \mathbf\Phi\in\) constraints\(_{\sigma_w}\)(\(\mathcal{X},\mathcal{U}\))
robust cost
tightened polytope constraints
nominal achievable subspace
\( \underset{\mathbf u=\mathbf{Kx}}{\min}\) \(\underset{\|A-\widehat A\|\leq \varepsilon_A \atop \|B-\widehat B\|\leq \varepsilon_B}{\max}\) \(\mathbb{E}[\)cost\((x_0,u_0,x_1...)]\)
s.t. \(x_{t+1} = Ax_t + Bu_t + w_t\)
\(x_t\in\mathcal X,~~u_t\in\mathcal U\) for all \(t\)
and all \(A, B, w_t\)
\( \underset{\mathbf{\Phi}}{\min}\) \(\frac{1}{1-\gamma}\)\(\text{cost}(\mathbf{\Phi})\)
\(\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)
\(\|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf \Phi\|_{H_\infty}\leq\gamma,~ \|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf\Phi\|_{L_1}\leq\tau, \)
\( \mathbf\Phi\in \text{constraints}\)\(_{\sigma_w,\tau}\)(\(\mathcal{X},\mathcal{U})\)
sensitivity constraints
Any feasible \(\mathbf K\) with \(0\leq\gamma,\tau<1\) can be used for learning: it yields a stable interconnection and satisfies the state and input constraints for every system in the uncertainty set.
\(\text{find}~\mathbf\Phi~~\text{s.t.}~\begin{bmatrix}zI- \widehat A_0&- \widehat B_0\end{bmatrix} \mathbf\Phi = I\)
\(\|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf \Phi\|_{H_\infty}\leq\gamma,~ \|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf\Phi\|_{L_1}\leq\tau, \)
\( \mathbf\Phi\in \text{constraints}\)\(_{\tilde\sigma_w,\tau}\)(\(\mathcal{X},\mathcal{U}_{\sigma_\eta})\)
\(\tilde\sigma_w = \sigma_w+(\|\widehat B\|+\varepsilon_B)\sigma_\eta\)
The relative suboptimality is bounded as
\(\frac{\text{cost}(\widehat{\mathbf{K}})-\text{cost}(\mathbf{K}_*) }{\text{cost}(\mathbf{K}_*)}\leq 4\sqrt{2}(1+M) \|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf \Phi_*\|_{H_\infty}+M\)
where \(M\) is the safety margin sub-optimality gap of the optimal controller.
\(\underset{\gamma,\tau}{\min} ~\underset{\mathbf{\Phi}}{\min}\) \(\frac{1}{1-\gamma}\)\(\text{cost}(\mathbf{\Phi})\)
\(\text{s.t.}~~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)
\(\|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf \Phi\|_{H_\infty}\leq\gamma,~ \|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf\Phi\|_{L_1}\leq\tau, \)
\( \mathbf\Phi\in \text{constraints}\)\(_{\sigma_w,\tau}\)(\(\mathcal{X},\mathcal{U})\)
Robust synthesis using estimated dynamics:
The double integrator dynamics \(x_{t+1} = \begin{bmatrix}1&0.1\\0&1\end{bmatrix}x_t + \begin{bmatrix}0\\1\end{bmatrix}u_t + w_t\)
Learning with \(u_t = \eta_t +\mathbf{K}_0(x_t, x_{t-1}...)\)
Controlling the system with \(\widehat{\mathbf{K}}\)
S. Dean, S. Tu, N. Matni, and B. Recht, Safely Learning to Control the Constrained Linear Quadratic Regulator. arXiv:1809.10121
Based on work supported by NSF Graduate Research Fellowship under Grant No. DGE 1752814
In the Gaussian case, the statistical bound comes from
\(\Big\|\begin{bmatrix} \widehat A - A \\ \widehat B - B\end{bmatrix}\Big\| \lesssim\sqrt{\frac{\sigma_w^2(n+d ) }{T\lambda_{\min}(\Sigma_{x,u})} } \)
Open loop Gaussian inputs (result due to [Simchowitz et al. 2018])
where \(\Sigma_{x,u}= \sum_{k=0}^\infty A^k (\sigma_w^2 I + \sigma_u^2BB^\top)(A^k)^\top \)
Now with \(u_k = Kx_k + \eta_k\):
where \(\Sigma_{x,u}= \begin{bmatrix} \Sigma & \Sigma K^\top \\ K\Sigma & K\Sigma K^\top + \sigma_u^2 I\end{bmatrix}\)
with \(\Sigma= \sum_{k=0}^\infty (A+BK)^k (\sigma_w^2 I + \sigma_u^2BB^\top)((A+BK)^k)^\top \)
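As a rough numerical illustration of the closed-loop expressions above, the sketch below computes \(\Sigma\) by iterating the recursion \(\Sigma \leftarrow (A+BK)\Sigma(A+BK)^\top + \sigma_w^2 I + \sigma_u^2BB^\top\), assembles \(\Sigma_{x,u}\), and evaluates the scaling term \(\sqrt{\sigma_w^2(n+d)/(T\lambda_{\min}(\Sigma_{x,u}))}\). Since the bound holds only up to constants, the number is a scaling indicator, not a guarantee. The gain `K`, the noise levels, and the function names are hypothetical.

```python
import numpy as np

def closed_loop_covariance(A, B, K, sigma_w, sigma_u, iters=2000):
    """Truncated series sum_k Acl^k Q (Acl^k)^T via fixed-point iteration."""
    Acl = A + B @ K
    Q = sigma_w**2 * np.eye(A.shape[0]) + sigma_u**2 * (B @ B.T)
    Sigma = np.zeros_like(Q)
    for _ in range(iters):
        Sigma = Acl @ Sigma @ Acl.T + Q
    return Sigma, Acl, Q

def joint_covariance(Sigma, K, sigma_u):
    """Covariance of [x; u] when u = K x + eta with eta ~ N(0, sigma_u^2 I)."""
    d = K.shape[0]
    return np.block([
        [Sigma, Sigma @ K.T],
        [K @ Sigma, K @ Sigma @ K.T + sigma_u**2 * np.eye(d)],
    ])

def error_scaling(Sigma_xu, sigma_w, T):
    """sqrt(sigma_w^2 (n+d) / (T lambda_min)), constants omitted."""
    lam_min = np.linalg.eigvalsh(Sigma_xu)[0]  # eigenvalues in ascending order
    return np.sqrt(sigma_w**2 * Sigma_xu.shape[0] / (T * lam_min))

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -1.0]])  # assumed stabilizing gain, for illustration

Sigma, Acl, Q = closed_loop_covariance(A, B, K, sigma_w=0.1, sigma_u=1.0)
Sigma_xu = joint_covariance(Sigma, K, sigma_u=1.0)
print("scaling term at T=1000:", error_scaling(Sigma_xu, 0.1, 1000))
```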
\(x_t = \sum_{k=1}^t (A+BK)^{k-1}w_{t-k}\)
\(u_t = \sum_{k=1}^t K(A+BK)^{k-1}w_{t-k}\)
\(x_t = \sum_{k=1}^t \Phi_x(k) w_{t-k}\)
\(u_t = \sum_{k=1}^t \Phi_u(k) w_{t-k}\)
The specific form of the constraints
constraints\(_{\sigma_w}\)(\(\mathcal{X}\))\( = \{ \mathbf\Phi : F_j^\top \Phi(k+1)x_0 + \sigma_w\|F_j^\top[\Phi(k) ~...~\Phi(1)]\|_1 \leq b_j ~~\forall~j,k\}\)
for \(\mathcal{X} = \{Fx\leq b\}\)
where \(F_j\) are rows of \(F\)
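The tightened polytope condition above can be checked directly for a given response. The sketch below is a minimal illustration, assuming the disturbance bound is in the \(\ell_\infty\) norm (so the worst case over \(w\) is an \(\ell_1\) norm of the stacked row) and a finite horizon. The gain `K`, the polytope `F`, `b`, and the function name are hypothetical; the matrices are the double integrator from this talk.

```python
import numpy as np

def state_constraints_hold(F, b, Phi_x, x0, sigma_w, tol=1e-9):
    """Check F_j^T Phi(k+1) x0 + sigma_w ||F_j^T [Phi(k) ... Phi(1)]||_1 <= b_j.

    Phi_x[i] stores Phi(i+1); every k with Phi(k+1) available is checked.
    """
    for k in range(len(Phi_x)):
        nominal = F @ (Phi_x[k] @ x0)
        if k == 0:
            margin = np.zeros(F.shape[0])  # no disturbance has entered yet
        else:
            stacked = np.hstack(Phi_x[:k][::-1])  # [Phi(k) ... Phi(1)]
            margin = sigma_w * np.abs(F @ stacked).sum(axis=1)
        if np.any(nominal + margin > b + tol):
            return False
    return True

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -1.0]])  # assumed stabilizing gain, for illustration
Acl = A + B @ K
Phi_x = [np.linalg.matrix_power(Acl, k) for k in range(30)]  # Phi(k+1) = Acl^k

F = np.array([[1.0, 0.0], [-1.0, 0.0]])  # position bound |x_1| <= b
x0 = np.zeros(2)

loose_ok = state_constraints_hold(F, np.array([100.0, 100.0]), Phi_x, x0, sigma_w=0.1)
tight_ok = state_constraints_hold(F, np.array([0.05, 0.05]), Phi_x, x0, sigma_w=0.1)
print(loose_ok, tight_ok)
```

With \(x_0 = 0\) the bound of 0.05 fails already at \(k=1\), because a single disturbance of size \(\sigma_w = 0.1\) can move the position by 0.1.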
The robust constraint condition is instead
\( F_j^\top \Phi(k+1)x_0 + \sigma_w\|F_j^\top[\Phi(k) ~...~\Phi(1)]\|_1 + \max(\sigma_w,\|x_0\|_\infty)\frac{\tau}{1-\tau}\|F_j^\top[\Phi(k+1) ~...~\Phi(1)]\|_1 \leq b_j \)
Under the dynamics uncertainty, the system response is
\(\mathbf{\Phi}(I+\mathbf\Delta)^{-1}\mathbf w\)
where \(\mathbf \Delta = [\varepsilon_A~~\varepsilon_B]\mathbf\Phi\), so the robust synthesis constraints essentially come from considering the expanded noise process
\(\tilde{\mathbf w} = (I+\mathbf\Delta)^{-1}\mathbf w\)
\(= \mathbf w - \mathbf\Delta(I+\mathbf\Delta)^{-1}\mathbf w\)
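A short derivation sketch of why the factor \(\frac{\tau}{1-\tau}\) appears in the robust constraint condition, assuming \(\|\mathbf\Delta\|_{L_1}\leq\tau<1\) and that the \(L_1\) norm is the induced \(\ell_\infty\to\ell_\infty\) norm (and hence submultiplicative):

```latex
\begin{align*}
(I+\mathbf\Delta)(I+\mathbf\Delta)^{-1} = I
  &\;\Longrightarrow\;
  (I+\mathbf\Delta)^{-1} = I - \mathbf\Delta(I+\mathbf\Delta)^{-1}, \\
\|\mathbf\Delta(I+\mathbf\Delta)^{-1}\|_{L_1}
  &\leq \|\mathbf\Delta\|_{L_1}\sum_{k=0}^{\infty}\|\mathbf\Delta\|_{L_1}^{k}
  \leq \frac{\tau}{1-\tau}, \\
\|\tilde{\mathbf w}-\mathbf w\|_\infty
  &\leq \frac{\tau}{1-\tau}\,\|\mathbf w\|_\infty .
\end{align*}
```

The second line uses the Neumann series \((I+\mathbf\Delta)^{-1}=\sum_{k\geq 0}(-\mathbf\Delta)^k\), which converges because \(\tau<1\).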
In MPC, it is common to model the uncertainty as an additive disturbance, \(\tilde w_k = \Delta_A x_k + \Delta_B u_k + w_k\)
\(\tilde{\mathbf w} = \mathbf w + \Delta_A \mathbf x + \Delta_B \mathbf u \)
vs.