The world evolves as a linear dynamical system with process noise \(w_t\) and measurement noise \(v_t\):

\(s_{t+1} = As_t + w_t\)

\(y_t = Cs_t + v_t\)

The Kalman Filter maintains a state estimate \(\hat s_t\), updated with a gain \(L_t\):

\(\hat s_{t+1} = A\hat s_t + L_t(y_{t+1} -CA\hat s_t)\)

[Block diagram: the world (state \(s\), output map \(C\), noise \(w_t, v_t\)) interconnected with the Kalman Filter (estimate \(\hat s\), gain \(L_t\), error dynamics \(A-L_tCA\)), driven by the outputs \(y_t\).]
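To make the update concrete, here is a minimal NumPy sketch of one filter step (not from the slides; it assumes the standard covariance recursion for computing \(L_t\), which reappears below in the Riccati discussion, and the function name is hypothetical):

```python
import numpy as np

def kalman_step(s_hat, P, y_next, A, C, Sigma_w, Sigma_v):
    """One Kalman filter step: predict through A, then correct with y_{t+1}.
    s_hat is \\hat s_t and P is the error covariance of s_hat."""
    s_pred = A @ s_hat                          # A \hat s_t
    P_pred = A @ P @ A.T + Sigma_w              # predicted error covariance
    S = C @ P_pred @ C.T + Sigma_v              # innovation covariance
    L = P_pred @ C.T @ np.linalg.inv(S)         # Kalman gain L_t
    s_new = s_pred + L @ (y_next - C @ s_pred)  # \hat s_{t+1} = A\hat s_t + L_t(y_{t+1} - CA\hat s_t)
    P_new = (np.eye(len(s_new)) - L @ C) @ P_pred
    return s_new, P_new
```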
The filter can be derived from a least squares perspective. Start with ordinary least squares for a parameter \(\theta\):

$$ \min_{\textcolor{yellow}{\theta}} \sum_{k=0}^t (y_k - \textcolor{yellow}{\theta^\top} x_k )^2 $$

Equivalently, introduce noise variables and constraints:

$$ \min_{\textcolor{yellow}{\theta, v_k}} \sum_{k=0}^t \textcolor{yellow}{v_k}^2 \quad \text{s.t.}\quad y_k = \textcolor{yellow}{\theta^\top} x_k + \textcolor{yellow}{v_k},\quad k=0,\dots,t $$

Now let the parameter drift over time as \(\theta_{k+1} = A_k\theta_k + w_k\), and penalize the drift and the initial parameter:

$$ \min_{\textcolor{yellow}{\theta_k, v_k, w_k}} \sum_{k=0}^t \textcolor{yellow}{v_k^2} + \sum_{k=0}^{t-1}\textcolor{yellow}{\|w_k\|_2^2} + \textcolor{yellow}{\|\theta_0\|_2^2} \quad \text{s.t.}\quad y_k = \textcolor{yellow}{\theta_k^\top} x_k + \textcolor{yellow}{v_k},\quad \theta_{k+1} = A_k\textcolor{yellow}{\theta_k}+\textcolor{yellow}{w_k} $$

Renaming the time-varying parameter \(\theta_k\) as a state \(s_k\) and the features \(x_k^\top\) as a measurement matrix \(C_k\) gives the general state estimation problem:

$$ \min_{\textcolor{yellow}{s_k, v_k, w_k}} \sum_{k=0}^t \textcolor{yellow}{\|v_k\|^2} + \sum_{k=0}^{t-1}\textcolor{yellow}{\|w_k\|_2^2} + \textcolor{yellow}{\|s_0\|_2^2} \quad \text{s.t.}\quad y_k = C_k\textcolor{yellow}{s_k} + \textcolor{yellow}{v_k},\quad s_{k+1} = A_k\textcolor{yellow}{s_k}+\textcolor{yellow}{w_k} $$

Substituting the constraints into the objective gives the unconstrained form:

$$ \min_{\textcolor{yellow}{s_k}} \sum_{k=0}^t \|C_k\textcolor{yellow}{s_k}-y_k\|_2^2+ \sum_{k=0}^{t-1}\|A_k\textcolor{yellow}{s_k}-\textcolor{yellow}{s_{k+1}}\|_2^2+ \|\textcolor{yellow}{s_0}\|_2^2$$
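This unconstrained problem is one large linear least squares over the stacked states \(s_{0:t}\). A minimal NumPy sketch (an illustration, not from the slides; the function and helper names are hypothetical):

```python
import numpy as np

def batch_state_estimate(A_list, C_list, ys):
    """Stack all residuals of the unconstrained problem into one linear
    least squares. A_list = [A_0,...,A_{t-1}], C_list = [C_0,...,C_t],
    ys = [y_0,...,y_t]."""
    t = len(ys) - 1
    n = A_list[0].shape[0]

    def block_row(k, M):
        # place M in the columns corresponding to s_k
        row = np.zeros((M.shape[0], (t + 1) * n))
        row[:, k * n:(k + 1) * n] = M
        return row

    rows, rhs = [], []
    for k in range(t + 1):                 # measurement residuals C_k s_k - y_k
        rows.append(block_row(k, C_list[k]))
        rhs.append(np.atleast_1d(ys[k]))
    for k in range(t):                     # dynamics residuals A_k s_k - s_{k+1}
        rows.append(block_row(k, A_list[k]) - block_row(k + 1, np.eye(n)))
        rhs.append(np.zeros(n))
    rows.append(block_row(0, np.eye(n)))   # prior residual s_0
    rhs.append(np.zeros(n))

    sol, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return sol.reshape(t + 1, n)           # row k is the estimate \hat s_k
```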
Example

The state \(s=[\theta,\omega]\), output \(y=\theta\), and

$$\theta_{t+1} = 0.9\theta_t,\quad \omega_{t+1} = 0.9 \omega_t$$

Here \(\omega\) never influences \(\theta\) or the output, so no sequence of outputs can determine it. This motivates the notion of observability.
A deterministic system

\(s_{t+1} = F(s_t)\)

\(y_t = G(s_t)\)

defines a map from initial state to outputs, \((y_0, y_1,...) = \Phi_{F,G}[s_0]\). The system is observable if the outputs \(y_{0:t}\) uniquely determine the state trajectory \(s_{0:t}\).
For a linear system \(s_{t+1} = As_t\), \(y_t = Cs_t\), the outputs are linear in the initial state:

\(\begin{bmatrix} y_0\\y_1\\\vdots\\y_t\end{bmatrix}=\begin{bmatrix} Cs_0\\CAs_0\\\vdots\\CA^ts_0\end{bmatrix}=\underbrace{\begin{bmatrix} C\\CA\\\vdots\\CA^t\end{bmatrix}}_{\mathcal O_t} s_0\)

A linear system is observable if and only if

$$\mathrm{rank}\Big(\underbrace{\begin{bmatrix}C\\CA\\\vdots \\ CA^{n-1}\end{bmatrix}}_{\mathcal O}\Big) = n$$

Proof:

1) \(\mathrm{rank}(\mathcal O) = n \implies\) observable. We have

$$\begin{bmatrix} y_0 \\ \vdots \\ y_{n-1}\end{bmatrix} = \begin{bmatrix}C\\CA\\\vdots \\ CA^{n-1}\end{bmatrix} s_0 $$

and since \(\mathcal O\) has full column rank, \(s_0\) (and therefore the whole trajectory \(s_{0:t}\)) is uniquely determined by the outputs.

2) \(\mathrm{rank}(\mathcal O) < n \implies\) not observable. Then there is a nonzero \(z\) with \(\mathcal O z = 0\), and by the Cayley-Hamilton theorem \(CA^k z = 0\) for all \(k \geq 0\), so the initial states \(s_0\) and \(s_0 + z\) produce identical outputs.
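As a quick numerical check (a sketch, not part of the slides), the rank condition applied to the decoupled example above:

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{n-1}; the system is observable iff rank is n."""
    n = A.shape[0]
    blocks, M = [], np.copy(C)
    for _ in range(n):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

# The decoupled example: y = theta carries no information about omega.
A = np.diag([0.9, 0.9])
C = np.array([[1.0, 0.0]])
print(np.linalg.matrix_rank(observability_matrix(A, C)))  # 1 < n = 2: not observable
```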
If \((A,C)\) is observable, then as \(t\to\infty\) the Kalman gain converges, \(L_t \to L\), where \(P\) (the steady-state error covariance) and \(L\) satisfy

\(P = A\left(P-PC^\top (CPC^\top +\Sigma_v)^{-1} CP\right)A^\top + \Sigma_w\)

\(L = PC^\top (\Sigma_v+CPC^\top)^{-1}\)

The fixed-point equation for \(P\) is a Discrete Algebraic Riccati Equation, so we write

\(P, L = \mathrm{DARE}(A^\top, C^\top ,\Sigma_w, \Sigma_v)\)

The resulting steady-state (constant gain) filter is

\(\hat s_{t} = A\hat s_{t-1} + L(y_{t} -CA\hat s_{t-1})\)

[Block diagram: the Kalman Filter with constant gain \(L\), estimate \(\hat s\), and error dynamics \(A-LCA\), driven by \(y_t\).]
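A minimal sketch of computing the steady-state gain by iterating the Riccati map (an illustration; `scipy.linalg.solve_discrete_are` is a standard alternative, but the plain iteration mirrors the equations above):

```python
import numpy as np

def steady_state_kalman_gain(A, C, Sigma_w, Sigma_v, iters=500):
    """Iterate the Riccati map until (approximately) converged, then
    return the fixed point P and the steady-state gain L."""
    P = np.copy(Sigma_w)
    for _ in range(iters):
        S = C @ P @ C.T + Sigma_v   # innovation covariance
        P = A @ (P - P @ C.T @ np.linalg.inv(S) @ C @ P) @ A.T + Sigma_w
    L = P @ C.T @ np.linalg.inv(Sigma_v + C @ P @ C.T)
    return P, L
```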
Rearranging, the steady-state filter can be written in innovations form, where the innovation \(\varepsilon_t\) is the part of \(y_t\) not predicted by the filter:

\(\hat s_{t} = A\hat s_{t-1} + L\varepsilon_t\)

\(y_{t} =CA\hat s_{t-1}+\varepsilon_t\)

[Block diagram: interconnection of the world (linear dynamics with state \(s\), matrices \(A, C\), and noise \(w_t, v_t\)) with the filter (linear dynamics with state \(\hat s\) and gain \(L\)), producing the innovations \(\varepsilon_t\).]

Exercise: show that for some \(C_\varepsilon, D_\varepsilon, A_e, D_e\), the innovations are given by \(\varepsilon_t = C_\varepsilon e_t+ D_\varepsilon v_t\) where \(e_{t+1} =A_ee_{t} -w_{t}+D_e v_t\) and \(e_0 = \hat s_0- s_0\).
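A numerical sanity check for the exercise (a sketch; the candidate matrices below are one consistent choice, obtained by taking \(e_t\) to be the one-step prediction error \(A\hat s_{t-1} - s_t\), and should be treated as an assumption to verify rather than the official solution):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.2]])          # any fixed gain works for this check

# Candidate matrices: an assumption, verified by the asserts below.
A_e = A - A @ L @ C
C_eps, D_eps, D_e = -C, np.eye(1), A @ L

s = rng.standard_normal(2)            # s_t
s_hat = rng.standard_normal(2)        # \hat s_t
e, v_prev = None, None
for _ in range(20):
    w, v = rng.standard_normal(2), rng.standard_normal(1)
    e_next = A @ s_hat - (A @ s + w)  # prediction error at time t+1
    if e is not None:                 # claimed error recursion
        assert np.allclose(e_next, A_e @ e - w + D_e @ v_prev)
    s = A @ s + w                     # s_{t+1}
    y = C @ s + v                     # y_{t+1}
    eps = y - C @ (A @ s_hat)         # innovation
    assert np.allclose(eps, C_eps @ e_next + D_eps @ v)
    s_hat = A @ s_hat + L @ eps       # filter update
    e, v_prev = e_next, v
```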
What if we don't know exactly how the system evolves or how measurements are generated?

[Block diagram: the world with unknown dynamics \(?\), state \(s\), noise \(w_t, v_t\), and outputs \(y_t\).]

Start with the simpler, fully observed setting: we measure the state \(s_t\) directly, but don't know exactly how the system evolves.
Use supervised learning with labels \(s_{k+1}\) and features \(s_k\)!

$$\hat F = \arg\min_{\textcolor{yellow}{F}\in\mathcal F} ~~\sum_{k=0}^{t-1}\|s_{k+1} - \textcolor{yellow}{F}(s_k)\|^2 $$

For a linear model class, this is least squares:

$$\hat A = \arg\min_{A} ~~\sum_{k=0}^{t-1}\|s_{k+1} - As_k\|^2 = \arg\min_{A} \Big\|\underbrace{\begin{bmatrix} s_1^\top \\ \vdots \\ s_{t}^\top\end{bmatrix}}_{S_+} - \underbrace{\begin{bmatrix} s_0^\top \\ \vdots \\ s_{t-1}^\top\end{bmatrix}}_{S_-}A^\top\Big\|_F^2 $$

with closed-form solution

$$\hat A = S_+^\top S_-(S_-^\top S_-)^{-1} $$
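A minimal NumPy sketch of this estimator (the function name is hypothetical; `lstsq` recovers \(\hat A = S_+^\top S_-(S_-^\top S_-)^{-1}\) whenever \(S_-^\top S_-\) is invertible):

```python
import numpy as np

def fit_dynamics(states):
    """Least-squares fit of A from a state trajectory s_0, ..., s_t:
    regress s_{k+1} (labels) on s_k (features)."""
    S_minus = np.asarray(states[:-1])  # rows s_0^T, ..., s_{t-1}^T
    S_plus = np.asarray(states[1:])    # rows s_1^T, ..., s_t^T
    # lstsq solves min_X ||S_minus X - S_plus||_F, and A = X^T
    X, *_ = np.linalg.lstsq(S_minus, S_plus, rcond=None)
    return X.T
```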
Example

Suppose there is no noise and we observe \(s_0 = [0, 1]\), \(s_1 = [\frac{1}{2}, 0]\), and \(s_2 = [0, \frac{1}{4}]\). What is \(s_3\)? What is \(A\)?

What if instead \(s_0 = [0,1]\), \(s_1 = [0, \frac{1}{2}]\), and \(s_2 = [0, \frac{1}{4}]\)?
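Worked answers (filled in here; the slides leave these as questions). In the first case, \(s_1 = As_0\) determines the second column of \(A\) and \(s_2 = As_1\) the first, so

$$A = \begin{bmatrix}0 & \frac{1}{2}\\ \frac{1}{2} & 0\end{bmatrix},\qquad s_3 = As_2 = \begin{bmatrix}\frac{1}{8}\\ 0\end{bmatrix}$$

In the second case, all observed states lie on the line spanned by \([0,1]\), so \(S_-\) has rank 1, \(S_-^\top S_-\) is singular, and the first column of \(A\) is not determined by the data.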
When the data matrix is rank deficient, \(S_-^\top S_-\) is singular and the closed-form estimate

$$\hat A = S_+^\top S_-(S_-^\top S_-)^{-1} $$

is not well defined: the least-squares solution is non-unique.

What if we don't know exactly how the system evolves or how measurements are generated? Then even if \((A,C)\) is observable, the solution is non-unique! Suppose \(\hat A, \hat C, \hat s_0\) satisfies the output equations; then so does a transformed \(\tilde A, \tilde C, \tilde s_0\), as the following example shows.
Example

The state \(s_t=[\theta_t,\omega_t]\), output \(y_t=\theta_t\), and

$$\theta_{t+1} = 0.9\theta_t+0.1\omega_t,\quad \omega_{t+1} = 0.9 \omega_t$$

What if we define \(\tilde s_0 = [\alpha\theta_0, \beta \omega_0]\)? Then an equivalent representation has \(y_t =\begin{bmatrix}\frac{1}{\alpha} & 0\end{bmatrix} \tilde s_t\) and

$$\tilde \theta_{t+1} = 0.9\tilde \theta_t+0.1\frac{\alpha}{\beta}\tilde \omega_t,\quad \tilde \omega_{t+1} = 0.9 \tilde \omega_t$$

Now add process noise \(w_t\), so that

$$\theta_{t+1} = 0.9\theta_t+0.1\omega_t,\quad \omega_{t+1} = 0.9 \omega_t + w_t$$

The equivalent representation again has \(y_t =\begin{bmatrix}\frac{1}{\alpha} & 0\end{bmatrix} \tilde s_t\) and

$$\tilde \theta_{t+1} = 0.9\tilde \theta_t+0.1\frac{\alpha}{\beta}\tilde \omega_t,\quad \tilde \omega_{t+1} = 0.9 \tilde \omega_t+?$$

The \(?\) must be \(\beta w_t\): the noise is rescaled too, which is why we track how noise enters the state through a matrix \(B\).
In general there is a whole set of equivalent state space representations, one for each invertible and square \(M\):

\(s_{t+1} = As_t + Bw_t,\qquad y_t = Cs_t+v_t\)

\(\tilde s_{t+1} = \tilde A\tilde s_t + \tilde B w_t,\qquad y_t = \tilde C\tilde s_t+v_t\)

where

\(\tilde s = M^{-1}s,\quad \tilde A = M^{-1}AM,\quad \tilde B = M^{-1}B,\quad \tilde C = CM\)

or equivalently

\( s = M\tilde s,\quad A = M\tilde AM^{-1},\quad B = M\tilde B,\quad C = \tilde CM^{-1}\)
[Block diagrams: the same noise-to-output behavior \((w_t, v_t) \mapsto y_t\), realized by the world with state \(s\) and matrices \(A, B, C\), or with state \(\tilde s\) and matrices \(\tilde A, \tilde B, \tilde C\).]
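A quick numerical check of this equivalence (a sketch, not from the slides), using the example system above with a random \(M\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2
A = np.array([[0.9, 0.1], [0.0, 0.9]])   # the example system above
B = np.eye(n)
C = np.array([[1.0, 0.0]])

M = rng.standard_normal((n, n))          # a generic M is invertible w.p. 1
A_t = np.linalg.solve(M, A @ M)          # \tilde A = M^{-1} A M
B_t = np.linalg.solve(M, B)              # \tilde B = M^{-1} B
C_t = C @ M                              # \tilde C = C M

s = rng.standard_normal(n)
s_t = np.linalg.solve(M, s)              # \tilde s_0 = M^{-1} s_0
for _ in range(20):
    w = rng.standard_normal(n)
    assert np.allclose(C @ s, C_t @ s_t)  # identical outputs y_t
    s = A @ s + B @ w
    s_t = A_t @ s_t + B_t @ w             # same noise realization
```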
Goal: identify dynamics \(A\) and measurement model \(C\) up to similarity transform
Recall the observability matrix

$$\mathcal O_t = \begin{bmatrix}C\\CA\\\vdots \\ CA^{t}\end{bmatrix} $$

Under a similarity transform \(M\),

$$\tilde{\mathcal O}_t = \begin{bmatrix}\tilde C\\\tilde C\tilde A\\\vdots \\ \tilde C\tilde A^{t}\end{bmatrix} =\begin{bmatrix}CM\\CMM^{-1} AM\\\vdots \\ CMM^{-1} A^{t}M\end{bmatrix} =\mathcal O_t M $$

so the column space of \(\mathcal O_t\) is invariant to the choice of representation.
Approach: subspace identification of the rank \(n\) column space of \(\mathcal O_t\)
Step 1: Estimate \(\mathcal O_t\). In the noiseless case, stacking outputs starting at each time \(k\) gives

$$\begin{bmatrix}y_0\\\vdots \\ y_t\end{bmatrix} =\mathcal O_t s_0, \qquad \begin{bmatrix}y_1\\\vdots \\ y_{t+1}\end{bmatrix} =\mathcal O_t s_{1}, \qquad \dots, \qquad \begin{bmatrix}y_k\\\vdots \\ y_{t+k}\end{bmatrix} =\mathcal O_t s_{k} $$

Collecting these as columns,

$$\underbrace{\begin{bmatrix}y_0 & \dots &y_k\\\vdots & \ddots&\vdots\\ y_t&\dots& y_{t+k}\end{bmatrix}}_{Y_{t,k}} =\mathcal O_t \begin{bmatrix}s_0 & s_1 &\dots & s_k\end{bmatrix}$$

As long as \(s_{0:k}\) form a rank \(n\) matrix, the column space of \(\mathcal O_t\) is the same as the column space of this matrix of outputs (a Hankel matrix). So we set

\(\hat{\mathcal O}_t = U_n\), where the SVD is given by \(Y_{t,k} = \begin{bmatrix}U_n & U_{\perp}\end{bmatrix} \begin{bmatrix}\Sigma_n & 0\\ 0 & 0\end{bmatrix}V^\top \)

Step 2: Recover \(\hat A\) and \(\hat C\) from \(\hat{\mathcal O}_t\). Writing \(\hat{\mathcal O}_t[i]\) for the \(i\)th block row,

$$\hat{\mathcal O}_t = \begin{bmatrix}\hat{\mathcal O}_t [0]\\\hat{\mathcal O}_t [1]\\\vdots \\ \hat{\mathcal O}_t [t]\end{bmatrix}\approx \begin{bmatrix}C\\CA\\\vdots \\ CA^{t}\end{bmatrix} $$

So we set \(\hat C = \hat {\mathcal O}_t[0]\), and since consecutive block rows satisfy \(\mathcal O_t[i+1] = \mathcal O_t[i]A\), we set \(\hat A\) as solving the least squares problem

$$\hat A = \arg\min_A \sum_{i=0}^{t-1}\big\|\hat{\mathcal O}_t[i]\,A - \hat{\mathcal O}_t[i+1]\big\|_F^2$$
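Putting both steps together, a minimal NumPy sketch of the noiseless procedure (the function name and interface are hypothetical):

```python
import numpy as np

def subspace_id(ys, n, t, k):
    """Sketch of noiseless subspace identification from ys = [y_0, ..., y_{t+k}].
    Returns (A_hat, C_hat), correct only up to a similarity transform."""
    ys = [np.atleast_1d(y) for y in ys]
    p = ys[0].shape[0]
    # Step 1: Hankel matrix whose column space matches that of O_t
    Y = np.vstack([np.column_stack(ys[i:i + k + 1]) for i in range(t + 1)])
    U, _, _ = np.linalg.svd(Y)
    O_hat = U[:, :n]                    # top n left singular vectors
    # Step 2: C_hat is the first block row; A_hat from the shift structure
    C_hat = O_hat[:p, :]
    upper, lower = O_hat[:-p, :], O_hat[p:, :]  # O_t[0:t-1] and O_t[1:t]
    A_hat, *_ = np.linalg.lstsq(upper, lower, rcond=None)
    return A_hat, C_hat
```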
Next time: cutting edge work on finite sample guarantees for system identification!
References: Callier & Desoer, "Linear System Theory"; Verhaegen & Verdult, "Filtering and System Identification"
By Sarah Dean