Dynamical Systems
ML in Feedback Sys #5
Prof Sarah Dean
Reminders
- Sign up to scribe and rank preferences for paper presentations by TODAY!
- SR19 "Near optimal finite time identification of arbitrary linear dynamical systems" (9/26)
- Required: meet with Atul at least 2 days before you are scheduled to present
- Working in pairs/groups, self-assessment

Dynamical System
state \(s\), dynamics \(F\): \(s\mapsto F(s)\)
\(s_{t+1} = F(s_t)\)
Stability
An equilibrium point \(s_{eq}\), i.e. a point with \(F(s_{eq})=s_{eq}\), is
- stable if "for any desired accuracy, you can find a tolerance that guarantees it in perpetuity"
- unstable if it is not stable
- asymptotically stable if it is stable and "you can find a tolerance that guarantees convergence to \(s_{eq}\)"
- Formally, \(s_{eq}\) is stable if for all \(\epsilon>0\), there exists a \(\delta=\delta(\epsilon)>0\) such that for all \(t\geq 0\), $$ \|s_0-s_{eq}\|<\delta \implies \|s_t-s_{eq}\|<\epsilon $$
Examples:
- \(s = (x,y)\) and, for some function \(f\) with \(f(0)=0\):
- \(x_{t+1} = f(y_t)\),
- \(y_{t+1} = y_t\)
- stable with \(\epsilon^2 = f(\delta)^2 + \delta^2\)
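- \(s_{t+1} = \begin{bmatrix} 0 & 5\\ 0 & 1\end{bmatrix} s_t\) is stable with \(\delta=\frac{1}{\sqrt{26}}\epsilon\) in the Euclidean norm (the previous example with \(f(y)=5y\)), or with \(\delta=\frac{1}{5}\epsilon\) in the \(\infty\)-norm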
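A quick numerical spot check of the matrix example may help. The sketch below is not a proof: it simulates one near-worst-case initial condition per norm, and the value of `eps`, the tolerances `delta_2` and `delta_inf`, and the helper `max_deviation` are assumptions made for illustration.

```python
import numpy as np

A = np.array([[0.0, 5.0],
              [0.0, 1.0]])

def max_deviation(s0, norm_ord, steps=50):
    """Largest distance from the origin (in the given norm) along the trajectory."""
    s = np.asarray(s0, dtype=float)
    worst = np.linalg.norm(s, norm_ord)
    for _ in range(steps):
        s = A @ s
        worst = max(worst, np.linalg.norm(s, norm_ord))
    return worst

eps = 0.1

# Euclidean norm: start just inside the delta-ball, entirely in the y coordinate.
delta_2 = eps / np.sqrt(26.0)
ok_2 = max_deviation([0.0, 0.999 * delta_2], norm_ord=2) < eps

# Infinity norm: start just inside the delta-box.
delta_inf = eps / 5.0
ok_inf = max_deviation([0.999 * delta_inf, 0.999 * delta_inf], norm_ord=np.inf) < eps

print(ok_2, ok_inf)
```

After one step the state is \((5y_0, y_0)\) and stays there, so the trajectory never leaves the \(\epsilon\)-ball but also never converges to the origin unless \(y_0=0\).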
Linear stability
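Determined by eigenvalues of dynamics matrix \(A\) in the complex plane \(\mathbb C\)
- asymptotically stable: all eigenvalues strictly inside the unit circle
- unstable: some eigenvalue strictly outside the unit circle
- marginally (un)stable: largest eigenvalue magnitude exactly on the unit circle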
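The eigenvalue test can be sketched in a few lines. The function name `classify_linear`, the tolerance `tol`, and the example matrices are assumptions for illustration; the boundary case is reported without a verdict because eigenvalue magnitudes alone do not decide it.

```python
import numpy as np

def classify_linear(A, tol=1e-9):
    """Classify s_{t+1} = A s_t by the largest eigenvalue magnitude of A."""
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if rho < 1 - tol:
        return "asymptotically stable"
    if rho > 1 + tol:
        return "unstable"
    return "marginally (un)stable"

print(classify_linear(np.array([[0.5, 1.0], [0.0, 0.5]])))
print(classify_linear(np.array([[0.0, 5.0], [0.0, 1.0]])))
print(classify_linear(np.array([[1.1, 0.0], [0.0, 0.2]])))
```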
Stability via linearization
Stability via linear approximation of nonlinear \(F\)
example: discrete-time damped pendulum
\(\theta_{t+1} = \theta_t + h \omega_t\)
\(\omega_{t+1} =\omega_t + h\left(\frac{g}{\ell}\sin\theta_t-d\omega_t\right)\)
with angle \(\theta\), angular velocity \(\omega\), gravity \(g\), length \(\ell\), damping \(d\), and step size \(h\)
equilibria at \(\theta_{eq}=k\pi\), \(\omega_{eq}=0\) for \(k\in\mathbb Z\)
using \(\sin x\approx \sin x_0 + \cos x_0(x - x_0)\):
\(\omega_{t+1} \approx (1-dh)\omega_t + h\frac{g}{\ell}\left(\sin \theta_{eq}+\cos\theta_{eq}(\theta_t-\theta_{eq})\right)\)
$$\begin{bmatrix}\theta_{t+1}-\theta_{eq}\\ \omega_{t+1}-\omega_{eq}\end{bmatrix} \approx \begin{bmatrix} 1 & h\\ h \frac{g}{\ell}\cos(\theta_{eq})& 1-dh\end{bmatrix}\begin{bmatrix}\theta_{t}-\theta_{eq}\\ \omega_{t}-\omega_{eq}\end{bmatrix} $$
at \(\theta_{eq}=0\), real eigenvalues \(0<\lambda_2<1<\lambda_1\)
at \(\theta_{eq}=\pi\), complex eigenvalues for small \(d\), with \(|\lambda|^2 = 1-dh+h^2\frac{g}{\ell}<1\) when \(d>h\frac{g}{\ell}\)
Exercise: work out the details of this analysis (simulation notebook)
\(\lambda = 1-h\frac{d}{2} \pm h\sqrt{(\frac{d}{2})^2+\frac{g}{\ell}\cos(\theta_{eq})}\)
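As a starting point for the exercise, the closed-form eigenvalues can be checked against the linearized dynamics matrix. This is a sketch under assumed parameter values (`g`, `ell`, `d`, `h` below are hypothetical and not taken from the lecture); the helper names are also illustrative.

```python
import cmath
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
g, ell, d, h = 9.8, 1.0, 0.5, 0.01

def linearized_matrix(theta_eq):
    """Dynamics matrix of the linearization at (theta_eq, 0)."""
    return np.array([[1.0, h],
                     [h * (g / ell) * np.cos(theta_eq), 1.0 - d * h]])

def closed_form_eigs(theta_eq):
    """Eigenvalues from the closed-form expression (complex square root)."""
    root = cmath.sqrt((d / 2) ** 2 + (g / ell) * np.cos(theta_eq))
    return [1 - h * d / 2 + h * root, 1 - h * d / 2 - h * root]

def char_poly_residual(M, lam):
    """|det(M - lam I)|, which vanishes exactly when lam is an eigenvalue of M."""
    return abs(np.linalg.det(M - lam * np.eye(2)))

for theta_eq in (0.0, np.pi):
    M = linearized_matrix(theta_eq)
    lams = closed_form_eigs(theta_eq)
    print(theta_eq, [abs(lam) for lam in lams],
          max(char_poly_residual(M, lam) for lam in lams))
```

With these values, one eigenvalue at \(\theta_{eq}=0\) has magnitude above 1, while both eigenvalues at \(\theta_{eq}=\pi\) have magnitude below 1.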
Linearization via Taylor Series:
The Jacobian \(J\) of \(G:\mathbb R^{n}\to\mathbb R^{m}\) is defined as $$ J(x) = \begin{bmatrix}\frac{\partial G_1}{\partial x_1} & \dots & \frac{\partial G_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G_m}{\partial x_1} &\dots & \frac{\partial G_m}{\partial x_n}\end{bmatrix}$$
With \(J\) the Jacobian of \(F\) and \(F(s_{eq})=s_{eq}\):
\(s_{t+1} = F(s_t) = F(s_{eq}) + J(s_{eq}) (s_t - s_{eq}) \) + higher order terms
\(= s_{eq} + J(s_{eq}) (s_t - s_{eq}) \) + higher order terms
\(\implies s_{t+1}-s_{eq} \approx J(s_{eq})(s_t-s_{eq})\)
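The linearization can also be checked numerically. The sketch below approximates the Jacobian of the discrete-time pendulum map by central differences and compares the linear prediction with the true next state. The parameter values and the helper `numerical_jacobian` are assumptions for illustration.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
g, ell, d, h = 9.8, 1.0, 0.5, 0.01

def F(s):
    """One step of the discrete-time damped pendulum."""
    theta, omega = s
    return np.array([theta + h * omega,
                     omega + h * ((g / ell) * np.sin(theta) - d * omega)])

def numerical_jacobian(F, s, step=1e-6):
    """Central-difference approximation of the Jacobian of F at s."""
    s = np.asarray(s, dtype=float)
    J = np.zeros((F(s).size, s.size))
    for j in range(s.size):
        e = np.zeros(s.size)
        e[j] = step
        J[:, j] = (F(s + e) - F(s - e)) / (2 * step)
    return J

s_eq = np.array([np.pi, 0.0])
J = numerical_jacobian(F, s_eq)
J_analytic = np.array([[1.0, h],
                       [h * (g / ell) * np.cos(np.pi), 1.0 - d * h]])

# Compare the true next state with the linear prediction near the equilibrium.
s = s_eq + np.array([1e-3, -2e-3])
linear_prediction = s_eq + J_analytic @ (s - s_eq)
error = np.linalg.norm(F(s) - linear_prediction)
print(J, error)
```

The prediction error shrinks faster than the perturbation itself, which is what "higher order terms" means in practice.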
Example: gradient descent
Consider the dynamics of gradient descent on a twice differentiable function \(g:\mathbb R^d\to\mathbb R\)
\(\theta_{t+1} = \theta_t - \alpha\nabla g(\theta_t)\)
Jacobian \(J(\theta) = I - \alpha \nabla^2 g(\theta)\)
- Let \(\{\gamma_i\}_{i=1}^d\) be the eigenvalues of the Hessian \(\nabla^2 g(\theta_{eq})\)
- Then the eigenvalues of the Jacobian are \(1-\alpha\gamma_i\)
- if any \(\gamma_i\leq 0\), the linearization does not certify stability of \(\theta_{eq}\) (and \(\theta_{eq}\) is unstable if some \(\gamma_i<0\))
- i.e. saddle, local maximum, or degenerate critical point of \(g\)
- if all \(\gamma_i>0\), then \(\theta_{eq}\) is asymptotically stable as long as \(\alpha<\frac{2}{\gamma_i}\) for all \(i\) (in particular, \(\alpha<\frac{1}{\gamma_i}\) suffices)
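For a quadratic \(g(\theta)=\frac12\theta^\top H\theta\) the Jacobian is exactly \(I-\alpha H\), so the eigenvalue condition can be compared directly with what the iterates do. The matrices `H_min` and `H_saddle`, the step sizes, and the helper names below are assumptions chosen for illustration.

```python
import numpy as np

def jacobian_eigs(H, alpha):
    """Eigenvalues 1 - alpha * gamma_i of J = I - alpha * H for symmetric H."""
    return 1.0 - alpha * np.linalg.eigvalsh(H)

def run_gd(H, alpha, theta0, steps=200):
    """Gradient descent on g(theta) = 0.5 * theta^T H theta (gradient is H theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - alpha * (H @ theta)
    return theta

H_min = np.diag([1.0, 4.0])      # local minimum: gamma = 1, 4
H_saddle = np.diag([1.0, -1.0])  # saddle: gamma = 1, -1
theta0 = np.array([1.0, 1.0])

for name, H, alpha in [("minimum, alpha=0.2", H_min, 0.2),
                       ("minimum, alpha=0.6", H_min, 0.6),
                       ("saddle,  alpha=0.2", H_saddle, 0.2)]:
    rho = np.max(np.abs(jacobian_eigs(H, alpha)))
    final = np.linalg.norm(run_gd(H, alpha, theta0))
    print(f"{name}: max |1 - alpha*gamma_i| = {rho:.2f}, final norm = {final:.3e}")
```

Only the first configuration has all Jacobian eigenvalues inside the unit circle, and only there do the iterates shrink toward \(\theta_{eq}=0\).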
Stability via Lyapunov
Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous and
- (positive definite) \(V(0)=0\) and \(V(s)>0\) for all \(s\in\mathcal S - \{0\}\)
- (decreasing) \(V(F(s)) - V(s) \leq 0\) for all \(s\in\mathcal S\)
- Optionally,
- (strict) \(V(F(s)) - V(s) < 0\) for all \(s\in\mathcal S-\{0\}\)
- (global) \(\|s\|_2\to \infty \implies V(s)\to\infty\)
Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"
Theorem (1.2, 1.4): Suppose that \(F\) is locally Lipschitz, \(s_{eq}=0\) is a fixed point, and \(V\) is a Lyapunov function for \(F,s_{eq}\). Then, \(s_{eq}=0\) is
- stable
- asymptotically stable if \(V\) satisfies the strict property
- globally asymptotically stable if \(V\) satisfies the strict and global properties
Quadratic Lyapunov functions
- Stable matrices have quadratic Lyapunov functions of the form \(V(s) = s^\top P s\) (Theorem 3.2)
- For example, \(P = \sum_{t=0}^\infty (A^\top)^t A^t\)
- Exercise: show that the above is a strict and global Lyapunov function for \(s_{t+1}=As_t\).
- When Jacobian \(J(0)\) is stable, can show that \(V(s)=s^\top P s\) is a strict Lyapunov function for \(s_{t+1} = F(s_t)\).
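As a numerical companion to the exercise, the sketch below builds \(P\) from a truncated version of the infinite sum and checks that \(A^\top P A - P \approx -I\), which gives \(V(As)-V(s) = -\|s\|_2^2\). The matrix `A`, the truncation length `terms`, and the test point `s` are assumptions for illustration.

```python
import numpy as np

# Hypothetical stable matrix (both eigenvalues equal 0.5).
A = np.array([[0.5, 1.0],
              [0.0, 0.5]])

def lyapunov_matrix(A, terms=200):
    """Truncated sum P = sum_t (A^T)^t A^t."""
    n = A.shape[0]
    P = np.zeros((n, n))
    At = np.eye(n)            # holds A^t
    for _ in range(terms):
        P += At.T @ At
        At = A @ At
    return P

P = lyapunov_matrix(A)

def V(s):
    """Quadratic Lyapunov candidate V(s) = s^T P s."""
    return s @ P @ s

s = np.array([1.0, -2.0])
decrease = V(A @ s) - V(s)    # should be close to -||s||^2
print(P, decrease)
```

The identity \(A^\top P A - P = -I\) follows by shifting the index of the sum, so the truncation error here is on the order of \(\|A^{200}\|^2\).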
Theorem (3.3): Suppose \(F\) is locally Lipschitz, \(0\) is a fixed point, and let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of the Jacobian \(J(0)\). Then \(0\) is
- asymptotically stable if \(\max_{i\in[n]}|\lambda_i|<1\)
- unstable if \(\max_{i\in[n]}|\lambda_i|> 1\)


Inputs and outputs
Dynamical System \(F\) with state \(s\), input \(w_t\), and output \(y_t\)
\(s_{t+1} = F(s_t, w_t)\)
\(y_t = G(s_t)\)
- input signal \(w_t\) represents external phenomena
- action/control input
- disturbance/process noise
- output signal \(y_t\) represents what is measured



System response
\(\Phi\) maps the input sequence \((w_0, w_1, \dots)\) to the output sequence \((y_0, y_1, \dots)\), given the dynamics \(F\) and the state \(s\)
Linear system response
\(s_{t+1} = As_t+ w_t\)
\(y_t = Cs_t\)
$$y_{t} = CA^t s_0+ \sum_{k=1}^{t}CA^{k-1}w_{t-k}$$
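The closed-form response can be checked against a direct simulation. In the sketch below, the matrices `A` and `C`, the initial state, the horizon `T`, and the random inputs are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
s0 = np.array([1.0, -1.0])
T = 20
w = rng.normal(size=(T, 2))   # arbitrary input sequence w_0, ..., w_{T-1}

def simulate_outputs(A, C, s0, w):
    """Outputs y_0, ..., y_T obtained by stepping s_{t+1} = A s_t + w_t."""
    s = s0.copy()
    ys = []
    for t in range(len(w) + 1):
        ys.append(C @ s)
        if t < len(w):
            s = A @ s + w[t]
    return np.array(ys)

def closed_form_output(A, C, s0, w, t):
    """y_t = C A^t s_0 + sum_{k=1}^t C A^{k-1} w_{t-k}."""
    y = C @ np.linalg.matrix_power(A, t) @ s0
    for k in range(1, t + 1):
        y = y + C @ np.linalg.matrix_power(A, k - 1) @ w[t - k]
    return y

ys = simulate_outputs(A, C, s0, w)
formula = np.array([closed_form_output(A, C, s0, w, t) for t in range(T + 1)])
print(np.max(np.abs(ys - formula)))
```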
Linear-Gaussian system
If \(s_0\sim\mathcal N(\mu, \Sigma)\) and \(w_t\sim \mathcal N(\mu,\Sigma)\) are independent, then
\(y_t\sim \mathcal N(\sum_{k=0}^{t}CA^{k}\mu, \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top)\)
The limit as \(t\to\infty\), which exists when \(A\) is asymptotically stable, is the steady state distribution
Linear stochastic system
If \(\mathbb E[s_0] = \mathbb E[w_t] =\mu\) and \(\mathrm{Cov}[s_0] = \mathrm{Cov}[w_t] = \Sigma\), with \(s_0, w_0, w_1, \dots\) uncorrelated, then
\(\mathbb E[y_t] = \sum_{k=0}^{t}CA^{k}\mu\)
\(\mathrm{Cov}[y_t] = \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top\)
Can characterize moments without knowing whole distribution
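The moment formulas can be cross-checked without sampling by propagating the state mean and covariance one step at a time, \(m_{t+1} = Am_t+\mu\) and \(S_{t+1} = AS_tA^\top+\Sigma\). The matrices and moments below are assumptions chosen for illustration.

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
C = np.array([[1.0, 0.0]])
mu = np.array([0.1, -0.2])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

def output_moments_sum(t):
    """Mean and covariance of y_t from the summation formulas."""
    mean = np.zeros(C.shape[0])
    cov = np.zeros((C.shape[0], C.shape[0]))
    for k in range(t + 1):
        Ak = np.linalg.matrix_power(A, k)
        mean = mean + C @ Ak @ mu
        cov = cov + C @ Ak @ Sigma @ Ak.T @ C.T
    return mean, cov

def output_moments_recursion(t):
    """Same moments by propagating the state mean and covariance step by step."""
    m, S = mu.copy(), Sigma.copy()
    for _ in range(t):
        m = A @ m + mu
        S = A @ S @ A.T + Sigma
    return C @ m, C @ S @ C.T

for t in (0, 5, 30):
    print(t, output_moments_sum(t), output_moments_recursion(t))
```

Because this `A` has all eigenvalues inside the unit circle, both moments settle to fixed values as \(t\) grows.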
Bounded inputs & bounded outputs
What if inputs are not stochastic?

The sequence \((w_0, w_1, ...)\) can be adversarially chosen, but is bounded
- at every \(t\): \(\|w_t\|\leq B\)
- on average: \(\sqrt{\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} \|w_t\|_2^2}\leq B\)
It is often possible to show that the outputs are also bounded (upcoming lectures)
Input-to-state stability
For a system \(s_{t+1} = F(s_t) + w_t\) where \(F(0)=0\)
Suppose that there is a \(V:\mathcal S\to\mathbb R_+\) that
- defines a metric: \(\alpha_1 \|s\|_2^2 \leq V(s) \leq \alpha_2 \|s\|_2^2 \) and $$\sqrt{V(s+s')}\leq \sqrt{V(s)}+\sqrt{V(s')}$$
- is contracting: \(V(F(s)) \leq \gamma V(s)\) for some \(\gamma<1\) and all \(s\in\mathcal S\)
Then if \(\|w_t\|_2\leq B\) for all \(t\), \(\displaystyle \limsup_{t\to\infty} \|s_t\|_2 \leq \sqrt{\frac{\alpha_2}{\alpha_1}}\frac{B}{1-\sqrt{\gamma}} \)
Proof
\(\sqrt{V(s_{t+1})} = \sqrt{V(F(s_t) + w_t)} \leq \sqrt{V(F(s_t))} + \sqrt{V(w_t)} \leq \sqrt{\gamma V(s_t)} + \sqrt{\alpha_2}\|w_t\|_2 \)
\(\implies \sqrt{V(s_{t})} \leq \gamma^{t/2}\sqrt{ V(s_0)} + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\alpha_2}\|w_{t-k}\|_2\)
\(\implies \|s_t\|_2 \leq \gamma^{t/2}\sqrt{ \frac{\alpha_2}{\alpha_1}}\|s_0\|_2 + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\frac{\alpha_2}{\alpha_1}}\|w_{t-k}\|_2\)
Bounding \(\|w_{t-k}\|_2\leq B\), summing the geometric series, and using \(\gamma^{t/2}\to 0\) gives the claim.
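The bound can be illustrated on a small example. The sketch assumes \(F(s) = A\tanh(s)\) with \(\|A\|_2<1\) and \(V(s)=\|s\|_2^2\), so that \(\alpha_1=\alpha_2=1\) and \(\gamma=\|A\|_2^2\). The matrix, the bound `B`, and the rotating disturbance are hypothetical choices, not taken from the lecture.

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.0, 0.6]])
sqrt_gamma = np.linalg.norm(A, 2)   # spectral norm of A, below 1 here
B = 0.3

def F(s):
    """Contracting nonlinear dynamics with F(0) = 0, since |tanh(x)| <= |x|."""
    return A @ np.tanh(s)

def simulate(s0, steps=100):
    """Norms ||s_t||_2 under a disturbance with ||w_t||_2 = B at every step."""
    s = np.asarray(s0, dtype=float)
    norms = [np.linalg.norm(s)]
    for t in range(steps):
        w = B * np.array([np.cos(t), np.sin(t)])
        s = F(s) + w
        norms.append(np.linalg.norm(s))
    return np.array(norms)

s0 = np.array([3.0, -4.0])
norms = simulate(s0)
t = np.arange(len(norms))
# Finite-time bound from the proof, with the geometric sum replaced by its limit.
bound = sqrt_gamma ** t * np.linalg.norm(s0) + B / (1 - sqrt_gamma)
print(norms[-1], bound[-1])
```

The state does not converge to the origin, but it settles inside a ball whose radius scales with `B`.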
Example: stochastic gradient descent
Consider the dynamics of stochastic gradient descent on a twice differentiable function \(g:\mathbb R^d\to\mathbb R\), where \(g_t\) is a stochastic estimate of \(\nabla g(\theta_t)\)
\(\theta_{t+1} = \theta_t - \alpha g_t\)
\(= \theta_t - \alpha \nabla g(\theta_t) + \underbrace{\alpha (\nabla g(\theta_t) - g_t) }_{w_t} \)
- may converge to ball around minima
- in general, could jump between minima
- in practice, decreasing step size \(\alpha_t\propto 1/t\)
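The "ball around minima" behavior can be seen on a quadratic with bounded gradient noise, where the input-to-state bound applies with \(F(\theta) = (I-\alpha H)\theta\) and \(\|w_t\|_2\leq\alpha\sigma\). The Hessian `H`, step size `alpha`, noise bound `sigma`, and noise model are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.diag([1.0, 4.0])       # Hessian of g(theta) = 0.5 * theta^T H theta
alpha, sigma = 0.1, 0.5

def noisy_gradient(theta):
    """Gradient H theta plus noise of Euclidean norm at most sigma."""
    direction = rng.normal(size=theta.shape)
    direction /= np.linalg.norm(direction)
    noise = sigma * rng.uniform() * direction
    return H @ theta + noise

theta = np.array([5.0, 5.0])
norms = []
for _ in range(500):
    theta = theta - alpha * noisy_gradient(theta)
    norms.append(np.linalg.norm(theta))

contraction = np.linalg.norm(np.eye(2) - alpha * H, 2)   # ||I - alpha H||_2
radius = alpha * sigma / (1 - contraction)                # limiting ball radius
tail_max = max(norms[-100:])
print(tail_max, radius)
```

With a constant step size the iterates keep moving inside the ball rather than converging, which is one motivation for decreasing step sizes.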
Recap
- Nonlinear stability
- Linearization
- Lyapunov functions
- Inputs and outputs
- System response
- Stability
References: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"
Next time: see these ideas in action!