Prof Sarah Dean
state \(s\)
dynamics map \(F(s)\)
update: \(s_{t+1} = F(s_t)\)
An equilibrium point \(s_{eq}\) is a state that maps to itself: \(F(s_{eq}) = s_{eq}\)
Stability is determined by the eigenvalues \(\lambda_i\in\mathbb C\) of the dynamics matrix \(A\):
asymptotically stable if \(|\lambda_i|<1\) for all \(i\)
unstable if \(|\lambda_i|>1\) for some \(i\)
marginally (un)stable if \(\max_i|\lambda_i|=1\)
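A quick numerical check of these conditions, as a sketch (the matrix \(A\) below is an arbitrary illustration, not from the lecture):

```python
import numpy as np

def classify_stability(A, tol=1e-9):
    """Classify the equilibrium s_eq = 0 of s_{t+1} = A s_t by its spectral radius."""
    rho = np.abs(np.linalg.eigvals(A)).max()
    if rho < 1 - tol:
        return "asymptotically stable"
    elif rho > 1 + tol:
        return "unstable"
    return "marginally (un)stable"

# example: a rotation shrunk by 0.9 -- all eigenvalues have modulus 0.9 < 1
A = 0.9 * np.array([[np.cos(0.3), -np.sin(0.3)],
                    [np.sin(0.3),  np.cos(0.3)]])
print(classify_stability(A))  # asymptotically stable
```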
Stability via linear approximation of nonlinear \(F\)
example: discrete-time damped pendulum
\(\theta_{t+1} = \theta_t + h \omega_t\)
\(\omega_{t+1} =\omega_t + h\left(\frac{g}{\ell}\sin\theta_t-d\omega_t\right)\)
angle \(\theta\), angular velocity \(\omega\), gravitational acceleration \(g\), pendulum length \(\ell\), damping coefficient \(d\), step size \(h\)
\(\omega_{t+1} \approx (1-dh)\,\omega_t + h\frac{g}{\ell}\left(\sin \theta_{eq}+\cos\theta_{eq}\,(\theta_t-\theta_{eq})\right)\)
using the first-order Taylor expansion \(\sin x\approx \sin x_0 + \cos x_0(x - x_0)\)
equilibria at \(\theta_{eq}=k\pi\) for \(k\in\mathbb Z\) (with \(\omega_{eq}=0\))
$$\begin{bmatrix}\theta_{t+1}-\theta_{eq}\\ \omega_{t+1}-\omega_{eq}\end{bmatrix} \approx \begin{bmatrix} 1 & h\\ h \frac{g}{\ell}\cos(\theta_{eq})& 1-dh\end{bmatrix}\begin{bmatrix}\theta_{t}-\theta_{eq}\\ \omega_{t}-\omega_{eq}\end{bmatrix} $$
at \(\theta_{eq}=0\): real eigenvalues \(0<\lambda_2<1<\lambda_1\) (for small \(h\)), so the equilibrium is unstable
at \(\theta_{eq}=\pi\): complex eigenvalues for small \(d\), with \(|\lambda|<1\) (asymptotically stable) when \(h < d\ell/g\)
Exercise: work out the details of this analysis (simulation notebook)
\(\lambda = 1-h\frac{d}{2} \pm h\sqrt{(\frac{d}{2})^2+\frac{g}{\ell}\cos(\theta_{eq})}\)
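A minimal sketch of the simulation-notebook analysis, assuming illustrative values for \(g,\ell,d,h\) (none of these values come from the slides):

```python
import numpy as np

g, ell, d, h = 9.81, 1.0, 0.5, 0.01  # assumed illustrative parameter values

def step(theta, omega):
    """One step of the discrete-time damped pendulum from the slides."""
    return theta + h * omega, omega + h * (g / ell * np.sin(theta) - d * omega)

def jacobian(theta_eq):
    """Linearization at (theta_eq, 0), matching the 2x2 matrix on the slide."""
    return np.array([[1.0, h],
                     [h * g / ell * np.cos(theta_eq), 1.0 - d * h]])

for theta_eq in (0.0, np.pi):
    lam = np.linalg.eigvals(jacobian(theta_eq))
    # compare with the closed form 1 - h d/2 +/- h sqrt((d/2)^2 + (g/ell) cos(theta_eq))
    print(theta_eq, lam, np.abs(lam))

# a trajectory started near theta = pi settles into the stable equilibrium
theta, omega = np.pi + 0.3, 0.0
for _ in range(5000):
    theta, omega = step(theta, omega)
print(theta - np.pi, omega)  # both close to 0
```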
Linearization via Taylor Series:
The Jacobian \(J\) of \(G:\mathbb R^{n}\to\mathbb R^{m}\) is defined as $$ J(x) = \begin{bmatrix}\frac{\partial G_1}{\partial x_1} & \dots & \frac{\partial G_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G_m}{\partial x_1} &\dots & \frac{\partial G_m}{\partial x_n}\end{bmatrix}$$
Expanding \(F\) around the equilibrium:
\(s_{t+1} = F(s_t) = F(s_{eq}) + J(s_{eq}) (s_t - s_{eq}) \) + higher order terms
\(= s_{eq} + J(s_{eq}) (s_t - s_{eq}) \) + higher order terms (since \(F(s_{eq}) = s_{eq}\))
so \(s_{t+1}-s_{eq} \approx J(s_{eq})(s_t-s_{eq})\)
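When differentiating \(F\) by hand is inconvenient, \(J(s_{eq})\) can be approximated numerically; a minimal sketch using central finite differences (the map \(F\) and the step size eps below are illustrative assumptions):

```python
import numpy as np

def numerical_jacobian(F, s_eq, eps=1e-6):
    """Central-difference approximation of the Jacobian J(s_eq) of F: R^n -> R^n."""
    n = len(s_eq)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (F(s_eq + e) - F(s_eq - e)) / (2 * eps)
    return J

# example nonlinear map (chosen only for illustration) with a fixed point at 0
F = lambda s: np.array([0.5 * s[0] + np.sin(s[1]), -0.2 * s[1] + s[0] ** 2])
J = numerical_jacobian(F, np.zeros(2))
print(np.abs(np.linalg.eigvals(J)))  # all |lambda| < 1 => 0 is asymptotically stable
```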
Consider the dynamics of gradient descent with step size \(\alpha>0\) on a twice differentiable function \(g:\mathbb R^d\to\mathbb R\)
\(\theta_{t+1} = \theta_t - \alpha\nabla g(\theta_t)\)
Jacobian \(J(\theta) = I - \alpha \nabla^2 g(\theta)\); equilibria are the critical points \(\nabla g(\theta_{eq})=0\), and the eigenvalues of \(J(\theta_{eq})\) are \(1-\alpha\gamma_i\), where \(\gamma_i\) are the eigenvalues of the Hessian \(\nabla^2 g(\theta_{eq})\)
if any \(\gamma_i\leq 0\), then \(|1-\alpha\gamma_i|\geq 1\) and \(\theta_{eq}\) is not asymptotically stable
i.e. when \(\theta_{eq}\) is a saddle, local maximum, or degenerate critical point of \(g\)
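A sketch of this check for a specific \(g\) (a saddle-shaped quadratic chosen only for illustration, with an assumed step size \(\alpha=0.1\)):

```python
import numpy as np

alpha = 0.1  # assumed step size

# illustrative g(x, y) = x^2 - y^2: the critical point 0 is a saddle
hessian = np.array([[2.0, 0.0],
                    [0.0, -2.0]])        # constant Hessian of g
gamma = np.linalg.eigvalsh(hessian)      # Hessian eigenvalues gamma_i = (-2, 2)

J_eigs = 1 - alpha * gamma               # eigenvalues of J = I - alpha * Hessian
print(J_eigs, np.abs(J_eigs).max())      # one eigenvalue is 1.2 > 1 => unstable
```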
Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) and the equilibrium \(s_{eq}=0\) is continuous and satisfies
\(V(0)=0\) and \(V(s)>0\) for all \(s\neq 0\) (positive definite)
\(V(F(s)) - V(s) \leq 0\) for all \(s\) in a neighborhood of \(0\)
Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"
Theorem (1.2, 1.4): Suppose that \(F\) is locally Lipschitz, \(s_{eq}=0\) is a fixed point, and \(V\) is a Lyapunov function for \(F,s_{eq}\). Then, \(s_{eq}=0\) is stable; if in addition the decrease is strict, i.e. \(V(F(s)) - V(s) < 0\) for all \(s\neq 0\) in a neighborhood of \(0\), then \(s_{eq}=0\) is asymptotically stable.
Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"
Theorem (3.3): Suppose \(F\) is locally Lipschitz, \(0\) is a fixed point, and let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of the Jacobian \(J(0)\). Then \(0\) is asymptotically stable if \(|\lambda_i|<1\) for all \(i\), and unstable if \(|\lambda_i|>1\) for some \(i\).
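For a linear (or linearized) system with a stable \(A\), a quadratic Lyapunov function satisfying the strict-decrease condition can be computed by solving a discrete Lyapunov equation; a sketch with an arbitrary stable \(A\) (not from the lecture):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# arbitrary stable A (spectral radius < 1), chosen for illustration
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])

# solve A^T P A - P = -I, so V(s) = s^T P s strictly decreases along s_{t+1} = A s_t
P = solve_discrete_lyapunov(A.T, np.eye(2))

s = np.array([1.0, -1.0])
for _ in range(5):
    V_now, s = s @ P @ s, A @ s
    print(V_now, s @ P @ s)  # V decreases at every step
```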
Recall: state \(s\), dynamics \(s_{t+1} = F(s_t)\)
Now consider systems with inputs \(w_t\) and outputs \(y_t\):
\(s_{t+1} = F(s_t, w_t)\)
\(y_t = G(s_t)\)
the input sequence \((w_0, w_1, \dots)\) drives the internal state \(s\), producing the output sequence \((y_0, y_1, \dots)\)
\(s_{t+1} = As_t+ w_t\)
\(y_t = Cs_t\)
$$y_{t} = CA^t s_0+ \sum_{k=1}^{t}CA^{k-1}w_{t-k}$$
If \(s_0\sim\mathcal N(\mu, \Sigma)\) and \(w_t\sim \mathcal N(\mu,\Sigma)\), all independent, then
\(y_t\sim \mathcal N(\sum_{k=0}^{t}CA^{k}\mu, \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top)\)
The limit as \(t\to\infty\), when it exists, is the steady state distribution
If \(\mathbb E[s_0] = \mathbb E[w_t] =\mu\) and \(\mathrm{Cov}[s_0] = \mathrm{Cov}[w_t] = \Sigma\), with \(s_0, w_0, w_1, \dots\) mutually uncorrelated, then
\(\mathbb E[y_t] = \sum_{k=0}^{t}CA^{k}\mu\)
\(\mathrm{Cov}[y_t] = \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top\)
We can characterize the moments of \(y_t\) without knowing the whole distribution
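A Monte Carlo sanity check of these moment formulas, as a sketch with arbitrary illustrative choices of \(A\), \(C\), \(\mu\), \(\Sigma\) (none come from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative system and noise statistics (not from the lecture)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
mu, Sigma = np.array([0.5, -0.5]), 0.1 * np.eye(2)
t, n_trials = 20, 5000

# closed-form moments: sum_{k=0}^{t} C A^k mu and sum_{k=0}^{t} C A^k Sigma (A^k)^T C^T
Ak, mean_y, var_y = np.eye(2), 0.0, 0.0
for k in range(t + 1):
    mean_y += (C @ Ak @ mu).item()
    var_y += (C @ Ak @ Sigma @ Ak.T @ C.T).item()
    Ak = A @ Ak

# Monte Carlo: simulate t steps of s_{t+1} = A s_t + w_t, then y_t = C s_t
s = rng.multivariate_normal(mu, Sigma, size=n_trials)      # samples of s_0
for _ in range(t):
    s = s @ A.T + rng.multivariate_normal(mu, Sigma, size=n_trials)
y = (s @ C.T).ravel()
print(mean_y, y.mean())      # should agree closely
print(var_y, y.var(ddof=1))  # should agree closely
```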
What if inputs are not stochastic?
The sequence \((w_0, w_1, ...)\) can be adversarially chosen, but is bounded
often possible to show that outputs are also bounded (upcoming lectures)
For a system \(s_{t+1} = F(s_t) + w_t\) where \(F(0)=0\)
Suppose that there is a \(V:\mathcal S\to\mathbb R_+\) such that, for some \(0<\alpha_1\leq\alpha_2\) and \(0\leq\gamma<1\),
\(\alpha_1\|s\|_2^2 \leq V(s) \leq \alpha_2\|s\|_2^2\) for all \(s\)
\(V(F(s)) \leq \gamma V(s)\) for all \(s\)
\(\sqrt{V(s+s')}\leq\sqrt{V(s)}+\sqrt{V(s')}\) for all \(s,s'\) (e.g. \(V(s)=s^\top P s\) with \(P\succ 0\))
Then if \(\|w_t\|_2\leq B\) for all \(t\), \(\displaystyle \limsup_{t\to\infty} \|s_t\|_2 \leq \sqrt{\frac{\alpha_2}{\alpha_1}}\frac{B}{1-\sqrt{\gamma}} \)
\(\sqrt{V(s_{t+1})} = \sqrt{V(F(s_t) + w_t)} \leq \sqrt{V(F(s_t))} + \sqrt{V(w_t)} \leq \sqrt{\gamma V(s_t)} + \sqrt{\alpha_2}\|w_t\|_2 \)
\(\implies \sqrt{V(s_{t})} \leq \gamma^{t/2}\sqrt{ V(s_0)} + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\alpha_2}\|w_{t-k}\|_2\)
\(\implies \|s_t\|_2 \leq \gamma^{t/2}\sqrt{ \frac{\alpha_2}{\alpha_1}}\|s_0\|_2 + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\frac{\alpha_2}{\alpha_1}}\|w_{t-k}\|_2\)
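A sketch checking this bound for a linear \(F(s)=As\) with quadratic \(V(s)=s^\top P s\), so that \(\alpha_1=\lambda_{\min}(P)\), \(\alpha_2=\lambda_{\max}(P)\), and \(\sqrt V\) is a norm; the matrix \(A\), the bound \(B\), and the noise model are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, eigh

rng = np.random.default_rng(0)

# illustrative stable system and noise bound (not from the lecture)
A = np.array([[0.9, 0.2], [0.0, 0.5]])
B = 0.1                                   # ||w_t||_2 <= B

# quadratic Lyapunov function V(s) = s^T P s with A^T P A - P = -I
P = solve_discrete_lyapunov(A.T, np.eye(2))
alpha1, alpha2 = np.linalg.eigvalsh(P)[[0, -1]]        # lambda_min, lambda_max of P
gamma = eigh(A.T @ P @ A, P, eigvals_only=True)[-1]    # max of V(As)/V(s), < 1 here

bound = np.sqrt(alpha2 / alpha1) * B / (1 - np.sqrt(gamma))

# simulate with bounded noise of norm exactly B in a random direction
s = np.array([2.0, -2.0])
for t in range(200):
    w = rng.standard_normal(2)
    w *= B / np.linalg.norm(w)
    s = A @ s + w
print(np.linalg.norm(s), "<=", bound)     # state norm settles below the bound
```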
Consider the dynamics of stochastic gradient descent on a twice differentiable function \(g:\mathbb R^d\to\mathbb R\)
\(\theta_{t+1} = \theta_t - \alpha g_t\), where \(g_t\) is a stochastic estimate of \(\nabla g(\theta_t)\)
\(= \theta_t - \alpha \nabla g(\theta_t) + \underbrace{\alpha (\nabla g(\theta_t) - g_t) }_{w_t} \)
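A sketch viewing SGD on a simple quadratic as gradient descent plus bounded noise; the objective, noise model, and step size are illustrative assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1  # assumed step size

# illustrative objective g(theta) = 0.5 * theta^T H theta with H positive definite
H = np.array([[2.0, 0.0], [0.0, 1.0]])
grad = lambda theta: H @ theta

theta = np.array([5.0, -5.0])
for t in range(200):
    noise = rng.uniform(-1, 1, size=2)      # bounded gradient noise
    g_t = grad(theta) + noise               # stochastic gradient estimate
    theta = theta - alpha * g_t             # theta_{t+1} = theta_t - alpha * g_t
print(theta)  # settles into a neighborhood of the minimizer 0, not exactly 0
```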
Next time: see these ideas in action!
References: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"