Dynamical Systems

ML in Feedback Sys #5

Prof Sarah Dean

Reminders

  • Sign up to scribe and rank preferences for paper presentations by TODAY!
    • SR19 "Near optimal finite time identification of arbitrary linear dynamical systems" (9/26)
  • Required: meet with Atul at least 2 days before you are scheduled to present
  • Working in pairs/groups, self-assessment

Dynamical System

\(s_{t+1} = F(s_t)\)

(Block diagram: the state \(s\) is fed through \(F\) to produce \(F(s)\).)

An equilibrium point \(s_{eq}\) is

  • stable if "for any desired accuracy, you can find a tolerance that guarantees it in perpetuity"
  • unstable if it is not stable
  • asymptotically stable if it is stable and "you can find a tolerance that guarantees convergence to \(s_{eq}\)"

Stability

  • \(s_{eq}\) is stable if for all \(\epsilon>0\), there exists a \(\delta=\delta(\epsilon)\) such that for all \(t>0\), $$ \|s_0-s_{eq}\|<\delta \implies \|s_t-s_{eq}\|<\epsilon $$

Stability

  • \(s = (x,y)\) and for some function with \(f(0)=0\):
    • \(x_{t+1} = f(y_t)\),
    • \(y_{t+1} = y_t\)
  • stable with \(\epsilon^2 = f(\delta)^2 + \delta^2\)
  • \(s_{t+1} = \begin{bmatrix} 0 & 5\\ 0 & 1\end{bmatrix} s_t\) is stable with \(\delta=\frac{1}{\sqrt{26}}\epsilon\), since \(f(y)=5y\) gives \(\epsilon^2 = 26\delta^2\)
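The tolerance in this example can be checked numerically. The sketch below is only an illustration: it takes \(f(y)=5y\), picks an arbitrary \(\epsilon=0.1\), sets \(\delta\) from \(\epsilon^2 = f(\delta)^2+\delta^2\), and samples initial states just inside the \(\delta\)-ball. The function names, the sampling grid, and the horizon are assumptions made for the example.

```python
import math

def simulate(s0, steps=20):
    """Iterate x_{t+1} = 5*y_t, y_{t+1} = y_t and return the whole trajectory."""
    x, y = s0
    traj = [(x, y)]
    for _ in range(steps):
        x, y = 5.0 * y, y
        traj.append((x, y))
    return traj

def norm(s):
    """Euclidean norm of a 2D state."""
    return math.hypot(s[0], s[1])

eps = 0.1                          # desired accuracy (arbitrary choice)
delta = eps / math.sqrt(26.0)      # from eps^2 = (5*delta)^2 + delta^2

# Sample initial states on a circle of radius slightly below delta
worst = 0.0
for i in range(360):
    angle = 2.0 * math.pi * i / 360
    s0 = (0.999 * delta * math.cos(angle), 0.999 * delta * math.sin(angle))
    worst = max(worst, max(norm(s) for s in simulate(s0)))

print(worst < eps)
```

The largest excursion comes from initial states with all of their mass in \(y\), which is why the tolerance scales with \(\sqrt{26}\).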

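Linear stability

Determined by eigenvalues of dynamics matrix \(A\), viewed as points in \(\mathbb C\) relative to the unit circle:

  • asymptotically stable: all eigenvalues strictly inside the unit circle
  • unstable: some eigenvalue strictly outside the unit circle
  • marginally (un)stable: eigenvalues on the unit circle, none outside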
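A minimal sketch, assuming NumPy, of classifying \(s_{t+1}=As_t\) by the spectral radius of \(A\) (the largest eigenvalue magnitude). The function name and the tolerance are hypothetical choices. The marginal case is reported as such because eigenvalues alone do not settle whether it is stable.

```python
import numpy as np

def classify_linear_stability(A, tol=1e-9):
    """Classify s_{t+1} = A s_t from the spectral radius of A."""
    rho = np.max(np.abs(np.linalg.eigvals(np.asarray(A, dtype=float))))
    if rho < 1.0 - tol:
        return "asymptotically stable"
    if rho > 1.0 + tol:
        return "unstable"
    # Spectral radius equal to 1: the eigenvalues alone do not decide stability.
    return "marginal"

print(classify_linear_stability([[0.5, 0.1], [0.0, 0.9]]))
print(classify_linear_stability([[0.0, 5.0], [0.0, 1.0]]))
print(classify_linear_stability([[1.1, 0.0], [0.0, 0.2]]))
```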

Stability via linearization

Stability via linear approximation of nonlinear \(F\)

example: discrete-time damped pendulum

\(\theta_{t+1} = \theta_t + h \omega_t\)

\(\omega_{t+1} =\omega_t + h\left(\frac{g}{\ell}\sin\theta_t-d\omega_t\right)\)

where \(\theta\) is the angle, \(\omega\) the angular velocity, \(g\) gravity, and \(\ell\) the length

Using \(\sin x\approx \sin x_0 + \cos x_0(x - x_0)\) around \(\theta_{eq}\):

\(\omega_{t+1} \approx (1-dh)\omega_t + h\frac{g}{\ell}\left(\sin \theta_{eq}+\cos\theta_{eq}(\theta_t-\theta_{eq})\right)\)

equilibria at \(\theta=k\pi\), \(\omega=0\) for \(k\in\mathbb Z\)

$$\begin{bmatrix}\theta_{t+1}-\theta_{eq}\\ \omega_{t+1}-\omega_{eq}\end{bmatrix} \approx \begin{bmatrix} 1 & h\\ h \frac{g}{\ell}\cos(\theta_{eq})& 1-dh\end{bmatrix}\begin{bmatrix}\theta_{t}-\theta_{eq}\\ \omega_{t}-\omega_{eq}\end{bmatrix} $$

at \(\theta_{eq}=0\), real eigenvalues \(0<\lambda_2<1<\lambda_1\)

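at \(\theta_{eq}=\pi\), complex eigenvalues for small \(d\) (when \((\frac{d}{2})^2<\frac{g}{\ell}\)), with \(|\lambda|^2 = 1-dh+h^2\frac{g}{\ell}\), so \(|\lambda|<1\) when \(d>h\frac{g}{\ell}\)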

Exercise: work out the details of this analysis (simulation notebook)

\(\lambda = 1-h\frac{d}{2} \pm h\sqrt{(\frac{d}{2})^2+\frac{g}{\ell}\cos(\theta_{eq})}\)
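As a starting point for the exercise, the sketch below compares the closed-form eigenvalues with a numerical eigendecomposition of the linearized matrix at both equilibria. It assumes NumPy. The parameter values \(h=0.01\), \(g/\ell=9.8\), \(d=0.5\) are hypothetical and chosen only for illustration, as are the function names.

```python
import numpy as np

def pendulum_linearization(theta_eq, h, g_over_l, d):
    """Linearized dynamics matrix of the discrete-time pendulum at (theta_eq, 0)."""
    return np.array([[1.0, h],
                     [h * g_over_l * np.cos(theta_eq), 1.0 - d * h]])

def closed_form_eigs(theta_eq, h, g_over_l, d):
    """lambda = 1 - h d/2 +/- h sqrt((d/2)^2 + (g/l) cos(theta_eq))."""
    disc = complex((d / 2.0) ** 2 + g_over_l * np.cos(theta_eq))
    root = np.sqrt(disc)  # complex square root handles a negative discriminant
    center = 1.0 - h * d / 2.0
    return np.array([center + h * root, center - h * root])

# Hypothetical parameter values, chosen only for illustration
h, g_over_l, d = 0.01, 9.8, 0.5

for theta_eq in (0.0, np.pi):
    numeric = np.linalg.eigvals(pendulum_linearization(theta_eq, h, g_over_l, d))
    formula = closed_form_eigs(theta_eq, h, g_over_l, d)
    # Print the spectral radius from each method; they should agree
    print(theta_eq, np.max(np.abs(numeric)), np.max(np.abs(formula)))
```

With these values the spectral radius exceeds 1 at \(\theta_{eq}=0\) and is below 1 at \(\theta_{eq}=\pi\); other choices of \(h\) and \(d\) can change the second conclusion.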

Linearization via Taylor Series:

The Jacobian \(J\) of \(G:\mathbb R^{n}\to\mathbb R^{m}\) is defined as $$ J(x) = \begin{bmatrix}\frac{\partial G_1}{\partial x_1} & \dots & \frac{\partial G_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G_m}{\partial x_1} &\dots & \frac{\partial G_m}{\partial x_n}\end{bmatrix}$$

Expanding \(F\) around \(s_{eq}\), where \(J\) is the Jacobian of \(F\) and \(F(s_{eq})=s_{eq}\):

\(s_{t+1} = F(s_t) \)

\(= F(s_{eq}) + J(s_{eq})  (s_t - s_{eq}) \) + higher order terms

\(= s_{eq} + J(s_{eq})  (s_t - s_{eq}) \) + higher order terms

\(s_{t+1}-s_{eq} \approx J(s_{eq})(s_t-s_{eq})\)
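When \(F\) is only available as code, the Jacobian in the linearization can be approximated by finite differences. The sketch below assumes NumPy and uses central differences with a hypothetical step size. It is checked against the analytic pendulum Jacobian at \((\pi, 0)\), with the same illustrative parameter values as an assumption.

```python
import numpy as np

def numerical_jacobian(F, s, eps=1e-6):
    """Central-difference approximation of the Jacobian of F at s."""
    s = np.asarray(s, dtype=float)
    cols = []
    for j in range(s.size):
        step = np.zeros_like(s)
        step[j] = eps
        cols.append((F(s + step) - F(s - step)) / (2.0 * eps))
    return np.column_stack(cols)  # column j holds dF/ds_j

# Hypothetical parameter values, chosen only for illustration
h, g_over_l, d = 0.01, 9.8, 0.5

def pendulum_step(s):
    """One step of the discrete-time damped pendulum."""
    theta, omega = s
    return np.array([theta + h * omega,
                     omega + h * (g_over_l * np.sin(theta) - d * omega)])

s_eq = np.array([np.pi, 0.0])
J_numeric = numerical_jacobian(pendulum_step, s_eq)
J_analytic = np.array([[1.0, h],
                       [h * g_over_l * np.cos(np.pi), 1.0 - d * h]])
print(np.max(np.abs(J_numeric - J_analytic)))
```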

Example: gradient descent

Consider the dynamics of gradient descent on a twice differentiable function \(g:\mathbb R^d\to\mathbb R\)

\(\theta_{t+1} = \theta_t - \alpha\nabla g(\theta_t)\)

Jacobian \(J(\theta) = I - \alpha \nabla^2 g(\theta)\)

  • Let \(\{\gamma_i\}_{i=1}^d\) be the eigenvalues of the Hessian \(\nabla^2 g(\theta_{eq})\)
  • Then the eigenvalues of the Jacobian are \(1-\alpha\gamma_i\)
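    • if any \(\gamma_i< 0\), \(\theta_{eq}\) is unstable
      • i.e. saddle or local maximum of \(g\)
    • if no \(\gamma_i\) is negative but some \(\gamma_i = 0\) (degenerate critical point of \(g\)), an eigenvalue equals \(1\) and the linearization is inconclusive
    • if all \(\gamma_i>0\) and \(\alpha<\frac{1}{\gamma_i}\) for all \(i\), every eigenvalue lies in \((0,1)\) and \(\theta_{eq}\) is asymptotically stable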
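The eigenvalue rule \(1-\alpha\gamma_i\) can be seen on a quadratic, where the linearization is exact. The sketch below assumes NumPy and a hypothetical objective \(g(\theta)=\frac12\theta^\top H\theta\) with \(H=\mathrm{diag}(1,4)\). The two step sizes are illustrative: one below \(1/\gamma_{\max}\) and one large enough to push an eigenvalue outside the unit circle.

```python
import numpy as np

# Hypothetical Hessian of g(theta) = 0.5 * theta^T H theta (eigenvalues 1 and 4)
H = np.diag([1.0, 4.0])

def gd_jacobian_eigs(alpha):
    """Eigenvalues of J = I - alpha * H, i.e. 1 - alpha * gamma_i."""
    return np.linalg.eigvals(np.eye(2) - alpha * H)

def run_gd(alpha, theta0, steps=50):
    """Run gradient descent theta <- theta - alpha * grad g(theta)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - alpha * (H @ theta)  # grad g(theta) = H theta
    return theta

for alpha in (0.2, 0.6):
    rho = np.max(np.abs(gd_jacobian_eigs(alpha)))
    final = np.linalg.norm(run_gd(alpha, [1.0, 1.0]))
    print(alpha, rho, final)
```

With \(\alpha=0.2\) the iterates contract toward \(\theta_{eq}=0\); with \(\alpha=0.6\) the eigenvalue \(1-0.6\cdot 4=-1.4\) makes them diverge.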

Stability via Lyapunov

Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous and

  • (positive definite) \(V(0)=0\) and \(V(s)>0\) for all \(s\in\mathcal S - \{0\}\)
  • (decreasing) \(V(F(s)) - V(s) \leq 0\) for all \(s\in\mathcal S\)
  • Optionally,
    • (strict) \(V(F(s)) - V(s) < 0\) for all \(s\in\mathcal S-\{0\}\)
    • (global) \(\|s\|_2\to \infty \implies V(s)\to\infty\)

Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"

Theorem (1.2, 1.4): Suppose that \(F\) is locally Lipschitz, \(s_{eq}=0\) is a fixed point, and \(V\) is a Lyapunov function for \(F,s_{eq}\). Then, \(s_{eq}=0\)  is

  • stable
  • asymptotically stable if \(V\) satisfies the strict property
  • globally asymptotically stable if \(V\) satisfies the strict and global properties

Quadratic Lyapunov functions

  • Stable matrices have quadratic Lyapunov functions of the form \(V(s) = s^\top P s\) (Theorem 3.2)
    • For example, \(P = \sum_{t=0}^\infty (A^\top)^t A^t\)
  • Exercise: show that the above is a strict and global Lyapunov function for \(s_{t+1}=As_t\).
  • When the Jacobian \(J(0)\) is stable, one can show that \(V(s)=s^\top P s\), with \(P\) built from \(A=J(0)\), is a strict Lyapunov function for \(s_{t+1} = F(s_t)\) in a neighborhood of \(0\).
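A numerical companion to the exercise, assuming NumPy: the sketch below truncates the infinite sum for \(P\) (the number of terms is an arbitrary assumption) and checks that \(A^\top P A - P = -I\), so that \(V(As)-V(s) = -\|s\|_2^2 < 0\) for \(s\neq 0\). The matrix \(A\) is a hypothetical stable example.

```python
import numpy as np

def lyapunov_P(A, terms=500):
    """Truncated sum P = sum_t (A^T)^t A^t for a stable matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = np.zeros((n, n))
    A_pow = np.eye(n)            # holds A^t
    for _ in range(terms):
        P += A_pow.T @ A_pow     # (A^T)^t A^t = (A^t)^T A^t
        A_pow = A @ A_pow
    return P

# Hypothetical stable matrix (eigenvalues 0.9 and 0.5)
A = np.array([[0.9, 0.5],
              [0.0, 0.5]])
P = lyapunov_P(A)

# Strict decrease: V(As) - V(s) = s^T (A^T P A - P) s = -||s||^2
residual = A.T @ P @ A - P + np.eye(2)
print(np.max(np.abs(residual)))
print(np.linalg.eigvalsh(P))     # positive eigenvalues: V is positive definite
```

For larger problems a dedicated discrete Lyapunov solver would be the more usual choice; the explicit sum is used here only because it mirrors the formula on the slide.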

Theorem (3.3): Suppose \(F\) is locally Lipschitz, \(0\) is a fixed point, and let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of the Jacobian \(J(0)\). Then \(0\) is

  • asymptotically stable if \(\max_{i\in[n]}|\lambda_i|<1\)
  • unstable if \(\max_{i\in[n]}|\lambda_i|> 1\)

Inputs and outputs

\(s_{t+1} = F(s_t, w_t)\)

\(y_t = G(s_t)\)

(Block diagram: the input \(w_t\) enters the dynamical system \(F\) with state \(s\), which produces the output \(y_t\).)

  • input signal \(w_t\) represents external phenomena
    • action/control input
    • disturbance/process noise
  • output signal \(y_t\) represents what is measured

System response

The system \(F\) with state \(s\) maps the input sequence \((w_0, w_1, ... )\) to the output sequence \((y_0, y_1, ...)\); denote this map by \(\Phi\).

Linear system response

\(s_{t+1} = As_t+ w_t\)

\(y_t = Cs_t\)

$$y_{t} = CA^t s_0+ \sum_{k=1}^{t}CA^{k-1}w_{t-k}$$
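The closed-form response can be checked against a step-by-step simulation. The sketch below assumes NumPy; the matrices \(A\), \(C\), the initial state, and the random inputs are hypothetical values used only to exercise the formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical system and data, chosen only for illustration
A = np.array([[0.9, 0.2],
              [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
s0 = np.array([1.0, -1.0])
T = 10
w = rng.normal(size=(T, 2))      # inputs w_0, ..., w_{T-1}

# Simulate s_{t+1} = A s_t + w_t for T steps, then read out y_T = C s_T
s = s0.copy()
for t in range(T):
    s = A @ s + w[t]
y_sim = C @ s

# Closed form: y_T = C A^T s_0 + sum_{k=1}^{T} C A^{k-1} w_{T-k}
y_formula = C @ np.linalg.matrix_power(A, T) @ s0
for k in range(1, T + 1):
    y_formula = y_formula + C @ np.linalg.matrix_power(A, k - 1) @ w[T - k]

print(y_sim, y_formula)
```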

Linear-Gaussian system

If \(s_0\sim\mathcal N(\mu, \Sigma)\) and \(w_t\sim \mathcal N(\mu,\Sigma)\) are independent, then

\(y_t\sim \mathcal N(\sum_{k=0}^{t}CA^{k}\mu, \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top)\)

The limit as \(t\to\infty\) is the steady state distribution

Linear stochastic system

If \(\mathbb E[s_0] = \mathbb E[w_t] =\mu\) and \(\mathrm{Cov}[s_0] = \mathrm{Cov}[w_t] = \Sigma\), with \(s_0, w_0, w_1, \dots\) uncorrelated, then

\(\mathbb E[y_t] = \sum_{k=0}^{t}CA^{k}\mu\)

\(\mathrm{Cov}[y_t] = \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top\)

Can characterize moments without knowing whole distribution
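The moment formulas follow from propagating the mean and covariance one step at a time: \(m_{t+1} = Am_t+\mu\) and \(S_{t+1} = AS_tA^\top+\Sigma\). The sketch below assumes NumPy, uncorrelated \(s_0, w_0, w_1, \dots\), and hypothetical values for \(A\), \(C\), \(\mu\), \(\Sigma\). It compares the recursion with the sums.

```python
import numpy as np

# Hypothetical system and moments, chosen only for illustration
A = np.array([[0.9, 0.2],
              [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
mu = np.array([0.1, -0.2])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
T = 15

# Recursion for the state moments, starting from E[s_0] = mu, Cov[s_0] = Sigma
mean, cov = mu.copy(), Sigma.copy()
for _ in range(T):
    mean = A @ mean + mu
    cov = A @ cov @ A.T + Sigma

# Sums: E[s_T] = sum_k A^k mu, Cov[s_T] = sum_k A^k Sigma (A^k)^T
powers = [np.linalg.matrix_power(A, k) for k in range(T + 1)]
mean_sum = sum(Ak @ mu for Ak in powers)
cov_sum = sum(Ak @ Sigma @ Ak.T for Ak in powers)

# Output moments are obtained by applying C
print(C @ mean, C @ mean_sum)
print(C @ cov @ C.T, C @ cov_sum @ C.T)
```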

Bounded inputs & bounded outputs

What if inputs are not stochastic?

The sequence \((w_0, w_1, ...)\) can be adversarially chosen, but is bounded

  • at every \(t\): \(\|w_t\|\leq B\)
  • on average: \(\sqrt{\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} \|w_t\|_2^2}\leq B\)

Often possible to show that outputs are also bounded (upcoming lectures)

Input-to-state stability

For a system \(s_{t+1} = F(s_t) + w_t\) where \(F(0)=0\)

Suppose that there is a \(V:\mathcal S\to\mathbb R_+\) that

  • defines a metric: \(\alpha_1 \|s\|_2^2 \leq V(s) \leq \alpha_2 \|s\|_2^2 \) and $$\sqrt{V(s+s')}\leq \sqrt{V(s)}+\sqrt{V(s')}$$
  • is contracting: \(V(F(s)) \leq \gamma V(s)\) for some \(\gamma<1\) and all \(s\in\mathcal S\)

Then if \(\|w_t\|_2\leq B\) for all \(t\), \(\displaystyle \limsup_{t\to\infty} \|s_t\|_2 \leq  \sqrt{\frac{\alpha_2}{\alpha_1}}\frac{B}{1-\sqrt{\gamma}} \)

Proof

\(\sqrt{V(s_{t+1})} = \sqrt{V(F(s_t) + w_t)} \leq \sqrt{V(F(s_t))} + \sqrt{V(w_t)} \leq \sqrt{\gamma V(s_t)}  + \sqrt{\alpha_2}\|w_t\|_2 \)

\(\implies \sqrt{V(s_{t})} \leq \gamma^{t/2}\sqrt{ V(s_0)} + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\alpha_2}\|w_{t-k}\|_2\)

\(\implies \|s_t\|_2 \leq \gamma^{t/2}\sqrt{ \frac{\alpha_2}{\alpha_1}}\|s_0\|_2 + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\frac{\alpha_2}{\alpha_1}}\|w_{t-k}\|_2\)

Bounding \(\|w_{t-k}\|_2\leq B\) and the geometric sum \(\sum_{k=1}^t \gamma^{(k-1)/2}\leq \frac{1}{1-\sqrt{\gamma}}\), then letting \(t\to\infty\), gives the claim.
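A scalar sketch of the bound above, under assumptions made only for illustration: \(F(s)=0.8\sin(s)\), \(V(s)=s^2\) (so \(\alpha_1=\alpha_2=1\) and \(\gamma=0.64\)), and disturbances drawn uniformly from \([-B,B]\). The code checks the finite-time bound at every step and compares the final state with the limiting bound \(B/(1-\sqrt{\gamma})\).

```python
import math
import random

random.seed(0)

gamma = 0.64      # V(F(s)) <= gamma * V(s) for V(s) = s^2
B = 0.1           # bound on |w_t|
root = math.sqrt(gamma)

def F(s):
    """Hypothetical contracting map: |F(s)| <= 0.8 |s| because |sin s| <= |s|."""
    return 0.8 * math.sin(s)

s0 = 2.0
s = s0
within_bound = True
for t in range(1, 201):
    w = random.uniform(-B, B)
    s = F(s) + w
    # Finite-time bound: gamma^{t/2} |s_0| + sum_{k=1}^t gamma^{(k-1)/2} B
    bound = root ** t * abs(s0) + B * sum(root ** (k - 1) for k in range(1, t + 1))
    within_bound = within_bound and abs(s) <= bound + 1e-12

limit_bound = B / (1.0 - root)
print(within_bound, abs(s), limit_bound)
```

Random disturbances typically leave the state well inside the bound; the bound is driven by the worst-case sequence, such as \(w_t = B\) at every step.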

Example: stochastic gradient descent

Consider the dynamics of stochastic gradient descent on a twice differentiable function \(g:\mathbb R^d\to\mathbb R\), where \(g_t\) is a stochastic estimate of \(\nabla g(\theta_t)\)

\(\theta_{t+1} = \theta_t - \alpha g_t = \theta_t -  \alpha \nabla g(\theta_t) + \underbrace{\alpha (\nabla g(\theta_t) - g_t) }_{w_t} \)

  • may converge to ball around minima
  • in general, could jump between minima
  • in practice, decreasing step size \(\alpha_t\propto 1/t\)
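The first and last points can be illustrated on a one-dimensional problem. The sketch below uses only the Python standard library and makes several assumptions: the objective is \(g(\theta)=\theta^2/2\), the gradient noise is uniform on \([-1,1]\), and the two schedules are \(\alpha=0.1\) and \(\alpha_t=1/t\). With a constant step the disturbance \(w_t\) is bounded by \(\alpha\), so the iterate ends up in a ball around the minimum; with the decreasing step the noise is averaged out.

```python
import random

def noisy_grad(theta, rng):
    """Gradient of the hypothetical objective g(theta) = theta^2 / 2, plus bounded noise."""
    return theta + rng.uniform(-1.0, 1.0)

def run_sgd(step_size, theta0=5.0, steps=2000, seed=0):
    """Run SGD with a step-size schedule step_size(t), t = 1, 2, ..."""
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, steps + 1):
        theta = theta - step_size(t) * noisy_grad(theta, rng)
    return theta

theta_const = run_sgd(lambda t: 0.1)       # constant step size
theta_decay = run_sgd(lambda t: 1.0 / t)   # decreasing step size

print(abs(theta_const), abs(theta_decay))
```

For this objective the constant-step iterate satisfies \(|\theta_t|\leq 0.9^t|\theta_0| + 1\), which is the ball predicted by the input-to-state argument, while the \(1/t\) schedule reduces to a running average of the noise.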

Next time: see these ideas in action!

Recap

References: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"

  • Nonlinear stability
    • Linearization
    • Lyapunov functions
  • Inputs and outputs
    • System response
    • Stability
