Dynamical Systems

ML in Feedback Sys #4

Prof Sarah Dean

Reminders

  • Sign up to scribe and rank preferences for paper presentations by September 7th
    • SR19 "Near optimal finite time identification of arbitrary linear dynamical systems" (9/26)
  • Required: meet with Atul at least 2 days before you are scheduled to present
  • Working in pairs/groups, self-assessment

training data

\(\{(x_i, y_i)\}\)

model

\(f:\mathcal X\to\mathcal Y\)

policy

 

 

observation

action

ML in Feedback Systems

model

\(f_t:\mathcal X\to\mathcal Y\)

observation

prediction

Online learning

\(x_t\)

Goal: cumulatively over time, predictions \(\hat y_t = f_t(x_t)\) are close to true \(y_t\)

accumulate

\(\{(x_t, y_t)\}\)

$$\theta_t = \underbrace{\Big(\sum_{k=1}^{t-1}x_k x_k^\top  + \lambda I\Big)^{-1}}_{A_{t-1}^{-1}}\underbrace{\sum_{k=1}^{t-1}x_ky_k }_{b_{t-1}}$$

Follow the (Regularized) Leader

$$\theta_t = \arg\min \sum_{k=1}^{t-1} (\theta^\top x_k-y_k)^2 +  \lambda\|\theta\|_2^2$$

Online Gradient Descent

$$\theta_t = \theta_{t-1} - \alpha (\theta_{t-1}^\top x_{t-1}-y_{t-1})x_{t-1}$$
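The online gradient descent update above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the data stream, the ground-truth vector `theta_true`, the step size, and the function name `online_gradient_descent` are all assumptions made for the example.

```python
import numpy as np

def online_gradient_descent(xs, ys, alpha=0.05):
    """Apply theta <- theta - alpha * (theta^T x - y) * x once per example."""
    theta = np.zeros(xs.shape[1])
    for x, y in zip(xs, ys):
        residual = theta @ x - y              # prediction error on the latest example
        theta = theta - alpha * residual * x  # gradient step (factor of 2 absorbed in alpha)
    return theta

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0])            # hypothetical ground truth
xs = rng.normal(size=(2000, 2))               # hypothetical observations x_t
ys = xs @ theta_true                          # noiseless labels y_t
theta_hat = online_gradient_descent(xs, ys)
print(theta_hat)
```

Each step touches only the most recent example, so the per-step cost is linear in the dimension.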

Case study: least-squares

Sherman-Morrison formula: \(\displaystyle (A+uv^\top)^{-1} = A^{-1} - \frac{A^{-1}uv^\top A^{-1}}{1+v^\top A^{-1}u} \)

Recursive least-squares

Recursive FTRL

  • set \(M_0=\frac{1}{\lambda}I\) and \(b_0 = 0\)
  • for \(t=1,2,...\)
    • \(\theta_t = M_{t-1}b_{t-1}\)
    • \(M_t = M_{t-1} - \frac{M_{t-1}x_t x_t^\top M_{t-1}}{1+x_t^\top M_{t-1}x_t}\)
    • \(b_t = b_{t-1}+x_ty_t\)
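The recursion above can be checked against the batch solution. The sketch below assumes a synthetic data stream and a hypothetical function name `recursive_least_squares`. It applies the Sherman-Morrison identity with \(u=v=x_t\), so the denominator is \(1+x_t^\top M_{t-1}x_t\), and compares the result with directly solving \((\sum_k x_kx_k^\top+\lambda I)\theta=\sum_k x_ky_k\).

```python
import numpy as np

def recursive_least_squares(xs, ys, lam=1.0):
    """Recursive FTRL for regularized least squares; returns the predictors and final M, b."""
    d = xs.shape[1]
    M = np.eye(d) / lam              # M_0 = (lambda I)^{-1}
    b = np.zeros(d)                  # b_0 = 0
    thetas = []
    for x, y in zip(xs, ys):
        thetas.append(M @ b)         # theta_t = M_{t-1} b_{t-1}, chosen before seeing (x_t, y_t)
        Mx = M @ x
        # Rank-one inverse update; M is symmetric, so M x x^T M = outer(Mx, Mx)
        M = M - np.outer(Mx, Mx) / (1.0 + x @ Mx)
        b = b + x * y
    return np.array(thetas), M, b

rng = np.random.default_rng(0)
xs = rng.normal(size=(50, 3))                                      # hypothetical data
ys = xs @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=50)
thetas, M, b = recursive_least_squares(xs, ys, lam=1.0)

# Batch solution using all 50 examples
A = xs.T @ xs + 1.0 * np.eye(3)
theta_batch = np.linalg.solve(A, xs.T @ ys)
theta_recursive = M @ b
print(np.max(np.abs(theta_recursive - theta_batch)))
```

The recursive form costs \(O(d^2)\) per step, whereas re-solving the batch problem from scratch costs \(O(d^3)\).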

Today: dynamical world

A world that evolves over time

Difference equation and state space

$$ s_{t+1} = F(s_t)$$

(Autonomous) discrete-time dynamical system where \(F:\mathcal S\to\mathcal S\)

\(\mathcal S\) is the state space. The state is sufficient for predicting its future.

Given an initial state \(s_0\), the solution of the difference equation is the trajectory $$ (s_0, F(s_0), F(F(s_0)), ... ) $$

What might trajectories look like?

  • converging \((1, 0.1, 0.01, 0.001, ...)\)
  • diverging \((1, 10, 100, 1000,...)\)
  • oscillating \((1, -1, 1, -1, 1, ...)\)
  • converging towards oscillation \((0.9, -0.99, 0.999, -0.9999,...)\)
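Each of the four behaviours listed above can be produced by iterating a scalar map. The sketch below uses a hypothetical helper `rollout`. The particular maps are illustrative choices, and the last one in particular is just one map that happens to generate the listed sequence.

```python
import math

def rollout(F, s0, steps):
    """Return the trajectory (s_0, F(s_0), F(F(s_0)), ...) with steps + 1 entries."""
    traj = [s0]
    for _ in range(steps):
        traj.append(F(traj[-1]))
    return traj

converging = rollout(lambda s: 0.1 * s, 1.0, 3)   # s_{t+1} = 0.1 s_t
diverging = rollout(lambda s: 10 * s, 1, 3)       # s_{t+1} = 10 s_t
oscillating = rollout(lambda s: -s, 1, 4)         # s_{t+1} = -s_t

# Assumed map: flip the sign and shrink the distance of |s| from 1 by a factor of 10
toward_oscillation = rollout(lambda s: -0.1 * s - math.copysign(0.9, s), 0.9, 3)

for name, traj in [("converging", converging), ("diverging", diverging),
                   ("oscillating", oscillating), ("toward oscillation", toward_oscillation)]:
    print(name, traj)
```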

Trajectories

An equilibrium point \(s_{eq}\) satisfies

\(s_{eq} = F(s_{eq})\)

Equilibria or Fixed Points

An equilibrium point \(s_{eq}\) is

  • stable if for all \(\epsilon>0\), there exists a \(\delta=\delta(\epsilon)\) such that for all \(t>0\), $$ \|s_0-s_{eq}\|<\delta \implies \|s_t-s_{eq}\|<\epsilon $$
  • unstable if it is not stable
  • asymptotically stable if it is stable and \(\delta\) can be chosen such that $$ \|s_0-s_{eq}\|<\delta\implies \lim_{t\to\infty} s_t = s_{eq} $$

examples:

  • \(s_{t+1} = s_t\)
  • \(s_{t+1} = 2s_t\)
  • \(s_{t+1} = 0.5 s_t\)
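The three examples above can be compared numerically by starting a small distance \(\delta\) from \(s_{eq}=0\) and tracking how far the trajectory gets. In this sketch the helper `max_deviation`, the horizon, and the value of \(\delta\) are assumptions for illustration. For \(s_{t+1}=s_t\) the state never moves (stable, not asymptotically stable), for \(s_{t+1}=2s_t\) it leaves every neighbourhood (unstable), and for \(s_{t+1}=0.5s_t\) it decays to zero (asymptotically stable).

```python
def max_deviation(a, s0, steps=50):
    """Return (largest |s_t|, final |s_t|) along s_{t+1} = a * s_t started at s0."""
    s, worst = s0, abs(s0)
    for _ in range(steps):
        s = a * s
        worst = max(worst, abs(s))
    return worst, abs(s)

delta = 1e-3   # hypothetical initial distance from the equilibrium at 0
for a in (1.0, 2.0, 0.5):
    worst, final = max_deviation(a, delta)
    print(f"a={a}: max |s_t| = {worst:.3e}, final |s_t| = {final:.3e}")
```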

Stability

Linear dynamics

Consider \(\mathcal S = \mathbb R^n\) and linear dynamics

$$ s_{t+1} = As_t$$

Suppose that \(s_0=v\) is an eigenvector of \(A\) with eigenvalue \(\lambda\), i.e. \(Av=\lambda v\). Then

$$ s_{t} =\lambda^t v$$

The trajectory stays on the line spanned by \(v\); for example, it grows along \(v\) when \(\lambda>1\).

If \(A\) is diagonalizable with real eigenvalues (similar to a real diagonal matrix): \(A=VDV^{-1} = \begin{bmatrix} |&&|\\v_1&\dots& v_n\\|&&|\end{bmatrix} \begin{bmatrix} \lambda_1&&\\&\ddots&\\&&\lambda_n\end{bmatrix}  \begin{bmatrix} -&u_1^\top &-\\&\vdots&\\-&u_n^\top&-\end{bmatrix} \), where \(u_i^\top\) are the rows of \(V^{-1}\)

Then \(A^t = VD^tV^{-1}\), so \(\displaystyle s_t = \sum_{i=1}^n v_i \lambda_i^t (u_i^\top s_0)\) is a weighted combination of (right) eigenvectors
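The modal formula \(s_t=\sum_i v_i\lambda_i^t(u_i^\top s_0)\) can be checked against direct matrix powers. The matrix, initial state, and horizon below are hypothetical values chosen so that the eigenvalues are real and distinct.

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.0, 0.5]])          # hypothetical matrix with eigenvalues 0.9 and 0.5
eigvals, V = np.linalg.eig(A)       # columns of V are the right eigenvectors v_i
U = np.linalg.inv(V)                # rows of U are the u_i^T

s0 = np.array([1.0, -1.0])
t = 7

# Weighted combination of right eigenvectors
modal = sum(V[:, i] * eigvals[i] ** t * (U[i] @ s0) for i in range(len(eigvals)))
# Direct computation s_t = A^t s_0
direct = np.linalg.matrix_power(A, t) @ s0
print(modal, direct)
```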

Linear trajectories in \(n=2\)

General case: real eigenvalues with geometric multiplicity equal to algebraic multiplicity

Example 1:  \(\displaystyle s_{t+1} = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} s_t \)

\(0<\lambda_2<\lambda_1<1\)

\(0<\lambda_2<1<\lambda_1\)

\(1<\lambda_2<\lambda_1\)

Exercise: what do trajectories look like when \(\lambda_1\) and/or \(\lambda_2\) is negative? (demo notebook)

Linear trajectories in \(n=2\)

Example 2:  \(\displaystyle s_{t+1} = \begin{bmatrix} \alpha & -\beta\\\beta  & \alpha\end{bmatrix} s_t  \)

\(0<\alpha^2+\beta^2<1\)

\(1<\alpha^2+\beta^2\)

Exercise: what do trajectories look like when \(\alpha\) is negative? (demo notebook)

General case: pair of complex eigenvalues

\(\lambda = \alpha \pm i \beta\)

$$\begin{bmatrix}1\\0\end{bmatrix} \to \begin{bmatrix}\alpha\\ \beta\end{bmatrix} $$

rotation by \(\arctan(\beta/\alpha)\)

scale by \(\sqrt{\alpha^2+\beta^2}\)
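The rotate-and-scale picture can be verified numerically: after \(t\) steps the state has been scaled by \((\alpha^2+\beta^2)^{t/2}\) and rotated by \(t\) times the angle. The values of \(\alpha\) and \(\beta\) here are hypothetical. `arctan2` is used so the angle is also correct when \(\alpha\le 0\), and it agrees with \(\arctan(\beta/\alpha)\) for \(\alpha>0\).

```python
import numpy as np

alpha, beta = 0.6, 0.7              # hypothetical values with alpha^2 + beta^2 < 1
A = np.array([[alpha, -beta],
              [beta,  alpha]])
r = np.hypot(alpha, beta)           # scale factor sqrt(alpha^2 + beta^2)
phi = np.arctan2(beta, alpha)       # rotation angle per step

s = np.array([1.0, 0.0])
norms = [np.linalg.norm(s)]
for _ in range(10):
    s = A @ s
    norms.append(np.linalg.norm(s))

# Closed form after 10 steps: scale by r^10, rotate by 10 * phi
expected = r ** 10 * np.array([np.cos(10 * phi), np.sin(10 * phi)])
print(s, expected)
```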

Linear trajectories in \(n=2\)

Example 3:  \(\displaystyle s_{t+1} = \begin{bmatrix} \lambda & 1\\  & \lambda\end{bmatrix} s_t  \)

\(0<\lambda<1\)

\(1<\lambda\)

Exercise: what do trajectories look like when \(\lambda\) is negative? (demo notebook)

General case: eigenvalues with geometric multiplicity less than algebraic multiplicity

$$ \left(\begin{bmatrix} \lambda & \\  & \lambda\end{bmatrix} + \begin{bmatrix}  & 1\\  & \end{bmatrix} \right)^t$$

$$ =\begin{bmatrix} \lambda^t & t\lambda^{t-1}\\  & \lambda^t\end{bmatrix} $$
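The closed form follows because \(\lambda I\) commutes with the nilpotent part \(N\) and \(N^2=0\), so the binomial expansion of \((\lambda I+N)^t\) stops after the first two terms. A quick numerical check, with hypothetical values of \(\lambda\) and \(t\), also shows the transient growth of the off-diagonal entry \(t\lambda^{t-1}\) before it decays.

```python
import numpy as np

lam, t = 0.9, 12                    # hypothetical eigenvalue and horizon
J = np.array([[lam, 1.0],
              [0.0, lam]])

closed_form = np.array([[lam ** t, t * lam ** (t - 1)],
                        [0.0,      lam ** t]])
power = np.linalg.matrix_power(J, t)

# Off-diagonal entry k * lam^(k-1) for k = 1, ..., 59: rises first, then decays
offdiag = [k * lam ** (k - 1) for k in range(1, 60)]
print(power, max(offdiag), offdiag[-1])
```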

All matrices are similar to a matrix of Jordan canonical form

where \(J_i = \begin{bmatrix}\lambda_i & 1 & &\\ & \ddots & \ddots &\\ &&\ddots &1\\ && &\lambda_i \end{bmatrix}\in\mathbb R^{m_i\times m_i}\)

Reference: Ch 3d and 4 in Callier & Desoer, "Linear Systems Theory"

Linear trajectories in general

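Jordan canonical form: \(\begin{bmatrix} J_1&&\\&\ddots&\\&&J_p\end{bmatrix} \)

\(m_i\) is the size of the Jordan block \(J_i\); the number of blocks with eigenvalue \(\lambda\) is its geometric multiplicity, and their sizes sum to its algebraic multiplicity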

Linear stability

Theorem: Let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of \(A\).
Then for \(s_{t+1}=As_t\), the equilibrium \(s_{eq}=0\) is

  • asymptotically (exponentially, globally) stable \(\iff \max_{i\in[n]}|\lambda_i|<1\)
  • unstable if \(\max_{i\in[n]}|\lambda_i|> 1\)
  • call \(\max_{i\in[n]}|\lambda_i|=1\) "marginally (un)stable"
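The theorem reduces the stability check to the spectral radius of \(A\). A minimal sketch follows, with a hypothetical function name and a numerical tolerance that is an assumption of this example. In the marginal case the eigenvalue magnitudes alone do not decide stability: the identity matrix has bounded trajectories, while the Jordan block with \(\lambda=1\) does not.

```python
import numpy as np

def classify_linear_stability(A, tol=1e-6):
    """Classify s_eq = 0 for s_{t+1} = A s_t from the largest eigenvalue magnitude."""
    rho = max(abs(np.linalg.eigvals(A)))   # spectral radius
    if rho < 1 - tol:
        return "asymptotically stable"
    if rho > 1 + tol:
        return "unstable"
    return "marginal"

examples = {
    "decaying": np.diag([0.5, 0.9]),
    "spiral in": np.array([[0.6, -0.7], [0.7, 0.6]]),
    "saddle": np.array([[2.0, 0.0], [0.0, 0.5]]),
    "pure rotation": np.array([[0.0, -1.0], [1.0, 0.0]]),
}
for name, A in examples.items():
    print(name, classify_linear_stability(A))
```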

\(\mathbb C\)

Linearization via Taylor Series:

\(s_{t+1} = F(s_t) \)

Stability via linearization

Stability via linear approximation of nonlinear \(F\)

The Jacobian \(J\) of \(G:\mathbb R^{n}\to\mathbb R^{m}\) is defined as $$ J(x) = \begin{bmatrix}\frac{\partial G_1}{\partial x_1} & \dots & \frac{\partial G_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G_m}{\partial x_1} &\dots & \frac{\partial G_m}{\partial x_n}\end{bmatrix}$$

$$ s_{t+1} = F(s_t) = F(s_{eq}) + J(s_{eq})  (s_t - s_{eq}) + \text{higher order terms} = s_{eq} + J(s_{eq})  (s_t - s_{eq}) + \text{higher order terms} $$

so \(s_{t+1}-s_{eq} \approx J(s_{eq})(s_t-s_{eq})\)
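To illustrate the linear approximation, the sketch below estimates the Jacobian of a nonlinear map at a fixed point by central finite differences and compares its spectral radius with the behaviour of a nearby trajectory. The map `F`, the helper `numerical_jacobian`, the step `h`, and the initial state are all assumptions made for this example.

```python
import numpy as np

def F(s):
    """Hypothetical nonlinear map with a fixed point at the origin."""
    return np.array([0.5 * s[0] + s[1] ** 2,
                     0.8 * s[1] + s[0] * s[1]])

def numerical_jacobian(F, s, h=1e-6):
    """Central finite-difference approximation of the Jacobian of F at s."""
    n = len(s)
    J = np.zeros((len(F(s)), n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(s + e) - F(s - e)) / (2 * h)
    return J

s_eq = np.zeros(2)
J = numerical_jacobian(F, s_eq)
rho = max(abs(np.linalg.eigvals(J)))      # spectral radius of the linearization

s = np.array([0.1, -0.1])                 # start near the fixed point
for _ in range(200):
    s = F(s)
print(J, rho, np.linalg.norm(s - s_eq))
```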

Consider the dynamics of gradient descent on a twice differentiable function \(g:\mathbb R^d\to\mathbb R\)

\(\theta_{t+1} = \theta_t - \alpha\nabla g(\theta_t)\)

Jacobian \(J(\theta) = I - \alpha \nabla^2 g(\theta)\)

Example: gradient descent

  • Let \(\{\gamma_i\}_{i=1}^d\) be the eigenvalues of the Hessian \(\nabla^2 g(\theta_{eq})\)
  • Then the eigenvalues of the Jacobian are \(1-\alpha\gamma_i\)
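    • if any \(\gamma_i< 0\), the Jacobian has an eigenvalue \(1-\alpha\gamma_i>1\), so \(\theta_{eq}\) is unstable
      • i.e. saddle or local maximum of \(g\)
    • if some \(\gamma_i=0\) (degenerate critical point of \(g\)), the Jacobian has an eigenvalue equal to 1 and the linearization is inconclusive
    • if all \(\gamma_i>0\) and \(\alpha<\frac{1}{\gamma_i}\) for all \(i\), then \(0<1-\alpha\gamma_i<1\) and \(\theta_{eq}\) is asymptotically stable (more generally, \(\alpha<\frac{2}{\gamma_i}\) suffices)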
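The step-size condition can be seen on a quadratic, where the linearization is exact. In the sketch below the Hessian `H`, the step sizes, and the helper names are hypothetical. For \(g(\theta)=\tfrac12\theta^\top H\theta\) the Jacobian is \(I-\alpha H\), so gradient descent contracts exactly when \(|1-\alpha\gamma_i|<1\) for every Hessian eigenvalue \(\gamma_i\).

```python
import numpy as np

H = np.array([[3.0, 1.0],
              [1.0, 2.0]])             # hypothetical positive definite Hessian
gammas = np.linalg.eigvalsh(H)         # Hessian eigenvalues gamma_i

def jacobian_spectral_radius(alpha):
    """Largest |1 - alpha * gamma_i|, the spectral radius of I - alpha * H."""
    return max(abs(1 - alpha * gammas))

def run_gd(alpha, steps=100):
    """Distance to theta_eq = 0 after running gradient descent on g."""
    theta = np.array([1.0, 1.0])
    for _ in range(steps):
        theta = theta - alpha * (H @ theta)   # grad g(theta) = H theta
    return np.linalg.norm(theta)

for alpha in (0.2, 0.4, 0.6):          # below 1/gamma_max, below 2/gamma_max, above 2/gamma_max
    print(alpha, jacobian_spectral_radius(alpha), run_gd(alpha))
```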

Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F,s_{eq}\) is continuous and

  • (positive definite) \(V(s_{eq})=0\) and \(V(s)>0\) for all \(s\in\mathcal S - \{s_{eq}\}\)
  • (decreasing) \(V(F(s)) - V(s) \leq 0\) for all \(s\in\mathcal S\)
  • Optionally,
    • (strict) \(V(F(s)) - V(s) < 0\) for all \(s\in\mathcal S-\{s_{eq}\}\)
    • (global) \(\|s-s_{eq}\|_2\to \infty \implies V(s)\to\infty\)

Stability via Lyapunov

Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"

Stability via Lyapunov

Theorem (1.2, 1.4): Suppose that \(F\) is locally Lipschitz, \(s_{eq}\) is a fixed point, and \(V\) is a Lyapunov function for \(F,s_{eq}\). Then, \(s_{eq}\)  is

  • stable
  • asymptotically stable if \(V\) satisfies the strict property
  • globally asymptotically stable if \(V\) satisfies the strict and global properties

Quadratic Lyapunov functions

  • Stable matrices have quadratic Lyapunov functions of the form \(V(s) = s^\top P s\) (Theorem 3.2)
    • For example, \(P = \sum_{t=0}^\infty (A^\top)^t A^t\)
  • Exercise: show that the above is a strict and global Lyapunov function for \(s_{t+1}=As_t\).
  • When Jacobian \(J(s_{eq})\) is stable, can show that \(V(s)=(s-s_{eq})^\top P (s-s_{eq})\) is a strict Lyapunov function for \(s_{t+1} = F(s_t)\).
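As a numerical companion to the exercise, the sketch below truncates the series \(P=\sum_{t\ge 0}(A^\top)^tA^t\) for a hypothetical stable matrix and checks the identity \(A^\top PA-P=-I\), which gives \(V(As)-V(s)=-\|s\|_2^2<0\) for \(s\neq 0\). The matrix, the truncation length, and the initial state are assumptions for illustration.

```python
import numpy as np

A = np.array([[0.9, 0.5],
              [0.0, 0.8]])             # hypothetical stable matrix (eigenvalues 0.9 and 0.8)

# Truncated series; terms decay geometrically because the spectral radius is below 1
P = np.zeros((2, 2))
At = np.eye(2)                         # holds A^t
for _ in range(2000):
    P += At.T @ At                     # (A^T)^t A^t = (A^t)^T A^t
    At = A @ At

def V(s):
    """Quadratic Lyapunov candidate V(s) = s^T P s."""
    return s @ P @ s

s = np.array([1.0, -2.0])
values = []
for _ in range(30):
    values.append(V(s))
    s = A @ s
print(A.T @ P @ A - P)
print(values[:5])
```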

Theorem (3.3): Suppose \(F\) is locally Lipschitz, \(s_{eq}\) is a fixed point, and let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of the Jacobian \(J(s_{eq})\). Then \(s_{eq}\) is

  • asymptotically stable if \(\max_{i\in[n]}|\lambda_i|<1\)
  • unstable if \(\max_{i\in[n]}|\lambda_i|> 1\)

Next time: actions, disturbances, measurement

Recap

References: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"; Callier & Desoer, "Linear Systems Theory"

  • Recursive least squares
  • Dynamical systems definitions
    • difference equations, equilibria, stability
  • Linear systems
    • eigendecomposition for trajectories & stability
  • Nonlinear stability
    • Lyapunov functions & linearization