Prof Sarah Dean

## Reminders

• Sign up to scribe and rank preferences for paper presentations by September 7th
• SR19 "Near optimal finite time identification of arbitrary linear dynamical systems" (9/26)
• Required: meet with Atul at least 2 days before you are scheduled to present
• Working in pairs/groups, self-assessment

In classic machine learning, training data $$\{(x_i, y_i)\}$$ is used to fit a model $$f:\mathcal X\to\mathcal Y$$, and a policy maps each observation to an action.

## ML in Feedback Systems

In a feedback system, the model $$f_t:\mathcal X\to\mathcal Y$$ changes over time, mapping each observation to a prediction.

## Online learning

$$x_t$$

Goal: cumulatively over time, predictions $$\hat y_t = f_t(x_t)$$ are close to true $$y_t$$

accumulate

$$\{(x_t, y_t)\}$$

$$\theta_t = \underbrace{\Big(\sum_{k=1}^{t-1}x_k x_k^\top + \lambda I\Big)^{-1}}_{A_{t-1}^{-1}}\underbrace{\sum_{k=1}^{t-1}x_ky_k }_{b_{t-1}}$$

$$\theta_t = \arg\min_\theta \sum_{k=1}^{t-1} (\theta^\top x_k-y_k)^2 + \lambda\|\theta\|_2^2$$

$$\theta_t = \theta_{t-1} - \alpha (\theta_{t-1}^\top x_{t-1}-y_{t-1})x_{t-1}$$

## Case study: least-squares

Sherman-Morrison formula: $$\displaystyle (A+uv^\top)^{-1} = A^{-1} - \frac{A^{-1}uv^\top A^{-1}}{1+v^\top A^{-1}u}$$
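As a quick numeric sanity check of the Sherman-Morrison formula (the matrix $$A$$ and vectors $$u, v$$ below are arbitrary illustrative values):

```python
import numpy as np

# Illustrative values; any invertible A with 1 + v^T A^{-1} u != 0 works.
A = np.diag([2.0, 3.0, 4.0])
u = np.array([[1.0], [0.0], [0.5]])
v = np.array([[0.5], [1.0], [0.0]])

Ainv = np.linalg.inv(A)
# Rank-one update of the inverse via Sherman-Morrison (no new inversion):
updated = Ainv - (Ainv @ u @ v.T @ Ainv) / (1.0 + v.T @ Ainv @ u)
# Direct inversion of the rank-one-updated matrix, for comparison:
direct = np.linalg.inv(A + u @ v.T)
assert np.allclose(updated, direct)
```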


## Recursive least-squares

Recursive FTRL (Follow The Regularized Leader)

• set $$M_0=\frac{1}{\lambda}I$$ and $$b_0 = 0$$
• for $$t=1,2,...$$
• $$\theta_t = M_{t-1}b_{t-1}$$
• $$M_t = M_{t-1} - \frac{M_{t-1}x_t x_t^\top M_{t-1}}{1+x_t^\top M_{t-1}x_t}$$
• $$b_t = b_{t-1}+x_ty_t$$
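A minimal sketch of the recursive loop above in Python (data, dimensions, and the regularization level are illustrative), checked against the batch ridge solution it should match:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, lam = 3, 50, 0.1
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((T, d))
y = X @ theta_true + 0.01 * rng.standard_normal(T)

M = np.eye(d) / lam          # M_0 = (1/lambda) I, tracks A_t^{-1}
b = np.zeros(d)              # b_0 = 0
for t in range(T):
    theta = M @ b            # theta_t = M_{t-1} b_{t-1}
    x = X[t]
    # Sherman-Morrison rank-one update of the inverse (note the + sign):
    M = M - np.outer(M @ x, x @ M) / (1.0 + x @ M @ x)
    b = b + x * y[t]

theta_final = M @ b
# Batch ridge-regression solution, for comparison:
theta_batch = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
assert np.allclose(theta_final, theta_batch)
```

Each step costs $$O(d^2)$$ rather than the $$O(d^3)$$ of re-solving the regularized least-squares problem from scratch.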

## Today: dynamical world

A world that evolves over time

## Difference equation and state space

$$s_{t+1} = F(s_t)$$

(Autonomous) discrete-time dynamical system where $$F:\mathcal S\to\mathcal S$$

$$\mathcal S$$ is the state space. The state is sufficient for predicting the system's future.

Given initial state $$s_0$$, the solutions to difference equations, i.e. trajectories: $$(s_0, F(s_0), F(F(s_0)), ... )$$

What might trajectories look like?

• converging $$(1, 0.1, 0.01, 0.001, ...)$$
• diverging $$(1, 10, 100, 1000,...)$$
• oscillating $$(1, -1, 1, -1, 1, ...)$$
• converging towards oscillation $$(0.9, -0.99, 0.999, -0.9999,...)$$
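These qualitative behaviors can be reproduced by iterating scalar difference equations (the particular maps below are illustrative choices):

```python
# Iterate s_{t+1} = F(s_t) from an initial state s0.
def trajectory(F, s0, steps):
    traj = [s0]
    for _ in range(steps):
        traj.append(F(traj[-1]))
    return traj

converging = trajectory(lambda s: 0.1 * s, 1.0, 4)   # 1, 0.1, 0.01, ...
diverging = trajectory(lambda s: 10 * s, 1.0, 4)     # 1, 10, 100, ...
oscillating = trajectory(lambda s: -s, 1.0, 4)       # 1, -1, 1, ...
```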

## Trajectories

An equilibrium point $$s_\mathrm{eq}$$ satisfies

$$s_{eq} = F(s_{eq})$$

## Equilibria or Fixed Points

An equilibrium point $$s_{eq}$$ is

• stable if for all $$\epsilon>0$$, there exists a $$\delta=\delta(\epsilon)>0$$ such that $$\|s_0-s_{eq}\|<\delta \implies \|s_t-s_{eq}\|<\epsilon$$ for all $$t>0$$
• unstable if it is not stable
• asymptotically stable if it is stable and $$\delta$$ can be chosen such that $$\|s_0-s_{eq}\|<\delta\implies \lim_{t\to\infty} s_t = s_{eq}$$

examples:

• $$s_{t+1} = s_t$$
• $$s_{t+1} = 2s_t$$
• $$s_{t+1} = 0.5 s_t$$


## Linear dynamics

Consider $$\mathcal S = \mathbb R^n$$ and linear dynamics

$$s_{t+1} = As_t$$

Suppose that $$s_0=v$$ is an eigenvector of $$A$$ with eigenvalue $$\lambda$$. Then

$$s_{t} =\lambda^t v$$


## Linear dynamics

Consider $$\mathcal S = \mathbb R^n$$ and linear dynamics

If similar to a real diagonal matrix: $$A=VDV^{-1} = \begin{bmatrix} |&&|\\v_1&\dots& v_n\\|&&|\end{bmatrix} \begin{bmatrix} \lambda_1&&\\&\ddots&\\&&\lambda_n\end{bmatrix} \begin{bmatrix} -&u_1^\top &-\\&\vdots&\\-&u_n^\top&-\end{bmatrix}$$

$$\displaystyle s_t = \sum_{i=1}^n v_i \lambda_i^t (u_i^\top s_0)$$ is a weighted combination of (right) eigenvectors

$$s_{t+1} = As_t$$
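A small numeric check (with an illustrative diagonalizable $$A$$) that the modal decomposition above matches direct iteration:

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
lam, V = np.linalg.eig(A)    # columns of V are right eigenvectors v_i
U = np.linalg.inv(V)         # rows of U are the u_i^T
s0 = np.array([1.0, 1.0])

t = 10
s_direct = np.linalg.matrix_power(A, t) @ s0
# Weighted combination of eigenvectors: sum_i v_i lambda_i^t (u_i^T s0)
s_modes = sum(V[:, i] * lam[i] ** t * (U[i, :] @ s0) for i in range(2))
assert np.allclose(s_direct, s_modes)
```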

## Linear trajectories in $$n=2$$

General case: real eigenvalues with geometric multiplicity equal to algebraic multiplicity

Example 1:  $$\displaystyle s_{t+1} = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} s_t$$

$$0<\lambda_2<\lambda_1<1$$

$$0<\lambda_2<1<\lambda_1$$

$$1<\lambda_2<\lambda_1$$

Exercise: what do trajectories look like when $$\lambda_1$$ and/or $$\lambda_2$$ is negative? (demo notebook)

## Linear trajectories in $$n=2$$

Example 2:  $$\displaystyle s_{t+1} = \begin{bmatrix} \alpha & -\beta\\\beta & \alpha\end{bmatrix} s_t$$

$$0<\alpha^2+\beta^2<1$$

$$1<\alpha^2+\beta^2$$

Exercise: what do trajectories look like when $$\alpha$$ is negative? (demo notebook)

General case: pair of complex eigenvalues

$$\lambda = \alpha \pm i \beta$$

$$\begin{bmatrix}1\\0\end{bmatrix} \to \begin{bmatrix}\alpha\\ \beta\end{bmatrix}$$

rotation by $$\arctan(\beta/\alpha)$$

scale by $$\sqrt{\alpha^2+\beta^2}$$
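The rotation-and-scaling picture can be checked numerically (the values of $$\alpha$$ and $$\beta$$ below are illustrative):

```python
import numpy as np

alpha, beta = 0.6, 0.5
A = np.array([[alpha, -beta],
              [beta, alpha]])

# Eigenvalues are alpha +/- i*beta:
eigs = np.linalg.eigvals(A)
assert np.allclose(eigs.real, [alpha, alpha])
assert np.allclose(sorted(eigs.imag), [-beta, beta])

# Each step scales norms by r = sqrt(alpha^2 + beta^2):
s = np.array([1.0, 0.0])
r = np.sqrt(alpha**2 + beta**2)
assert np.isclose(np.linalg.norm(A @ s), r * np.linalg.norm(s))
# Here r < 1, so trajectories spiral in toward the origin.
```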

## Linear trajectories in $$n=2$$

Example 3:  $$\displaystyle s_{t+1} = \begin{bmatrix} \lambda & 1\\ & \lambda\end{bmatrix} s_t$$

$$0<\lambda<1$$

$$1<\lambda$$

Exercise: what do trajectories look like when $$\lambda$$ is negative? (demo notebook)

General case: eigenvalues with algebraic multiplicity greater than geometric multiplicity

$$\left(\begin{bmatrix} \lambda & \\ & \lambda\end{bmatrix} + \begin{bmatrix} & 1\\ & \end{bmatrix} \right)^t$$

$$=\begin{bmatrix} \lambda^t & t\lambda^{t-1}\\ & \lambda^t\end{bmatrix}$$
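A quick check of the Jordan-block power formula for illustrative values of $$\lambda$$ and $$t$$:

```python
import numpy as np

lam, t = 0.8, 5
J = np.array([[lam, 1.0],
              [0.0, lam]])
Jt = np.linalg.matrix_power(J, t)
# Closed form: diagonal lambda^t, off-diagonal t * lambda^(t-1)
expected = np.array([[lam**t, t * lam**(t - 1)],
                     [0.0, lam**t]])
assert np.allclose(Jt, expected)
```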

All matrices are similar to a matrix of Jordan canonical form

where $$J_i = \begin{bmatrix}\lambda_i & 1 & &\\ & \ddots & \ddots &\\ &&\ddots &1\\ && &\lambda_i \end{bmatrix}\in\mathbb R^{m_i\times m_i}$$

Reference: Ch 3d and 4 in Callier & Desoer, "Linear Systems Theory"

## Linear trajectories in general

$$\begin{bmatrix} J_1&&\\&\ddots&\\&&J_p\end{bmatrix}$$

$$m_i$$ is the size of the $$i$$th Jordan block; the number of blocks for an eigenvalue equals its geometric multiplicity, and the block sizes sum to its algebraic multiplicity

## Linear stability

Theorem: Let $$\{\lambda_i\}_{i=1}^n\subset \mathbb C$$ be the eigenvalues of $$A$$.
Then for $$s_{t+1}=As_t$$, the equilibrium $$s_{eq}=0$$ is

• asymptotically (exponentially, globally) stable $$\iff \max_{i\in[n]}|\lambda_i|<1$$
• unstable if $$\max_{i\in[n]}|\lambda_i|> 1$$
• call $$\max_{i\in[n]}|\lambda_i|=1$$ "marginally (un)stable"
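The spectral-radius test is straightforward to apply in code (the two matrices below are illustrative):

```python
import numpy as np

# Spectral radius: largest eigenvalue magnitude.
def spectral_radius(A):
    return max(abs(np.linalg.eigvals(A)))

A_stable = np.array([[0.5, 0.4], [0.0, 0.3]])
A_unstable = np.array([[1.1, 0.0], [0.2, 0.7]])
assert spectral_radius(A_stable) < 1      # trajectories decay to 0
assert spectral_radius(A_unstable) > 1    # some trajectories diverge
```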


Linearization via Taylor Series:

$$s_{t+1} = F(s_t)$$

## Stability via linearization

Stability via linear approximation of nonlinear $$F$$

The Jacobian $$J$$ of $$G:\mathbb R^{n}\to\mathbb R^{m}$$ is defined as $$J(x) = \begin{bmatrix}\frac{\partial G_1}{\partial x_1} & \dots & \frac{\partial G_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G_m}{\partial x_1} &\dots & \frac{\partial G_m}{\partial x_n}\end{bmatrix}$$

Expanding $$F$$ around $$s_{eq}$$:

$$s_{t+1} = F(s_t) = F(s_{eq}) + J(s_{eq}) (s_t - s_{eq}) + \text{higher order terms}$$

Since $$F(s_{eq})=s_{eq}$$,

$$s_{t+1}-s_{eq} \approx J(s_{eq})(s_t-s_{eq})$$
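A scalar sketch of stability via linearization, using the illustrative map $$F(s) = s - 0.5\sin(s)$$ with fixed point $$s_{eq}=0$$:

```python
import numpy as np

# Illustrative nonlinear map with fixed point at 0.
F = lambda s: s - 0.5 * np.sin(s)
s_eq = 0.0
assert np.isclose(F(s_eq), s_eq)          # fixed point

# Jacobian (scalar derivative) at s_eq: F'(s) = 1 - 0.5*cos(s)
J = 1.0 - 0.5 * np.cos(s_eq)              # = 0.5, magnitude < 1
assert abs(J) < 1

# Nearby trajectories contract toward s_eq, as linearization predicts:
s = 0.3
for _ in range(50):
    s = F(s)
assert abs(s - s_eq) < 1e-6
```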

Consider the dynamics of gradient descent on a twice differentiable function $$g:\mathbb R^d\to\mathbb R$$

$$\theta_{t+1} = \theta_t - \alpha\nabla g(\theta_t)$$

Jacobian $$J(\theta) = I - \alpha \nabla^2 g(\theta)$$

• Let $$\{\gamma_i\}_{i=1}^d$$ be the eigenvalues of the Hessian $$\nabla^2 g(\theta_{eq})$$
• Then the eigenvalues of the Jacobian are $$1-\alpha\gamma_i$$
• if any $$\gamma_i\leq 0$$, $$\theta_{eq}$$ is not asymptotically stable

• i.e. a saddle, local maximum, or degenerate critical point of $$g$$

• as long as all $$\gamma_i>0$$ and $$\alpha<\frac{2}{\gamma_i}$$ for all $$i$$ (equivalently $$\alpha<2/\gamma_{\max}$$), $$\theta_{eq}$$ is stable
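A sketch checking the step-size condition on an illustrative quadratic $$g(\theta)=\frac{1}{2}\theta^\top H\theta$$, whose Hessian eigenvalues are chosen as $$\gamma = 1, 4$$:

```python
import numpy as np

H = np.diag([1.0, 4.0])        # Hessian eigenvalues gamma = 1, 4
grad = lambda th: H @ th       # gradient of g(theta) = 0.5 * theta^T H theta

def run_gd(alpha, steps=200):
    th = np.array([1.0, 1.0])
    for _ in range(steps):
        th = th - alpha * grad(th)
    return th

# alpha below 2/gamma_max = 0.5: iterates converge to the minimizer 0
assert np.linalg.norm(run_gd(0.4)) < 1e-6
# alpha above 2/gamma_max: iterates diverge
assert np.linalg.norm(run_gd(0.6)) > 1e6
```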

Definition: A Lyapunov function $$V:\mathcal S\to \mathbb R$$ for $$F,s_{eq}$$ is continuous and

• (positive definite) $$V(s_{eq})=0$$ and $$V(s)>0$$ for all $$s\in\mathcal S - \{s_{eq}\}$$
• (decreasing) $$V(F(s)) - V(s) \leq 0$$ for all $$s\in\mathcal S$$
• Optionally,
• (strict) $$V(F(s)) - V(s) < 0$$ for all $$s\in\mathcal S-\{s_{eq}\}$$
• (global) $$\|s-s_{eq}\|_2\to \infty \implies V(s)\to\infty$$

## Stability via Lyapunov

Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"

Theorem (1.2, 1.4): Suppose that $$F$$ is locally Lipschitz, $$s_{eq}$$ is a fixed point, and $$V$$ is a Lyapunov function for $$F,s_{eq}$$. Then, $$s_{eq}$$  is

• stable
• asymptotically stable if $$V$$ satisfies the strict property
• globally asymptotically stable if $$V$$ satisfies the strict and global properties


• Stable matrices have quadratic Lyapunov functions of the form $$V(s) = s^\top P s$$ (Theorem 3.2)
• For example, $$P = \sum_{t=0}^\infty (A^\top)^t A^t$$
• Exercise: show that the above is a strict and global Lyapunov function for $$s_{t+1}=As_t$$.
• When Jacobian $$J(s_{eq})$$ is stable, can show that $$V(s)=(s-s_{eq})^\top P (s-s_{eq})$$ is a strict Lyapunov function for $$s_{t+1} = F(s_t)$$.
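The series for $$P$$ can be approximated by truncation for an illustrative stable $$A$$, and the Lyapunov properties checked directly:

```python
import numpy as np

A = np.array([[0.5, 0.3],
              [0.0, 0.4]])
# Truncate P = sum_t (A^T)^t A^t; terms decay geometrically for stable A.
P = sum(np.linalg.matrix_power(A.T, t) @ np.linalg.matrix_power(A, t)
        for t in range(200))
# P solves the discrete Lyapunov equation A^T P A - P = -I:
assert np.allclose(A.T @ P @ A - P, -np.eye(2))

V = lambda s: s @ P @ s
s = np.array([1.0, -2.0])
assert V(A @ s) < V(s)    # V strictly decreases along trajectories
```

The strict decrease follows since $$V(As) - V(s) = s^\top(A^\top P A - P)s = -\|s\|_2^2$$.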

Theorem (3.3): Suppose $$F$$ is locally Lipschitz, $$s_{eq}$$ is a fixed point, and let $$\{\lambda_i\}_{i=1}^n\subset \mathbb C$$ be the eigenvalues of the Jacobian $$J(s_{eq})$$. Then $$s_{eq}$$ is

• asymptotically stable if $$\max_{i\in[n]}|\lambda_i|<1$$
• unstable if $$\max_{i\in[n]}|\lambda_i|> 1$$

Next time: actions, disturbances, measurement

## Recap

References: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"; Callier & Desoer, "Linear Systems Theory"

• Recursive least squares
• Dynamical systems definitions
• difference equations, equilibria, stability
• Linear systems
• eigendecomposition for trajectories & stability
• Nonlinear stability
• Lyapunov functions & linearization
