Prof Sarah Dean

## Reminders

• Sign up to scribe and rank preferences for paper presentations by TODAY!
• SR19 "Near optimal finite time identification of arbitrary linear dynamical systems" (9/26)
• Required: meet with Atul at least 2 days before you are scheduled to present
• Working in pairs/groups, self-assessment

## Dynamical System

The state $$s$$ evolves according to the transition map $$F$$:

$$s_{t+1} = F(s_t)$$

An equilibrium point $$s_{eq}$$ is

• stable if "for any desired accuracy, you can find a tolerance that guarantees it in perpetuity"
• unstable if it is not stable
• asymptotically stable if it is stable and "you can find a tolerance that guarantees convergence to $$s_{eq}$$"

## Stability

• $$s_{eq}$$ is stable if for all $$\epsilon>0$$, there exists a $$\delta=\delta(\epsilon)$$ such that for all $$t\geq 0$$, $$\|s_0-s_{eq}\|<\delta \implies \|s_t-s_{eq}\|<\epsilon$$

## Stability

• $$s = (x,y)$$ and for some function with $$f(0)=0$$:
• $$x_{t+1} = f(y_t)$$,
• $$y_{t+1} = y_t$$
• stable if $$f$$ is continuous at $$0$$: given $$\epsilon$$, choose $$\delta$$ so that $$\sup_{|y|\leq\delta} f(y)^2 + \delta^2 \leq \epsilon^2$$
• $$s_{t+1} = \begin{bmatrix} 0 & 5\\ 0 & 1\end{bmatrix} s_t$$ is stable with $$\delta=\frac{1}{\sqrt{26}}\epsilon$$, since $$\|s_t\|\leq\sqrt{26}\,\|s_0\|$$ for all $$t$$
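A quick numerical check of the matrix example (the initial state below is an arbitrary assumed value):

```python
import numpy as np

# Slide example: s_{t+1} = A s_t with A = [[0, 5], [0, 1]].
# Eigenvalues are 0 and 1, so the system is not asymptotically stable,
# but trajectories stay bounded: after one step s_t = (5*y0, y0) forever.
A = np.array([[0.0, 5.0], [0.0, 1.0]])

def trajectory(s0, T):
    s, out = np.array(s0, float), []
    for _ in range(T):
        out.append(s.copy())
        s = A @ s
    return np.array(out)

traj = trajectory([0.3, 0.1], T=50)
# sup_t ||s_t|| scales linearly with ||s_0||, consistent with choosing
# delta proportional to epsilon.
print(np.linalg.norm(traj, axis=1).max())
```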

## Linear stability

Determined by the eigenvalues of the dynamics matrix $$A$$ in $$\mathbb C$$:

• asymptotically stable if all $$|\lambda_i|<1$$

• unstable if some $$|\lambda_i|>1$$

• marginally (un)stable if $$\max_i|\lambda_i|=1$$ (stability depends on the eigenstructure)
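A minimal sketch of this classification rule (the tolerance handling and example matrices are assumptions for illustration):

```python
import numpy as np

# Classify linear stability of s_{t+1} = A s_t from the spectral radius
# rho(A) = max |lambda_i|.
def classify(A, tol=1e-9):
    rho = max(abs(np.linalg.eigvals(A)))
    if rho < 1 - tol:
        return "asymptotically stable"
    if rho > 1 + tol:
        return "unstable"
    return "marginally (un)stable"   # needs eigenstructure to resolve

print(classify(np.array([[0.5, 0.0], [0.0, 0.9]])))  # asymptotically stable
print(classify(np.array([[0.0, 5.0], [0.0, 1.0]])))  # marginally (un)stable
print(classify(np.array([[1.1, 0.0], [0.0, 0.2]])))  # unstable
```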

## Stability via linearization

Stability via linear approximation of nonlinear $$F$$


example: discrete-time damped pendulum

$$\theta_{t+1} = \theta_t + h \omega_t$$

$$\omega_{t+1} =\omega_t + h\left(\frac{g}{\ell}\sin\theta_t-d\omega_t\right)$$

(pendulum diagram: angle $$\theta$$, angular velocity $$\omega$$, gravity $$g$$, length $$\ell$$)

$$\omega_{t+1} \approx (1-dh)\omega_t + h\frac{g}{\ell}\left(\sin \theta_{eq}+\cos\theta_{eq}(\theta_t-\theta_{eq})\right)$$

$$\sin x\approx \sin x_0 + \cos x_0(x - x_0)$$

equilibria at $$(\theta,\omega)=(k\pi, 0)$$ for $$k\in\mathbb Z$$


$$\begin{bmatrix}\theta_{t+1}-\theta_{eq}\\ \omega_{t+1}-\omega_{eq}\end{bmatrix} \approx \begin{bmatrix} 1 & h\\ h \frac{g}{\ell}\cos(\theta_{eq})& 1-dh\end{bmatrix}\begin{bmatrix}\theta_{t}-\theta_{eq}\\ \omega_{t}-\omega_{eq}\end{bmatrix}$$

The eigenvalues of the linearization are

$$\lambda = 1-h\frac{d}{2} \pm h\sqrt{\left(\frac{d}{2}\right)^2+\frac{g}{\ell}\cos(\theta_{eq})}$$

• at $$\theta_{eq}=0$$, real eigenvalues $$0<\lambda_2<1<\lambda_1$$

• at $$\theta_{eq}=\pi$$, complex eigenvalues with $$|\lambda|<1$$ for small $$d$$

Exercise: work out the details of this analysis (simulation notebook)
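A quick numerical check of these eigenvalue claims (the parameter values $$g/\ell$$, $$h$$, $$d$$ below are assumptions for illustration, not specified on the slide):

```python
import numpy as np

# Linearized pendulum Jacobian J = [[1, h], [h*(g/l)*cos(theta_eq), 1 - d*h]]
g_over_l, h, d = 9.8, 0.01, 0.1   # assumed parameters

def jacobian(theta_eq):
    return np.array([[1.0, h],
                     [h * g_over_l * np.cos(theta_eq), 1.0 - d * h]])

lam0 = np.linalg.eigvals(jacobian(0.0))      # real, one eigenvalue above 1
lampi = np.linalg.eigvals(jacobian(np.pi))   # complex, inside the unit circle
print(lam0, np.abs(lampi))
```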


Linearization via Taylor Series:

$$s_{t+1} = F(s_t)$$


The Jacobian $$J$$ of $$G:\mathbb R^{n}\to\mathbb R^{m}$$ is defined as $$J(x) = \begin{bmatrix}\frac{\partial G_1}{\partial x_1} & \dots & \frac{\partial G_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G_m}{\partial x_1} &\dots & \frac{\partial G_m}{\partial x_n}\end{bmatrix}$$

Expanding $$F$$ around $$s_{eq}$$:

$$s_{t+1} = F(s_t) \approx F(s_{eq}) + J(s_{eq}) (s_t - s_{eq})$$ + higher order terms

Since $$s_{eq}$$ is a fixed point, $$F(s_{eq}) = s_{eq}$$, so

$$s_{t+1} \approx s_{eq} + J(s_{eq}) (s_t - s_{eq})$$ + higher order terms

$$s_{t+1}-s_{eq} \approx J(s_{eq})(s_t-s_{eq})$$

Consider the dynamics of gradient descent on a twice differentiable function $$g:\mathbb R^d\to\mathbb R$$

$$\theta_{t+1} = \theta_t - \alpha\nabla g(\theta_t)$$

Jacobian $$J(\theta) = I - \alpha \nabla^2 g(\theta)$$

• Let $$\{\gamma_i\}_{i=1}^d$$ be the eigenvalues of the Hessian $$\nabla^2 g(\theta_{eq})$$
• Then the eigenvalues of the Jacobian are $$1-\alpha\gamma_i$$
• if any $$\gamma_i<0$$, $$\theta_{eq}$$ is unstable (some eigenvalue $$1-\alpha\gamma_i>1$$)

• i.e. $$\theta_{eq}$$ is a saddle or local maximum of $$g$$; if some $$\gamma_i=0$$ (degenerate critical point), the linearization is inconclusive

• if all $$\gamma_i>0$$ and $$\alpha<\frac{2}{\gamma_i}$$ for all $$i$$ (so that each $$|1-\alpha\gamma_i|<1$$), $$\theta_{eq}$$ is asymptotically stable
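This step-size condition can be seen numerically; a minimal sketch on a quadratic with an assumed toy Hessian:

```python
import numpy as np

# Gradient descent on g(theta) = 0.5 theta^T H theta: the Jacobian of the
# update map is I - alpha*H, so iterates contract iff |1 - alpha*gamma_i| < 1
# for every Hessian eigenvalue gamma_i.
H = np.diag([1.0, 4.0])          # assumed example Hessian, gamma = 1, 4

def run_gd(alpha, T=200):
    theta = np.array([1.0, 1.0])
    for _ in range(T):
        theta = theta - alpha * (H @ theta)   # gradient of the quadratic
    return np.linalg.norm(theta)

print(run_gd(alpha=0.2))   # alpha < 2/4: converges toward 0
print(run_gd(alpha=0.6))   # alpha > 2/4: diverges
```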

Definition: A Lyapunov function $$V:\mathcal S\to \mathbb R$$ for $$F$$ is continuous and

• (positive definite) $$V(0)=0$$ and $$V(s)>0$$ for all $$s\in\mathcal S - \{0\}$$
• (decreasing) $$V(F(s)) - V(s) \leq 0$$ for all $$s\in\mathcal S$$
• Optionally,
• (strict) $$V(F(s)) - V(s) < 0$$ for all $$s\in\mathcal S-\{0\}$$
• (global) $$\|s\|_2\to \infty \implies V(s)\to\infty$$

## Stability via Lyapunov

Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"


Theorem (1.2, 1.4): Suppose that $$F$$ is locally Lipschitz, $$s_{eq}=0$$ is a fixed point, and $$V$$ is a Lyapunov function for $$F,s_{eq}$$. Then, $$s_{eq}=0$$ is

• stable
• asymptotically stable if $$V$$ satisfies the strict property
• globally asymptotically stable if $$V$$ satisfies the strict and global properties


• Stable matrices have quadratic Lyapunov functions of the form $$V(s) = s^\top P s$$ (Theorem 3.2)
• For example, $$P = \sum_{t=0}^\infty (A^\top)^t A^t$$
• Exercise: show that the above is a strict and global Lyapunov function for $$s_{t+1}=As_t$$.
• When Jacobian $$J(0)$$ is stable, can show that $$V(s)=s^\top P s$$ is a strict Lyapunov function for $$s_{t+1} = F(s_t)$$.
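A numerical sketch of the exercise (the matrix $$A$$ below is an assumed example with spectral radius $$<1$$): build $$P$$ from the truncated series and check the strict decrease.

```python
import numpy as np

# Check that V(s) = s^T P s with P = sum_t (A^T)^t A^t is a strict
# Lyapunov function for s_{t+1} = A s_t when A is stable.
A = np.array([[0.5, 0.2], [0.0, 0.8]])   # assumed stable example

P = np.zeros((2, 2))
M = np.eye(2)                 # M = A^t
for _ in range(500):          # truncated series; converges since rho(A) < 1
    P += M.T @ M
    M = A @ M

def V(s):
    return s @ P @ s

# P satisfies the Lyapunov identity A^T P A - P = -I, so
# V(As) - V(s) = -||s||^2 < 0 away from the origin.
rng = np.random.default_rng(0)
for s in rng.standard_normal((100, 2)):
    assert V(A @ s) - V(s) < 0
print(np.round(P, 3))
```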

Theorem (3.3): Suppose $$F$$ is locally Lipschitz, $$0$$ is a fixed point, and let $$\{\lambda_i\}_{i=1}^n\subset \mathbb C$$ be the eigenvalues of the Jacobian $$J(0)$$. Then $$0$$ is

• asymptotically stable if $$\max_{i\in[n]}|\lambda_i|<1$$
• unstable if $$\max_{i\in[n]}|\lambda_i|> 1$$


## Dynamical System

$$s_{t+1} = F(s_t, w_t)$$

$$y_t = G(s_t)$$

(state $$s$$, input $$w_t$$, output $$y_t$$)

## Inputs and outputs

• input signal $$w_t$$ represents external phenomena
• action/control input
• disturbance/process noise
• output signal $$y_t$$ represents what is measured

(diagram: the system $$F$$ with initial state $$s$$ maps the input sequence $$(w_0, w_1, ... )$$, via the response operator $$\Phi$$, to the output sequence $$(y_0, y_1, ...)$$)

## Linear system response

$$s_{t+1} = As_t+ w_t$$

$$y_t = Cs_t$$

$$y_{t} = CA^t s_0+ \sum_{k=1}^{t}CA^{k-1}w_{t-k}$$
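The response formula can be sanity-checked against direct simulation; a minimal sketch with assumed example matrices $$A$$, $$C$$:

```python
import numpy as np

# Verify y_t = C A^t s_0 + sum_{k=1}^t C A^{k-1} w_{t-k}
# for s_{t+1} = A s_t + w_t, y_t = C s_t.
rng = np.random.default_rng(1)
A = np.array([[0.9, 0.1], [0.0, 0.7]])
C = np.array([[1.0, 0.0]])
s0 = rng.standard_normal(2)
w = rng.standard_normal((10, 2))

# direct simulation
s, ys = s0.copy(), []
for t in range(10):
    ys.append(C @ s)
    s = A @ s + w[t]

# closed form at t = 9
t = 9
Apow = np.linalg.matrix_power
y_formula = C @ Apow(A, t) @ s0 + sum(
    C @ Apow(A, k - 1) @ w[t - k] for k in range(1, t + 1))
print(np.allclose(ys[t], y_formula))
```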

## Linear-Gaussian system

If $$s_0\sim\mathcal N(\mu, \Sigma)$$ and, independently, $$w_t\overset{\text{i.i.d.}}{\sim} \mathcal N(\mu,\Sigma)$$, then

$$y_t\sim \mathcal N(\sum_{k=0}^{t}CA^{k}\mu, \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top)$$

The limit as $$t\to\infty$$ is the steady state distribution

## Linear stochastic system

If $$s_0, w_0, w_1, \dots$$ are uncorrelated with $$\mathbb E[s_0] = \mathbb E[w_t] =\mu$$ and $$\mathrm{Cov}[s_0] = \mathrm{Cov}[w_t] = \Sigma$$, then

$$\mathbb E[y_t] = \sum_{k=0}^{t}CA^{k}\mu$$

$$\mathrm{Cov}[y_t] = \sum_{k=0}^{t}CA^{k}\Sigma(A^{k})^\top C^\top$$

Can characterize moments without knowing whole distribution
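The moment formulas can be checked by propagating the state moments with the recursions $$\mathbb E[s_{t+1}] = A\,\mathbb E[s_t] + \mu$$ and $$\mathrm{Cov}[s_{t+1}] = A\,\mathrm{Cov}[s_t]\,A^\top + \Sigma$$; the matrices below are assumed examples:

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.7]])
C = np.array([[1.0, 1.0]])
mu = np.array([0.5, -0.2])
Sigma = np.array([[0.3, 0.1], [0.1, 0.2]])

m, S = mu.copy(), Sigma.copy()   # moments of s_0
T = 8
for _ in range(T):
    m = A @ m + mu               # mean recursion
    S = A @ S @ A.T + Sigma      # covariance recursion (uncorrelated noise)

Apow = np.linalg.matrix_power
mean_formula = sum(C @ Apow(A, k) @ mu for k in range(T + 1))
cov_formula = sum(C @ Apow(A, k) @ Sigma @ Apow(A, k).T @ C.T
                  for k in range(T + 1))
print(np.allclose(C @ m, mean_formula), np.allclose(C @ S @ C.T, cov_formula))
```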

## Bounded inputs & bounded outputs

What if inputs are not stochastic?

The sequence $$(w_0, w_1, ...)$$ can be adversarially chosen, but is bounded

• at every $$t$$: $$\|w_t\|\leq B$$
• on average: $$\sqrt{\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1} \|w_t\|_2^2}\leq B$$

often possible to show that outputs are also bounded (upcoming lectures)

## Input-to-state stability

For a system $$s_{t+1} = F(s_t) + w_t$$ where $$F(0)=0$$

Suppose that there is a $$V:\mathcal S\to\mathbb R_+$$ that

• is norm-like: $$\alpha_1 \|s\|_2^2 \leq V(s) \leq \alpha_2 \|s\|_2^2$$ and $$\sqrt{V(s+s')}\leq \sqrt{V(s)}+\sqrt{V(s')}$$
• is contracting: $$V(F(s)) \leq \gamma V(s)$$ for some $$\gamma<1$$ and all $$s\in\mathcal S$$

Then if $$\|w_t\|_2\leq B$$ for all $$t$$, $$\displaystyle \lim_{t\to\infty} \|s_t\|_2 \leq \sqrt{\frac{\alpha_2}{\alpha_1}}\frac{B}{1-\sqrt{\gamma}}$$

Proof: $$\sqrt{V(s_{t+1})} = \sqrt{V(F(s_t) + w_t)} \leq \sqrt{V(F(s_t))} + \sqrt{V(w_t)} \leq \sqrt{\gamma V(s_t)} + \sqrt{\alpha_2}\|w_t\|_2$$

$$\implies \sqrt{V(s_{t})} \leq \gamma^{t/2}\sqrt{ V(s_0)} + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\alpha_2}\|w_{t-k}\|_2$$

$$\implies \|s_t\|_2 \leq \gamma^{t/2}\sqrt{ \frac{\alpha_2}{\alpha_1}}\|s_0\|_2 + \sum_{k=1}^t \gamma^{(k-1)/2}\sqrt{\frac{\alpha_2}{\alpha_1}}\|w_{t-k}\|_2$$
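The bound can be simulated; a sketch with the assumed choices $$F(s)=0.9s$$ and $$V(s)=\|s\|_2^2$$ (so $$\alpha_1=\alpha_2=1$$, $$\gamma=0.81$$) and worst-case radially outward noise:

```python
import numpy as np

# s_{t+1} = 0.9 s_t + w_t with ||w_t|| <= B; the ISS bound predicts
# lim ||s_t|| <= B / (1 - sqrt(gamma)) = B / 0.1.
B, gamma_sqrt = 0.5, 0.9
s = np.array([3.0, -2.0])
for t in range(500):
    r = np.linalg.norm(s)
    w = B * s / r if r > 0 else np.zeros(2)   # adversarial: push outward
    s = gamma_sqrt * s + w
print(np.linalg.norm(s))   # approaches B / (1 - 0.9) = 5.0
```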

## Example: stochastic gradient descent

Consider the dynamics of stochastic gradient descent on a twice differentiable function $$g:\mathbb R^d\to\mathbb R$$, where $$g_t$$ is a stochastic estimate of $$\nabla g(\theta_t)$$

$$\theta_{t+1} = \theta_t - \alpha g_t$$

$$= \theta_t - \alpha \nabla g(\theta_t) + \underbrace{\alpha (\nabla g(\theta_t) - g_t) }_{w_t}$$

• may converge to ball around minima
• in general, could jump between minima
• in practice, decreasing step size $$\alpha_t\propto 1/t$$
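The noise-ball behavior can be sketched on an assumed quadratic objective $$g(\theta)=\frac12\|\theta\|_2^2$$ with unit Gaussian gradient noise:

```python
import numpy as np

# SGD with additive gradient noise: a constant step size settles into a
# ball around the minimum, while a decreasing step alpha_t ~ 1/t shrinks it.
rng = np.random.default_rng(0)

def run(schedule, T=2000):
    theta = np.array([5.0, -5.0])
    for t in range(1, T + 1):
        g_t = theta + rng.standard_normal(2)   # noisy gradient estimate
        theta = theta - schedule(t) * g_t
    return np.linalg.norm(theta)

print(run(lambda t: 0.1))        # hovers in a noise ball around 0
print(run(lambda t: 1.0 / t))    # decreasing step: much closer to 0
```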

Next time: see these ideas in action!

## Recap

References: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"

• Nonlinear stability
• Linearization
• Lyapunov functions
• Inputs and outputs
• System response
• Stability
