Prof. Sarah Dean, assistant professor in CS at Cornell
training data $\{(x_i, y_i)\}$ → model $f: \mathcal{X} \to \mathcal{Y}$
policy: observation → action
model $f_t: \mathcal{X} \to \mathcal{Y}$: observation $x_t$ → prediction
Goal: cumulatively over time, predictions $\hat{y}_t = f_t(x_t)$ are close to the true $y_t$
accumulate $\{(x_t, y_t)\}$
Follow the (Regularized) Leader
$$\theta_t = \arg\min_{\theta} \sum_{k=1}^{t-1} (\theta^\top x_k - y_k)^2 + \lambda \|\theta\|_2^2$$
$$\theta_t = \underbrace{\Big(\sum_{k=1}^{t-1} x_k x_k^\top + \lambda I\Big)^{-1}}_{A_{t-1}^{-1}} \; \underbrace{\sum_{k=1}^{t-1} x_k y_k}_{b_{t-1}}$$
Online Gradient Descent
$$\theta_t = \theta_{t-1} - \alpha\,(\theta_{t-1}^\top x_{t-1} - y_{t-1})\, x_{t-1}$$
Recursive FTRL
Sherman-Morrison formula:
$$(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}$$
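The three updates above can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's own code: the function names, the synthetic data, and the values of `lam` and `alpha` are assumptions made for the example. The recursive version applies Sherman-Morrison with $u = v = x_k$ to maintain $A^{-1}$ without re-inverting.

```python
import numpy as np

def ftrl_batch(X, y, lam):
    """Closed-form FTRL (ridge) solution using all rows of X at once."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)   # sum_k x_k x_k^T + lam * I
    b = X.T @ y                     # sum_k x_k y_k
    return np.linalg.solve(A, b)

def ftrl_recursive(X, y, lam):
    """Same solution, updating A^{-1} one sample at a time via Sherman-Morrison."""
    d = X.shape[1]
    A_inv = np.eye(d) / lam         # inverse of lam * I, before any data arrives
    b = np.zeros(d)
    for x_k, y_k in zip(X, y):
        Ax = A_inv @ x_k            # A^{-1} x_k (A^{-1} is symmetric)
        A_inv = A_inv - np.outer(Ax, Ax) / (1.0 + x_k @ Ax)
        b = b + y_k * x_k
    return A_inv @ b

def ogd(X, y, alpha):
    """Online gradient descent on the squared loss, one step per sample."""
    theta = np.zeros(X.shape[1])
    for x_k, y_k in zip(X, y):
        theta = theta - alpha * (theta @ x_k - y_k) * x_k
    return theta

# Hypothetical synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.01 * rng.normal(size=50)

theta_batch = ftrl_batch(X, y, lam=0.1)
theta_rec = ftrl_recursive(X, y, lam=0.1)
theta_ogd = ogd(X, y, alpha=0.05)
```

The recursive update costs $O(d^2)$ per sample instead of the $O(d^3)$ of a fresh solve, while online gradient descent costs only $O(d)$ per sample but does not reproduce the FTRL iterate exactly.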
A world that evolves over time
$s_{t+1} = F(s_t)$
(Autonomous) discrete-time dynamical system where $F: \mathcal{S} \to \mathcal{S}$
$\mathcal{S}$ is the state space. The state is sufficient for predicting its future.
Given an initial state $s_0$, the solution to the difference equation is the trajectory $(s_0, F(s_0), F(F(s_0)), \dots)$
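A trajectory is just repeated application of $F$. A minimal sketch, where the helper name `trajectory` and the scalar map $F(s) = 0.5\,s$ are hypothetical choices for illustration:

```python
def trajectory(F, s0, T):
    """Return [s_0, s_1, ..., s_T] where s_{t+1} = F(s_t)."""
    states = [s0]
    for _ in range(T):
        states.append(F(states[-1]))
    return states

# Hypothetical scalar example: F(s) = 0.5 * s, whose only equilibrium is s = 0.
traj = trajectory(lambda s: 0.5 * s, 1.0, 4)
print(traj)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```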
What might trajectories look like?
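An equilibrium point $s_{eq}$ satisfies
$s_{eq} = F(s_{eq})$
An equilibrium point $s_{eq}$ is
stable if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|s_0 - s_{eq}\| < \delta$ implies $\|s_t - s_{eq}\| < \epsilon$ for all $t \ge 0$
unstable if it is not stable
asymptotically stable if it is stable and $\delta$ can be chosen so that, in addition, $s_t \to s_{eq}$ as $t \to \infty$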
examples:
Consider $\mathcal{S} = \mathbb{R}^n$ and linear dynamics $s_{t+1} = A s_t$
Suppose that $s_0 = v$ is an eigenvector of $A$ with eigenvalue $\lambda$
Then $s_t = \lambda^t v$
[Figure: case $\lambda > 1$]
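General case: real eigenvalues with geometric multiplicity equal to algebraic multiplicity
If $A$ is similar to a real diagonal matrix:
$$A = V D V^{-1} = \begin{bmatrix} | & & | \\ v_1 & \dots & v_n \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} - & u_1^\top & - \\ & \vdots & \\ - & u_n^\top & - \end{bmatrix}$$
then for $s_{t+1} = A s_t$, the state $s_t = \sum_{i=1}^{n} v_i \lambda_i^t (u_i^\top s_0)$ is a weighted combination of (right) eigenvectors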
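The eigenvector expansion of $s_t$ can be checked numerically against direct matrix powers. A small sketch, assuming a hypothetical diagonalizable matrix `A` with eigenvalues 0.9 and 0.5:

```python
import numpy as np

# Hypothetical diagonalizable matrix (upper triangular, eigenvalues 0.9 and 0.5).
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
s0 = np.array([1.0, 1.0])
t = 10

lam, V = np.linalg.eig(A)   # columns of V are the right eigenvectors v_i
U = np.linalg.inv(V)        # rows of U are the vectors u_i^T

# s_t = sum_i v_i * lam_i^t * (u_i^T s_0)
s_modal = sum(V[:, i] * lam[i] ** t * (U[i] @ s0) for i in range(len(lam)))

# Reference: s_t = A^t s_0
s_direct = np.linalg.matrix_power(A, t) @ s0
```

Each term decays or grows at its own rate $\lambda_i^t$, so for large $t$ the trajectory aligns with the eigenvector of the largest $|\lambda_i|$ that has a nonzero weight $u_i^\top s_0$.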
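Example 1: $s_{t+1} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} s_t$
[Figure panels: $0 < \lambda_2 < \lambda_1 < 1$; $0 < \lambda_2 < 1 < \lambda_1$; $1 < \lambda_2 < \lambda_1$]
Exercise: what do trajectories look like when $\lambda_1$ and/or $\lambda_2$ is negative? (demo notebook)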
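Example 2: $s_{t+1} = \begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix} s_t$
[Figure panels: $0 < \alpha^2 + \beta^2 < 1$; $1 < \alpha^2 + \beta^2$]
Exercise: what do trajectories look like when $\alpha$ is negative? (demo notebook)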
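General case: pair of complex eigenvalues $\lambda = \alpha \pm i\beta$
$\begin{bmatrix} 1 \\ 0 \end{bmatrix} \to \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$
rotation by $\arctan(\beta/\alpha)$
scale by $\sqrt{\alpha^2 + \beta^2}$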
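The rotate-and-scale reading of the matrix $\begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}$ can be verified directly. A short sketch with hypothetical values $\alpha = \beta = 0.6$; it uses `arctan2`, which agrees with $\arctan(\beta/\alpha)$ when $\alpha > 0$:

```python
import numpy as np

alpha, beta = 0.6, 0.6                 # hypothetical values with alpha > 0
A = np.array([[alpha, -beta],
              [beta,   alpha]])
s = np.array([1.0, 0.0])

r = np.hypot(alpha, beta)              # sqrt(alpha^2 + beta^2), the modulus of the eigenvalues
phi = np.arctan2(beta, alpha)          # rotation angle per step

s_next = A @ s                         # one step: rotate by phi, scale by r

# After t steps the state is rotated by t * phi and scaled by r ** t.
t = 5
s_t = np.linalg.matrix_power(A, t) @ s
s_t_polar = r ** t * np.array([np.cos(t * phi), np.sin(t * phi)])
```

Here $r \approx 0.85 < 1$, so the trajectory spirals in toward the origin; with $r > 1$ it would spiral out.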
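Example 3: $s_{t+1} = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix} s_t$
[Figure panels: $0 < \lambda < 1$; $1 < \lambda$]
Exercise: what do trajectories look like when $\lambda$ is negative? (demo notebook)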
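General case: eigenvalues whose algebraic multiplicity exceeds their geometric multiplicity
$$\left( \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right)^t = \begin{bmatrix} \lambda^t & t\lambda^{t-1} \\ 0 & \lambda^t \end{bmatrix}$$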
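The closed form for powers of a $2 \times 2$ Jordan block can be checked numerically. A minimal sketch with hypothetical values $\lambda = 0.9$ and $t = 7$:

```python
import numpy as np

lam, t = 0.9, 7                        # hypothetical values for illustration
J = np.array([[lam, 1.0],
              [0.0, lam]])

J_t = np.linalg.matrix_power(J, t)     # J^t by repeated multiplication
closed_form = np.array([[lam ** t, t * lam ** (t - 1)],
                        [0.0,      lam ** t]])
```

The off-diagonal entry $t\lambda^{t-1}$ explains why trajectories can grow for a while even when $|\lambda| < 1$: the factor $t$ dominates at first, and the geometric decay wins only later.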
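All matrices are similar to a matrix in Jordan canonical form
$$\begin{bmatrix} J_1 & & \\ & \ddots & \\ & & J_p \end{bmatrix} \quad \text{where} \quad J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix} \in \mathbb{R}^{m_i \times m_i}$$
$m_i$ is the size of block $J_i$; the number of blocks with a given eigenvalue is its geometric multiplicity, and their sizes sum to its algebraic multiplicity
Reference: Ch 3d and 4 in Callier & Desoer, "Linear Systems Theory"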
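Theorem: Let $\{\lambda_i\}_{i=1}^{n} \subset \mathbb{C}$ be the eigenvalues of $A$.
Then for $s_{t+1} = A s_t$, the equilibrium $s_{eq} = 0$ is asymptotically stable if and only if $\max_i |\lambda_i| < 1$, and unstable if $\max_i |\lambda_i| > 1$. If $\max_i |\lambda_i| = 1$, it is stable only when every eigenvalue with $|\lambda_i| = 1$ has Jordan blocks of size 1, and unstable otherwise.
[Figure: the complex plane $\mathbb{C}$]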
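The standard eigenvalue test for linear systems compares the spectral radius $\max_i |\lambda_i|$ of $A$ to 1. A rough sketch of that test; the function names and the tolerance `tol` are assumptions for illustration, and the boundary case is only flagged, not resolved:

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue modulus of a square matrix."""
    return max(abs(np.linalg.eigvals(A)))

def classify_origin(A, tol=1e-9):
    """Classify the equilibrium s_eq = 0 of s_{t+1} = A s_t from the spectral radius."""
    rho = spectral_radius(A)
    if rho < 1 - tol:
        return "asymptotically stable"
    if rho > 1 + tol:
        return "unstable"
    # Boundary case: the answer depends on the Jordan structure, which
    # eigenvalues alone do not reveal.
    return "marginal: inspect Jordan blocks of eigenvalues with |lambda| = 1"

contracting = classify_origin(np.diag([0.5, 0.9]))
expanding = classify_origin(np.array([[0.6, -0.9],
                                      [0.9,  0.6]]))   # modulus sqrt(0.36 + 0.81) > 1
boundary = classify_origin(np.eye(2))
```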
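Stability via linear approximation of nonlinear $F$
The Jacobian $J$ of $G: \mathbb{R}^n \to \mathbb{R}^m$ is defined as
$$J(x) = \begin{bmatrix} \frac{\partial G_1}{\partial x_1} & \dots & \frac{\partial G_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial G_m}{\partial x_1} & \dots & \frac{\partial G_m}{\partial x_n} \end{bmatrix}$$
Linearization via Taylor Series:
$$s_{t+1} = F(s_t) = F(s_{eq}) + J(s_{eq})(s_t - s_{eq}) + \text{higher order terms} = s_{eq} + J(s_{eq})(s_t - s_{eq}) + \text{higher order terms}$$
$$s_{t+1} - s_{eq} \approx J(s_{eq})(s_t - s_{eq})$$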
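Linearization can be carried out numerically by estimating the Jacobian at the equilibrium with finite differences. The sketch below assumes a hypothetical nonlinear map `F` (an Euler-discretized damped pendulum with step `h = 0.1` and damping `0.5`), which is not from the lecture; `jacobian_fd` is likewise an illustrative helper.

```python
import numpy as np

def F(s):
    """Hypothetical nonlinear dynamics: Euler step of a damped pendulum."""
    h, damping = 0.1, 0.5
    angle, velocity = s
    return np.array([angle + h * velocity,
                     velocity + h * (-np.sin(angle) - damping * velocity)])

def jacobian_fd(F, s, eps=1e-6):
    """Central finite-difference estimate of the Jacobian of F at s."""
    s = np.asarray(s, dtype=float)
    n = s.size
    J = np.zeros((F(s).size, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (F(s + e) - F(s - e)) / (2 * eps)
    return J

s_eq = np.zeros(2)                      # F(0) = 0, so the origin is an equilibrium
J = jacobian_fd(F, s_eq)                # approximately [[1, 0.1], [-0.1, 0.95]]
rho = max(abs(np.linalg.eigvals(J)))    # spectral radius of the linearization
```

For this example the eigenvalues of $J(s_{eq})$ are a complex pair with modulus $\sqrt{0.96} < 1$, so nearby trajectories spiral in toward the equilibrium.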
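Consider the dynamics of gradient descent on a twice differentiable function $g: \mathbb{R}^d \to \mathbb{R}$
$\theta_{t+1} = \theta_t - \alpha \nabla g(\theta_t)$
Jacobian $J(\theta) = I - \alpha \nabla^2 g(\theta)$
Let $\gamma_1, \dots, \gamma_d$ be the eigenvalues of $\nabla^2 g(\theta_{eq})$ at a critical point $\theta_{eq}$, so that $J(\theta_{eq})$ has eigenvalues $1 - \alpha\gamma_i$
if $0 < \gamma_i < 2/\alpha$ for all $i$, then every $|1 - \alpha\gamma_i| < 1$ and $\theta_{eq}$ is asymptotically stable
if any $\gamma_i < 0$, then $1 - \alpha\gamma_i > 1$ and $\theta_{eq}$ is unstable, i.e. a saddle or local maximum of $g$
if some $\gamma_i = 0$, i.e. a degenerate critical point of $g$, the corresponding eigenvalue is 1 and the linearization cannot certify stability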
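For a quadratic $g(\theta) = \tfrac{1}{2}\theta^\top H \theta$ the Jacobian of gradient descent is exactly $I - \alpha H$, which makes the eigenvalue relationship easy to check. A small sketch, assuming a hypothetical saddle `H = diag(2, -1)` and step size `alpha = 0.1`:

```python
import numpy as np

# Hypothetical Hessian of g(theta) = 0.5 * theta^T H theta: a saddle at theta = 0.
H = np.diag([2.0, -1.0])
alpha = 0.1

J = np.eye(2) - alpha * H                      # Jacobian of the gradient descent map
gammas = np.linalg.eigvalsh(H)                 # Hessian eigenvalues gamma_i
jac_eigs = np.sort(np.linalg.eigvals(J).real)  # should equal 1 - alpha * gamma_i

# Run gradient descent from a point slightly off the stable direction.
theta = np.array([1.0, 1e-3])
for _ in range(200):
    theta = theta - alpha * (H @ theta)
```

The coordinate along the positive-curvature direction shrinks by a factor 0.8 per step, while the small perturbation along the negative-curvature direction grows by a factor 1.1 per step and eventually dominates.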
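Definition: A Lyapunov function $V: \mathcal{S} \to \mathbb{R}$ for $F, s_{eq}$ is continuous and satisfies: (positive definite) $V(s_{eq}) = 0$ and $V(s) > 0$ for all $s \neq s_{eq}$; (non-increasing) $V(F(s)) - V(s) \le 0$ for all $s$ in a neighborhood of $s_{eq}$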
Reference: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"
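Theorem (1.2, 1.4): Suppose that $F$ is locally Lipschitz, $s_{eq}$ is a fixed point, and $V$ is a Lyapunov function for $F, s_{eq}$. Then, $s_{eq}$ is stable; asymptotically stable if additionally $V(F(s)) - V(s) < 0$ for all $s \neq s_{eq}$; and globally asymptotically stable if, in addition, this strict decrease holds on all of $\mathcal{S}$ and $V(s) \to \infty$ as $\|s\| \to \infty$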
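For linear dynamics $s_{t+1} = A s_t$ with all eigenvalues inside the unit circle, a standard Lyapunov function is the quadratic $V(s) = s^\top P s$ with $P = \sum_{k \ge 0} (A^\top)^k A^k$, which satisfies $A^\top P A - P = -I$. The sketch below is an illustration under those assumptions, not taken from the lecture; the matrix `A` is hypothetical and the infinite sum is truncated at 200 terms.

```python
import numpy as np

# Hypothetical stable matrix: both eigenvalues are 0.5, but it can stretch vectors.
A = np.array([[0.5, 1.0],
              [0.0, 0.5]])

# Truncated series P = sum_k (A^T)^k A^k; the tail is negligible after 200 terms.
P = np.zeros((2, 2))
M = np.eye(2)                  # holds A^k
for _ in range(200):
    P += M.T @ M
    M = A @ M

def V(s):
    """Quadratic Lyapunov candidate V(s) = s^T P s."""
    return s @ P @ s

# V decreases along the dynamics: V(A s) - V(s) = -||s||^2 < 0 for s != 0.
rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 2))
decrease = [V(A @ s) - V(s) for s in samples]

# The plain Euclidean norm is not monotone here, which is why P is needed.
stretched = np.linalg.norm(A @ np.array([0.0, 1.0]))
```

The point of the example is that a Lyapunov function can certify stability even when the state norm temporarily grows, as it does here for the vector $(0, 1)$.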
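Theorem (3.3): Suppose $F$ is locally Lipschitz, $s_{eq}$ is a fixed point, and let $\{\lambda_i\}_{i=1}^{n} \subset \mathbb{C}$ be the eigenvalues of the Jacobian $J(s_{eq})$. Then $s_{eq}$ is asymptotically stable if $\max_i |\lambda_i| < 1$ and unstable if $\max_i |\lambda_i| > 1$. If $\max_i |\lambda_i| = 1$, the linearization is inconclusive.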
Next time: actions, disturbances, measurement
References: Bof, Carli, Schenato, "Lyapunov Theory for Discrete Time Systems"; Callier & Desoer, "Linear Systems Theory"