TP4
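$$\mathcal{L}(D,w)=\sum_{(x,y)\in D}\ell\big(y\,f(w,x)\big)$$
Gradient
$$\partial_i \mathcal{L}(D,w)=\sum_{(x,y)\in D}\ell'\big(y\,f(w,x)\big)\,y\,\partial_i f(w,x)$$
Hessian
$$\partial_i\partial_j \mathcal{L}(D,w)=\sum_{(x,y)\in D}\Big(\ell''\big(y\,f(w,x)\big)\,y^2\,\partial_i f(w,x)\,\partial_j f(w,x)+\ell'\big(y\,f(w,x)\big)\,y\,\partial_i\partial_j f(w,x)\Big)$$
$$=H^0_{ij}+H^p_{ij}$$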
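$$H^0=MM^T,\qquad M\in\mathbb{R}^{n\times p}$$
($n$ parameters, $p$ data points; this factorization needs $\ell''\ge 0$, e.g. a convex $\ell$)
$$M_{ik}=\sqrt{\ell''\big(y_k f(w,x_k)\big)}\;y_k\,\partial_i f(w,x_k)$$
$$\operatorname{rk}(H^0)\le\min(n,p)\le p$$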
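The rank bound can be checked on a small example. The sketch below uses a made-up integer matrix `M` (hypothetical values, not derived from any model or loss) with $n=6$ rows and $p=3$ columns, and computes the rank of $MM^T$ exactly with rational arithmetic. The helper `matrix_rank` is written here for illustration only.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_rows, n_cols = len(a), len(a[0])
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank

# Hypothetical M: n = 6 parameters (rows), p = 3 data points (columns).
M = [[1, 0, 2],
     [0, 1, 1],
     [1, 1, 0],
     [2, 1, 1],
     [0, 2, 1],
     [1, 0, 1]]
n, p = len(M), len(M[0])

# H0 = M M^T is an n x n matrix, but its rank cannot exceed p.
H0 = [[sum(M[i][k] * M[j][k] for k in range(p)) for j in range(n)]
      for i in range(n)]

rank_H0 = matrix_rank(H0)
print(n, p, rank_H0)  # prints: 6 3 3
```

Although $H^0$ is $6\times 6$, its rank is capped by the number of data points.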
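Hypothesis: $\#\{\text{neg. eig. of } H^p\}=cn$ with $0<c<1$
$$\exists\ \text{local minima}\ \Rightarrow\ \operatorname{rk}(H^0)\ge cn$$
$$\text{no local minima}\ \Leftarrow\ \operatorname{rk}(H^0)<cn\ \Leftarrow\ p<cn$$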
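The implication can be spelled out by dimension counting. This is a sketch of the standard argument, assuming $\ell''\ge 0$ so that $H^0$ is positive semi-definite; $V_-$ is a name introduced here for the span of the eigenvectors of $H^p$ with negative eigenvalue, and $H=H^0+H^p$.

```latex
\begin{align*}
&\dim V_- = cn, \qquad \dim\ker H^0 = n-\operatorname{rk}(H^0).\\
&\text{If } \operatorname{rk}(H^0)<cn:\quad \dim V_- + \dim\ker H^0 > n
  \;\Rightarrow\; \exists\, v\neq 0,\ v\in V_-\cap\ker H^0.\\
&\text{For this } v:\quad v^T H v = \underbrace{v^T H^0 v}_{=0} + \underbrace{v^T H^p v}_{<0} < 0,
\end{align*}
```

so $H$ has a negative direction and $w$ is not a local minimum. Since $\operatorname{rk}(H^0)\le p$, the condition $p<cn$ is sufficient.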
$f(w,x)\in\mathbb{R}$, $f(w,\cdot):\mathbb{R}^d\to\mathbb{R}$
$p$ points, binary classification $y=\pm 1$
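Gradient flow $\dot w=-\nabla_w \mathcal{L}(w)$
$$\mathcal{L}(w)=\langle \ell(y\,f(w,x))\rangle_{x,y}$$
$$\dot f(w,x)=\nabla_w f(w,x)\cdot\dot w$$
$$\dot f(w,x')=-\nabla_w f(w,x')\cdot\langle \ell'(y\,f(w,x))\,y\,\nabla_w f(w,x)\rangle_{x,y}$$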
Neural Tangent Kernel $\Theta(w,x,x')=\nabla_w f(w,x)\cdot\nabla_w f(w,x')$
$$\dot f(w,x')=-\langle \ell'(y\,f(w,x))\,y\,\Theta(w,x,x')\rangle_{x,y}$$
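The kernel form of $\dot f$ can be compared with a direct finite-difference step of gradient flow. The sketch below makes several assumptions that are not in the notes: a toy two-parameter model $f(w,x)=w_2\tanh(w_1 x)$, a softplus loss $\ell(z)=\log(1+e^{-z})$, and three made-up data points.

```python
import math

def f(w, x):
    """Toy model f(w, x) = w2 * tanh(w1 * x), with w = (w1, w2)."""
    return w[1] * math.tanh(w[0] * x)

def grad_f(w, x):
    """Gradient of f with respect to w."""
    t = math.tanh(w[0] * x)
    return (w[1] * x * (1.0 - t * t), t)

def ntk(w, x, xp):
    """Theta(w, x, x') = grad_w f(w, x) . grad_w f(w, x')."""
    return sum(a * b for a, b in zip(grad_f(w, x), grad_f(w, xp)))

def dloss(z):
    """Derivative of the (assumed) softplus loss l(z) = log(1 + exp(-z))."""
    return -1.0 / (1.0 + math.exp(z))

data = [(-1.0, -1.0), (0.5, 1.0), (2.0, 1.0)]  # hypothetical (x, y), y = +/-1
w = (0.3, -0.7)
x_test = 1.5

# Kernel formula: f_dot(x') = -< l'(y f(w, x)) y Theta(w, x, x') >.
f_dot_kernel = -sum(dloss(y * f(w, x)) * y * ntk(w, x, x_test)
                    for x, y in data) / len(data)

# Direct check: one tiny gradient-flow step w -> w + dt * w_dot.
dt = 1e-6
w_dot = [-sum(dloss(y * f(w, x)) * y * grad_f(w, x)[i] for x, y in data) / len(data)
         for i in range(2)]
w_next = tuple(wi + dt * wdi for wi, wdi in zip(w, w_dot))
f_dot_direct = (f(w_next, x_test) - f(w, x_test)) / dt

print(f_dot_kernel, f_dot_direct)
```

The two numbers should agree up to an error of order `dt`.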
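$\alpha$ trick
$$\ell(y\,f(w,x))\ \longrightarrow\ \frac{1}{\alpha}\,\ell(\alpha\,y\,F(w,x))\quad\text{where } F(w,x)=f(w,x)-f(w_0,x)$$
$$\ell(z)=\mathrm{relu}(1-z)$$
$$\alpha\to\infty\ \Rightarrow\ F(w,x)\sim O(1/\alpha)$$
$$F(w,x)\approx F(w_0,x)+(w-w_0)\cdot\nabla_w F(w_0,x)=(w-w_0)\cdot\nabla_w f(w_0,x)$$
$$\|w-w_0\|\sim 1/\alpha$$
$$w\in\mathbb{R}^N$$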
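The scaling $\|w-w_0\|\sim 1/\alpha$ can be observed on a toy problem. The following is only a sketch under assumptions not in the notes: the model $f(w,x)=w_2\tanh(w_1 x)$, two symmetric data points, and a plain Euler discretisation of gradient flow with step `0.01 / alpha`. The hinge derivative is taken as $\ell'(z)=-1$ for $z<1$ and $0$ otherwise.

```python
import math

def f(w, x):
    """Toy model f(w, x) = w2 * tanh(w1 * x)."""
    return w[1] * math.tanh(w[0] * x)

def grad_f(w, x):
    t = math.tanh(w[0] * x)
    return (w[1] * x * (1.0 - t * t), t)

def train(alpha, w0, data, steps=2000):
    """Euler steps of gradient flow on (1/alpha) * relu(1 - alpha * y * F)."""
    dt = 0.01 / alpha  # the dynamics stop after a time of order 1/alpha
    w = list(w0)
    for _ in range(steps):
        w_dot = [0.0, 0.0]
        for x, y in data:
            F = f(w, x) - f(w0, x)
            if alpha * y * F < 1.0:  # hinge is active, l' = -1
                g = grad_f(w, x)
                w_dot = [wd + y * gi / len(data) for wd, gi in zip(w_dot, g)]
        w = [wi + dt * wdi for wi, wdi in zip(w, w_dot)]
    # Distance travelled in parameter space.
    return math.sqrt(sum((wi - w0i) ** 2 for wi, w0i in zip(w, w0)))

w0 = (0.5, 0.8)
data = [(1.0, 1.0), (-1.0, -1.0)]  # hypothetical (x, y) pairs
distances = {alpha: train(alpha, w0, data) for alpha in (10.0, 100.0, 1000.0)}
for alpha, d in distances.items():
    print(alpha, d, alpha * d)
```

The product `alpha * d` in the last column should stay roughly constant as $\alpha$ grows, while `d` itself shrinks.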
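$$\dot f(w,x')\approx-\langle \ell'(\alpha\,y\,F(w,x))\,y\,\Theta(w_0,x,x')\rangle_{x,y}$$
$$F(w(t),x')=\int_0^t \mathrm{d}s\,\dot f(w(s),x')\approx\Big\langle\Big(-\int_0^t \ell'(\alpha\,y\,F(w(s),x))\,y\,\mathrm{d}s\Big)\,\Theta(w_0,x,x')\Big\rangle_{x,y}$$
$$F(w(t),x')\approx\langle c_x(t)\,\Theta(w_0,x,x')\rangle_{x,y}$$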
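The kernel expansion can be checked in a case where the kernel is exactly frozen. The sketch below assumes a model linear in its parameters, $f(w,x)=w\cdot\phi(x)$ with a hypothetical feature map $\phi(x)=(1,x,x^2)$, so that $\Theta(w,x,x')=\phi(x)\cdot\phi(x')$ does not depend on $w$. It integrates the coefficients $c_x(t)=-\int_0^t \ell'(\alpha y F)\,y\,\mathrm{d}s$ alongside $w$ and compares the two expressions for $F$.

```python
def phi(x):
    """Hypothetical feature map; f(w, x) = w . phi(x) is linear in w."""
    return (1.0, x, x * x)

def theta(x, xp):
    """Frozen kernel Theta(w0, x, x') = phi(x) . phi(x')."""
    return sum(a * b for a, b in zip(phi(x), phi(xp)))

def F_of(w, w0, x):
    """F(w, x) = f(w, x) - f(w0, x) for the linear model."""
    return sum((wi - w0i) * fi for wi, w0i, fi in zip(w, w0, phi(x)))

alpha, dt, steps = 5.0, 1e-3, 500
data = [(-1.0, -1.0), (0.2, 1.0), (1.5, -1.0)]  # hypothetical (x, y) pairs
w0 = (0.1, -0.4, 0.3)
w = list(w0)
c = [0.0] * len(data)  # one coefficient c_x(t) per training point

for _ in range(steps):
    # -l'(alpha y F) y for the hinge loss: l'(z) = -1 if z < 1 else 0.
    rates = [y if alpha * y * F_of(w, w0, x) < 1.0 else 0.0 for x, y in data]
    c = [ci + dt * r for ci, r in zip(c, rates)]
    # Parameter update w_dot = < -l' y grad_w f(w, x) >, with grad_w f = phi(x).
    for (x, _y), r in zip(data, rates):
        w = [wi + dt * r * fi / len(data) for wi, fi in zip(w, phi(x))]

x_test = 0.7
F_direct = F_of(w, w0, x_test)
F_kernel = sum(ci * theta(x, x_test) for ci, (x, _y) in zip(c, data)) / len(data)
print(F_direct, F_kernel)
```

For a nonlinear $f$ the agreement would only hold approximately, improving as $\alpha$ grows.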
$$\frac{1}{\alpha}\,\mathrm{relu}(1-\alpha\,y\,f)=\mathrm{relu}\Big(\frac{1}{\alpha}-y\,f\Big)$$
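This identity follows from the positive homogeneity of relu: $\mathrm{relu}(\lambda z)=\lambda\,\mathrm{relu}(z)$ for $\lambda>0$. A quick numerical check on a few arbitrary values of $\alpha$ and $yf$ (chosen here for illustration):

```python
def relu(z):
    return max(z, 0.0)

max_gap = 0.0
for alpha in (0.5, 1.0, 3.0, 100.0):
    for yf in (-2.0, -0.01, 0.0, 0.005, 0.3, 1.0, 4.0):
        lhs = relu(1.0 - alpha * yf) / alpha   # (1/alpha) relu(1 - alpha y f)
        rhs = relu(1.0 / alpha - yf)           # relu(1/alpha - y f)
        max_gap = max(max_gap, abs(lhs - rhs))

print(max_gap)  # zero up to floating-point rounding
```

In other words, the $\alpha$ trick with the hinge loss amounts to shrinking the margin from $1$ to $1/\alpha$.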
tp4-2020
By Mario Geiger