TP4

\(\mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \ell(yf(w, x)) \)

gradient

\( \partial_i  \mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \ell'(y f(w, x))\, y\, \partial_i f(w, x) \)
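The gradient formula can be checked numerically against central finite differences of \(\mathcal{L}\). This is a sketch under assumptions that are not in the notes: a hypothetical toy model \(f(w, x) = \tanh(w \cdot x)\), the logistic loss \(\ell(z) = \log(1 + e^{-z})\), and a made-up three-point dataset.

```python
import math

# Hypothetical toy model (assumption, not from the notes): f(w, x) = tanh(w . x)
def f(w, x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(w, x):
    # d_i f(w, x) = (1 - tanh^2(w . x)) * x_i
    t = f(w, x)
    return [(1.0 - t * t) * xi for xi in x]

# Logistic loss l(z) = log(1 + exp(-z)), an assumed example of a margin loss
def ell(z):
    return math.log1p(math.exp(-z))

def ell_prime(z):
    return -1.0 / (1.0 + math.exp(z))

def total_loss(data, w):
    return sum(ell(y * f(w, x)) for x, y in data)

def gradient(data, w):
    # d_i L(D, w) = sum_{(x, y)} l'(y f(w, x)) * y * d_i f(w, x)
    g = [0.0] * len(w)
    for x, y in data:
        coef = ell_prime(y * f(w, x)) * y
        for i, dfi in enumerate(grad_f(w, x)):
            g[i] += coef * dfi
    return g

def finite_difference_gradient(data, w, eps=1e-6):
    # Central differences of the total loss, one parameter at a time
    g = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        g.append((total_loss(data, w_plus) - total_loss(data, w_minus)) / (2 * eps))
    return g

data = [([1.0, -0.5, 0.3], 1), ([-0.2, 0.8, 1.1], -1), ([0.5, 0.5, -1.0], 1)]
w = [0.3, -0.7, 0.2]
err = max(abs(a - b) for a, b in zip(gradient(data, w), finite_difference_gradient(data, w)))
print(f"max |analytic - finite difference| = {err:.2e}")
```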

Hessian

\( \partial_i \partial_j  \mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \left( \ell''(y f(w, x))\, y^2\, \partial_i f(w, x)\, \partial_j f(w, x) + \ell'(y f(w,x))\, y\, \partial_i \partial_j f(w,x) \right) \)

\(= (\mathcal{H}_0)_{ij} + (\mathcal{H}_p)_{ij} \)
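The split of the Hessian into a first term (products of first derivatives of \(f\)) and a second term (second derivatives of \(f\)) can be checked by building both terms separately and comparing their sum with finite differences of the analytic gradient. A sketch, assuming a hypothetical toy model \(f(w, x) = \tanh(w \cdot x)\), the logistic loss (for which \(\ell''(z) = e^z / (1 + e^z)^2\)), and made-up data.

```python
import math

# Hypothetical toy model and logistic loss (assumptions for illustration)
def f(w, x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(w, x):
    t = f(w, x)
    return [(1.0 - t * t) * xi for xi in x]

def hess_f(w, x):
    # d_i d_j f = -2 t (1 - t^2) x_i x_j with t = tanh(w . x)
    t = f(w, x)
    s = -2.0 * t * (1.0 - t * t)
    return [[s * xi * xj for xj in x] for xi in x]

def ell_prime(z):
    return -1.0 / (1.0 + math.exp(z))

def ell_second(z):
    return math.exp(z) / (1.0 + math.exp(z)) ** 2

def gradient(data, w):
    g = [0.0] * len(w)
    for x, y in data:
        coef = ell_prime(y * f(w, x)) * y
        for i, dfi in enumerate(grad_f(w, x)):
            g[i] += coef * dfi
    return g

def hessian_split(data, w):
    # H0_ij = sum l''(y f) y^2 d_i f d_j f ;  Hp_ij = sum l'(y f) y d_i d_j f
    n = len(w)
    H0 = [[0.0] * n for _ in range(n)]
    Hp = [[0.0] * n for _ in range(n)]
    for x, y in data:
        z = y * f(w, x)
        df, d2f = grad_f(w, x), hess_f(w, x)
        for i in range(n):
            for j in range(n):
                H0[i][j] += ell_second(z) * y * y * df[i] * df[j]
                Hp[i][j] += ell_prime(z) * y * d2f[i][j]
    return H0, Hp

def finite_difference_hessian(data, w, eps=1e-5):
    # Column j: central difference of the analytic gradient along e_j
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        g_plus, g_minus = gradient(data, w_plus), gradient(data, w_minus)
        for i in range(n):
            H[i][j] = (g_plus[i] - g_minus[i]) / (2 * eps)
    return H

data = [([1.0, -0.5, 0.3], 1), ([-0.2, 0.8, 1.1], -1), ([0.5, 0.5, -1.0], 1)]
w = [0.3, -0.7, 0.2]
H0, Hp = hessian_split(data, w)
H_fd = finite_difference_hessian(data, w)
n = len(w)
err = max(abs(H0[i][j] + Hp[i][j] - H_fd[i][j]) for i in range(n) for j in range(n))
print(f"max |H0 + Hp - finite difference| = {err:.2e}")
```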

\(\mathcal{H}_0 = M M^T\), \(M \in \mathbb{R}^{n \times p}\), with \(n\) the number of parameters and \(p\) the number of training points

\(M_{ik} = \sqrt{\ell''(y_k f(w, x_k))}\, y_k\, \partial_i f(w, x_k) \) (real as long as \(\ell'' \geq 0\), i.e. for a convex loss)

\( \operatorname{rk}(\mathcal{H}_0) \leq \min(n, p) \leq p \)
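To see the rank bound concretely, here is a minimal sketch with more parameters than points (\(n = 5\), \(p = 2\)). It assumes a hypothetical toy model \(f(w, x) = \tanh(w \cdot x)\) and the logistic loss, whose \(\ell''\) is positive so the square root in \(M_{ik}\) is real. It builds \(M\), forms \(M M^T\), and computes the rank by Gaussian elimination with an assumed tolerance.

```python
import math

# Hypothetical setup: tanh toy model, n = 5 parameters, only p = 2 points
def f(w, x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(w, x):
    t = f(w, x)
    return [(1.0 - t * t) * xi for xi in x]

def ell_second(z):
    # logistic loss: l''(z) = e^z / (1 + e^z)^2 >= 0, so the square root is real
    return math.exp(z) / (1.0 + math.exp(z)) ** 2

def build_M(data, w):
    # M_ik = sqrt(l''(y_k f(w, x_k))) * y_k * d_i f(w, x_k); shape n x p
    cols = []
    for x, y in data:
        scale = math.sqrt(ell_second(y * f(w, x))) * y
        cols.append([scale * g for g in grad_f(w, x)])
    return [[col[i] for col in cols] for i in range(len(w))]

def times_transpose(M):
    # returns M M^T
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in M] for ri in M]

def rank(A, tol=1e-10):
    # Gaussian elimination with partial pivoting on a copy of A
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(A[i][c]), default=None)
        if pivot is None or abs(A[pivot][c]) < tol:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            for k in range(c, cols):
                A[i][k] -= factor * A[r][k]
        r += 1
        if r == rows:
            break
    return r

data = [([1.0, -0.5, 0.3, 0.0, 2.0], 1), ([-0.2, 0.8, 1.1, 0.4, -0.3], -1)]
w = [0.3, -0.7, 0.2, 0.1, -0.4]
M = build_M(data, w)
H0 = times_transpose(M)
print(len(H0), "x", len(H0[0]), "matrix of rank", rank(H0))
```

For this model each column of \(M\) is a multiple of one input \(x_k\), so \(\mathcal{H}_0\) is a \(5 \times 5\) matrix of rank 2: the rank is capped by \(p\), however many parameters there are.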

Hypothesis: the number of negative eigenvalues of \(\mathcal{H}_p\) is \(cn\), with \(0<c<1\)

\(\exists\) local minimum \(\Rightarrow \operatorname{rk}(\mathcal{H}_0) \geq cn\)

no local minima \(\Leftarrow \operatorname{rk}(\mathcal{H}_0) < cn \Leftarrow p < cn \)
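The step from the eigenvalue hypothesis to the rank condition is a dimension count. A sketch, assuming that at a local minimum the full Hessian \(\mathcal{H}_0 + \mathcal{H}_p\) is positive semidefinite (second-order necessary condition) and that \(\mathcal{H}_p\) is symmetric:

```latex
\begin{aligned}
&V := \operatorname{span}\{\text{eigenvectors of } \mathcal{H}_p \text{ with negative eigenvalues}\}, \quad \dim V = cn \\
&\dim \ker \mathcal{H}_0 = n - \operatorname{rk}(\mathcal{H}_0) \\
&\operatorname{rk}(\mathcal{H}_0) < cn \;\Rightarrow\; \dim(V \cap \ker \mathcal{H}_0) \geq cn + \bigl(n - \operatorname{rk}(\mathcal{H}_0)\bigr) - n > 0 \\
&\Rightarrow\; \exists\, v \neq 0,\ v \in V \cap \ker \mathcal{H}_0 : \quad v^T (\mathcal{H}_0 + \mathcal{H}_p)\, v = v^T \mathcal{H}_p\, v < 0
\end{aligned}
```

So if \(\operatorname{rk}(\mathcal{H}_0) < cn\), the Hessian has a negative direction and the point cannot be a local minimum.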

\(f(w, x) \in \mathbb{R}\), i.e. \(f(w, \cdot): \mathbb{R}^d \rightarrow \mathbb{R}\)

\(p\) points, binary classification \(y = \pm1\)

Gradient flow \(\dot{w} = -\nabla_w \mathcal{L}(w)\)

\(\mathcal{L}(w) = \langle \ell(y f(w, x)) \rangle_{x,y}\)

\(\dot{f}(w,x) = \nabla_w f(w,x) \cdot \dot{w}\)

\(\dot{f}(w,x') = -\nabla_w f(w,x') \cdot \langle \ell'(y f(w,x)) y \nabla_w f(w,x) \rangle_{x,y} \)

Neural Tangent Kernel \(\Theta(w, x, x') = \nabla_w f(w,x) \cdot \nabla_w f(w,x') \)

\(\dot{f}(w,x') = - \langle \ell'(y f(w,x)) y \Theta(w,x,x') \rangle_{x,y} \)
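A small check of the NTK form of \(\dot{f}\): take one short explicit Euler step of gradient flow and compare the observed rate of change of \(f(w, x')\) with the kernel prediction. Assumptions for illustration only: a hypothetical model \(f(w, x) = \tanh(w \cdot x)\), the logistic loss, \(\langle \cdot \rangle_{x,y}\) read as the average over a made-up three-point dataset, and step size `dt = 1e-4`.

```python
import math

# Hypothetical toy model (assumption): f(w, x) = tanh(w . x), logistic loss,
# and <.>_{x,y} taken as the average over a small dataset.
def f(w, x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(w, x):
    t = f(w, x)
    return [(1.0 - t * t) * xi for xi in x]

def ell_prime(z):
    return -1.0 / (1.0 + math.exp(z))

def ntk(w, x, x_prime):
    # Theta(w, x, x') = grad_w f(w, x) . grad_w f(w, x')
    return sum(a * b for a, b in zip(grad_f(w, x), grad_f(w, x_prime)))

def loss_gradient(data, w):
    # grad_w L = < l'(y f(w, x)) y grad_w f(w, x) >_{x,y}
    g = [0.0] * len(w)
    for x, y in data:
        coef = ell_prime(y * f(w, x)) * y / len(data)
        for i, dfi in enumerate(grad_f(w, x)):
            g[i] += coef * dfi
    return g

def predicted_f_dot(data, w, x_prime):
    # f_dot(w, x') = - < l'(y f(w, x)) y Theta(w, x, x') >_{x,y}
    total = sum(ell_prime(y * f(w, x)) * y * ntk(w, x, x_prime) for x, y in data)
    return -total / len(data)

data = [([1.0, -0.5, 0.3], 1), ([-0.2, 0.8, 1.1], -1), ([0.5, 0.5, -1.0], 1)]
w = [0.3, -0.7, 0.2]
x_prime = [0.4, 0.1, -0.6]

# One small explicit Euler step of gradient flow, w_dot = -grad_w L
dt = 1e-4
w_next = [wi - dt * gi for wi, gi in zip(w, loss_gradient(data, w))]
observed = (f(w_next, x_prime) - f(w, x_prime)) / dt
predicted = predicted_f_dot(data, w, x_prime)
print(f"observed {observed:.6f}  predicted {predicted:.6f}")
```

The two numbers should agree up to an error of order `dt`, the discretization error of the Euler step.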

\(\alpha\) trick

\( \ell(y f(w,x)) \longrightarrow \frac{1}{\alpha} \ell(\alpha y F(w, x))\) where \(F(w,x) = f(w,x) - f(w_0,x)\) 

\(\ell(z) = \operatorname{relu}(1 - z)\) (hinge loss)

\( \alpha \rightarrow \infty \Rightarrow F(w, x) = O(1 / \alpha)\)

\( F(w,x) \approx F(w_0, x) + (w-w_0) \cdot \nabla_w F(w_0,x) = (w - w_0) \cdot \nabla_w f(w_0, x) \)

\( \| w - w_0 \| \sim 1/\alpha \)

\( w \in \mathbb{R}^N\)
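A rough simulation of the lazy regime, to see \(\| w - w_0 \| \sim 1/\alpha\). Everything here is an assumption for illustration: a hypothetical model \(f(w, x) = \tanh(w \cdot x)\), three made-up points, and plain gradient descent on the rescaled hinge loss with step \(\eta = 0.01/\alpha\) (so the discrete steps stay small compared to the \(1/\alpha\) scale), stopping once every point satisfies \(\alpha y F(w, x) \geq 1\).

```python
import math

# Hypothetical toy setup (assumption): f(w, x) = tanh(w . x), three points,
# rescaled hinge loss (1/alpha) relu(1 - alpha y F(w, x)), F = f(w, .) - f(w0, .)
def f(w, x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(w, x):
    t = f(w, x)
    return [(1.0 - t * t) * xi for xi in x]

def train_lazy(data, w0, alpha, max_steps=100_000):
    eta = 0.01 / alpha  # step shrinks with alpha so discrete GD tracks the flow
    w = list(w0)
    for _ in range(max_steps):
        # gradient of (1/alpha) relu(1 - alpha y F) is -y grad f while alpha y F < 1, else 0
        active = [(x, y) for x, y in data if alpha * y * (f(w, x) - f(w0, x)) < 1.0]
        if not active:
            break
        step = [0.0] * len(w)
        for x, y in active:
            for i, dfi in enumerate(grad_f(w, x)):
                step[i] += eta * y * dfi / len(data)
        w = [wi + si for wi, si in zip(w, step)]
    return w

data = [([1.0, 0.2, 0.0], 1), ([0.2, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], -1)]
w0 = [0.3, -0.2, 0.5]
scaled = {}
for alpha in (10.0, 100.0, 1000.0):
    w = train_lazy(data, w0, alpha)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w0)))
    scaled[alpha] = alpha * dist
    print(f"alpha={alpha:7.1f}  ||w - w0|| = {dist:.5f}  alpha * ||w - w0|| = {alpha * dist:.3f}")
```

The product \(\alpha \| w - w_0 \|\) should stay roughly constant across \(\alpha\): the larger \(\alpha\), the less the weights move. Some residual variation at \(\alpha = 10\) is expected from the nonlinearity of \(\tanh\).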

\(\dot{f}(w,x') \approx - \langle \ell'(\alpha y F(w,x)) y \Theta(w_0,x,x') \rangle_{x,y} \)

\( F(w(t),x') = \int_0^t ds\, \dot{f}(w(s),x') = \langle \left(-\int_0^t \ell'(\alpha y F(w(s),x))\, y\, ds \right)\Theta(w_0,x,x') \rangle_{x,y} \) (using \(F(w_0, x') = 0\))

\( F(w(t),x') = \langle c_x(t)\, \Theta(w_0, x, x') \rangle_{x,y} \) with \( c_x(t) = -\int_0^t \ell'(\alpha y F(w(s),x))\, y\, ds \)
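For a model that is exactly linear in its parameters, \(\Theta\) does not depend on \(w\), and the kernel form of \(F\) holds exactly even for discrete gradient descent, where the time integral in \(c_x(t)\) becomes a sum over steps. A sketch with a made-up feature map `phi` (hypothetical), \(f(w, x) = w \cdot \phi(x)\) with scalar inputs, the rescaled hinge loss, and \(\langle \cdot \rangle_{x,y}\) as the average over four toy points.

```python
import math

# Hypothetical model that is exactly linear in w: f(w, x) = w . phi(x), with a
# made-up feature map phi. Theta(x, x') = phi(x) . phi(x') does not depend on w.
def phi(x):
    return [1.0, x, x * x, math.sin(3.0 * x)]

def f(w, x):
    return sum(wi * p for wi, p in zip(w, phi(x)))

def theta(x, x_prime):
    return sum(a * b for a, b in zip(phi(x), phi(x_prime)))

data = [(-1.0, -1), (-0.3, 1), (0.4, -1), (1.2, 1)]
w0 = [0.1, -0.2, 0.05, 0.3]
alpha, eta = 10.0, 0.01

w = list(w0)
c = [0.0] * len(data)  # discretized c_x(t) = -sum_steps eta * l'(alpha y F) * y
for _ in range(2000):
    # hinge loss: l'(z) = -1 if z < 1 else 0
    lp = [-1.0 if alpha * y * (f(w, x) - f(w0, x)) < 1.0 else 0.0 for x, y in data]
    grad = [0.0] * len(w)
    for (x, y), lpk in zip(data, lp):
        for i, p in enumerate(phi(x)):
            grad[i] += lpk * y * p / len(data)
    w = [wi - eta * gi for wi, gi in zip(w, grad)]
    for k, ((_, y), lpk) in enumerate(zip(data, lp)):
        c[k] -= eta * lpk * y

x_prime = 0.7
direct = f(w, x_prime) - f(w0, x_prime)
kernel_form = sum(ck * theta(x, x_prime) for ck, (x, _) in zip(c, data)) / len(data)
print(f"F(w(t), x') = {direct:.10f}   <c_x Theta(x, x')> = {kernel_form:.10f}")
```

The two printed values coincide up to floating-point rounding. For a nonlinear \(f\) the identity only holds approximately, through \(\Theta(w, x, x') \approx \Theta(w_0, x, x')\) when \(\alpha\) is large.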

\(\frac{1}{\alpha} \operatorname{relu}(1 - \alpha y F) = \operatorname{relu}(\frac{1}{\alpha} - y F)\) for \(\alpha > 0\), since relu is positively homogeneous
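The identity follows from \(\operatorname{relu}(\lambda z) = \lambda \operatorname{relu}(z)\) for \(\lambda > 0\), applied with \(\lambda = 1/\alpha\). Training with the rescaled hinge loss is therefore the ordinary hinge loss with margin \(1/\alpha\): the loss vanishes as soon as \(y F \geq 1/\alpha\), which is why \(F = O(1/\alpha)\) is enough. A trivial numerical spot check over a few hypothetical values of \(\alpha\) and of the margin \(yF\):

```python
import math

def relu(z):
    return max(0.0, z)

def rescaled_hinge(alpha, margin):
    # (1/alpha) * relu(1 - alpha * y F), with margin = y F
    return relu(1.0 - alpha * margin) / alpha

def shifted_margin_hinge(alpha, margin):
    # relu(1/alpha - y F)
    return relu(1.0 / alpha - margin)

checks = [
    math.isclose(rescaled_hinge(a, m), shifted_margin_hinge(a, m), rel_tol=1e-12, abs_tol=1e-15)
    for a in (0.5, 1.0, 10.0, 1e3)
    for m in (-2.0, -0.01, 0.0, 5e-4, 0.3, 1.0, 4.0)
]
print(all(checks))
```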

tp4-2020

By Mario Geiger
