TP4
\(\mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \ell(yf(w, x)) \)
gradient
\( \partial_i \mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \ell'(yf(w,x)) y \partial_i f(w, x) \)
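A quick numerical check of the gradient formula, as a sketch. The toy model \(f(w,x) = w_1 \tanh(w_0 x)\), the logistic loss \(\ell(z) = \log(1+e^{-z})\) and the data set are assumptions chosen for illustration, not part of the notes. The analytic gradient is compared with central finite differences of \(\mathcal{L}\).

```python
import math

# Illustrative assumptions (not from the notes):
# toy model f(w, x) = w1 * tanh(w0 * x) and logistic loss l(z) = log(1 + exp(-z)).
def f(w, x):
    return w[1] * math.tanh(w[0] * x)

def grad_f(w, x):
    t = math.tanh(w[0] * x)
    return [w[1] * x * (1 - t * t), t]

def loss_l(z):
    return math.log1p(math.exp(-z))

def dloss_l(z):
    # l'(z) = -1 / (1 + exp(z))
    return -1.0 / (1.0 + math.exp(z))

def total_loss(data, w):
    return sum(loss_l(y * f(w, x)) for x, y in data)

def grad_loss(data, w):
    # d_i L = sum_{(x, y)} l'(y f(w, x)) y d_i f(w, x)
    g = [0.0] * len(w)
    for x, y in data:
        coef = dloss_l(y * f(w, x)) * y
        for i, dfi in enumerate(grad_f(w, x)):
            g[i] += coef * dfi
    return g

def finite_diff_grad(data, w, h=1e-6):
    # central differences of the total loss, one coordinate at a time
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((total_loss(data, wp) - total_loss(data, wm)) / (2 * h))
    return g

data = [(0.5, 1), (-1.0, -1), (2.0, 1), (1.5, -1)]
w = [0.7, -0.3]
analytic = grad_loss(data, w)
numeric = finite_diff_grad(data, w)
print(analytic)
print(numeric)
```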
Hessian
\( \partial_i \partial_j \mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \left( \ell''(yf(w,x)) y^2 \partial_i f(w, x) \partial_j f(w, x) + \ell'(y f(w,x)) y \partial_i \partial_j f(w,x) \right) \)
\(= (\mathcal{H}_0)_{ij} + (\mathcal{H}_p)_{ij} \)
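A sketch that checks the split of the Hessian numerically. Assumptions for illustration only: the toy model \(f(w,x) = w_1\tanh(w_0 x)\), the logistic loss \(\ell(z) = \log(1+e^{-z})\) and made-up data. \(\mathcal{H}_0\) (the \(\ell''\) term) and \(\mathcal{H}_p\) (the \(\ell'\) term) are built separately, and their sum is compared with finite differences of the analytic gradient.

```python
import math

# Illustrative toy model (an assumption): f(w, x) = w1 * tanh(w0 * x).
def f(w, x):
    return w[1] * math.tanh(w[0] * x)

def grad_f(w, x):
    t = math.tanh(w[0] * x)
    return [w[1] * x * (1 - t * t), t]

def hess_f(w, x):
    t = math.tanh(w[0] * x)
    s = 1 - t * t  # tanh'(w0 x)
    return [[-2 * w[1] * x * x * t * s, x * s],
            [x * s, 0.0]]

# Assumed logistic loss l(z) = log(1 + exp(-z)): first and second derivatives.
def dloss_l(z):
    return -1.0 / (1.0 + math.exp(z))

def d2loss_l(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1 - s)

def grad_loss(data, w):
    g = [0.0, 0.0]
    for x, y in data:
        c = dloss_l(y * f(w, x)) * y
        gf = grad_f(w, x)
        for i in range(2):
            g[i] += c * gf[i]
    return g

def hessian_split(data, w):
    """Return (H0, Hp): the l'' term and the l' term of the Hessian."""
    H0 = [[0.0, 0.0], [0.0, 0.0]]
    Hp = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in data:
        z = y * f(w, x)
        gf, hf = grad_f(w, x), hess_f(w, x)
        for i in range(2):
            for j in range(2):
                H0[i][j] += d2loss_l(z) * y * y * gf[i] * gf[j]
                Hp[i][j] += dloss_l(z) * y * hf[i][j]
    return H0, Hp

def finite_diff_hessian(data, w, h=1e-5):
    # column j = central difference of the analytic gradient along w_j
    H = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        wp, wm = list(w), list(w)
        wp[j] += h
        wm[j] -= h
        gp, gm = grad_loss(data, wp), grad_loss(data, wm)
        for i in range(2):
            H[i][j] = (gp[i] - gm[i]) / (2 * h)
    return H

data = [(0.5, 1), (-1.0, -1), (2.0, 1), (1.5, -1)]
w = [0.7, -0.3]
H0, Hp = hessian_split(data, w)
H_num = finite_diff_hessian(data, w)
for i in range(2):
    print([H0[i][j] + Hp[i][j] for j in range(2)], H_num[i])
```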
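\(\mathcal{H}_0 = M M^T\), \(M \in \mathbb{R}^{n \times p}\) (\(n\) parameters, \(p\) data points)
\(M_{ik} = \sqrt{\ell''(y_k f(w, x_k))}\, y_k\, \partial_i f(w, x_k) \) (assuming \(\ell\) convex, so \(\ell'' \geq 0\))
\( \operatorname{rk}(\mathcal{H}_0) \leq \min(n, p) \leq p \)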
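To see \(\mathcal{H}_0 = M M^T\) and the rank bound concretely, a sketch with \(n = 5\) parameters and \(p = 3\) points. The model \(f(w,x) = \sum_i w_i \tanh(w_i x)\), the logistic loss and the data are hypothetical choices. The rank is computed by plain Gaussian elimination with a fixed tolerance, which is adequate for this small, well-scaled example but is not a robust rank estimator in general.

```python
import math

# Illustrative assumptions: n = 5 parameters, p = 3 points, a hypothetical model
# f(w, x) = sum_i w_i tanh(w_i x) and the logistic loss l(z) = log(1 + exp(-z)).
def f(w, x):
    return sum(wi * math.tanh(wi * x) for wi in w)

def grad_f(w, x):
    # d f / d w_i = tanh(w_i x) + w_i x (1 - tanh(w_i x)^2)
    return [math.tanh(wi * x) + wi * x * (1 - math.tanh(wi * x) ** 2) for wi in w]

def d2loss_l(z):
    # l''(z) = s(z) (1 - s(z)) with s the logistic sigmoid, always > 0
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1 - s)

def build_M(data, w):
    # M[i][k] = sqrt(l''(y_k f(w, x_k))) y_k d_i f(w, x_k), an n x p matrix
    M = [[0.0] * len(data) for _ in w]
    for k, (x, y) in enumerate(data):
        scale = math.sqrt(d2loss_l(y * f(w, x))) * y
        for i, g in enumerate(grad_f(w, x)):
            M[i][k] = scale * g
    return M

def rank(A, tol=1e-10):
    # Gaussian elimination with partial pivoting on a copy of A
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            for k in range(c, cols):
                A[i][k] -= factor * A[r][k]
        r += 1
    return r

data = [(0.5, 1), (-1.0, -1), (2.0, 1)]
w = [0.3, -0.8, 1.1, 0.5, -0.2]
n, p = len(w), len(data)
M = build_M(data, w)
H0 = [[sum(M[i][k] * M[j][k] for k in range(p)) for j in range(n)] for i in range(n)]

# the same matrix written as the l'' sum of the Hessian formula
H0_direct = [[0.0] * n for _ in range(n)]
for x, y in data:
    z, g = y * f(w, x), grad_f(w, x)
    for i in range(n):
        for j in range(n):
            H0_direct[i][j] += d2loss_l(z) * y * y * g[i] * g[j]

print(rank(H0), "<=", min(n, p))
```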
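hyp: \(\mathcal{H}_p\) has \(cn\) negative eigenvalues, with \(0<c<1\)
\(\exists\) local minimum \(\Rightarrow \operatorname{rk}(\mathcal{H}_0) \geq cn\), since \(\dim \ker \mathcal{H}_0 = n - \operatorname{rk}(\mathcal{H}_0)\): if \(\operatorname{rk}(\mathcal{H}_0) < cn\), then \(\ker \mathcal{H}_0\) meets the \(cn\)-dimensional negative eigenspace of \(\mathcal{H}_p\) nontrivially, and any \(v \neq 0\) in the intersection gives \(v^T(\mathcal{H}_0 + \mathcal{H}_p)v = v^T \mathcal{H}_p v < 0\)
\(\text{no local minima} \Leftarrow \operatorname{rk}(\mathcal{H}_0) < cn \Leftarrow p < cn \)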
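A hand-made 3 by 3 illustration of the counting argument; the matrices are invented for the example. \(\mathcal{H}_p\) has \(cn = 2\) negative eigenvalues and \(\mathcal{H}_0 = u u^T\) has rank \(1 < 2\), so some vector in the negative eigenspace of \(\mathcal{H}_p\) is orthogonal to \(u\), and the full Hessian has negative curvature along it.

```python
# Invented example: n = 3 parameters, H_p diagonal with 2 negative eigenvalues,
# H_0 = u u^T positive semidefinite of rank 1.
Hp = [[-1.0, 0.0, 0.0],
      [0.0, -2.0, 0.0],
      [0.0, 0.0, 3.0]]
u = [1.0, 2.0, 5.0]
H0 = [[ui * uj for uj in u] for ui in u]
H = [[H0[i][j] + Hp[i][j] for j in range(3)] for i in range(3)]

def quad(A, v):
    # quadratic form v^T A v
    return sum(v[i] * A[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

# v lies in span(e1, e2), the negative eigenspace of H_p, and satisfies u . v = 0,
# so it belongs to ker H_0 (dimension 2) intersected with that eigenspace.
v = [u[1], -u[0], 0.0]
print(quad(H0, v), quad(Hp, v), quad(H, v))  # 0.0 -6.0 -6.0
```

A negative value of \(v^T \mathcal{H} v\) means the second-order condition for a minimum fails at this point.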
\(f(w, x) \in \mathbb{R}\), i.e. \(f(w, \cdot): \mathbb{R}^d \rightarrow \mathbb{R}\)
\(p\) points, binary classification \(y = \pm1\)
Gradient flow \(\dot{w} = -\nabla_w \mathcal{L}(w)\)
\(\mathcal{L}(w) = \langle \ell(y f(w, x)) \rangle_{x,y}\)
\(\dot{f}(w,x) = \nabla_w f(w,x) \cdot \dot{w}\)
\(\dot{f}(w,x') = -\nabla_w f(w,x') \cdot \langle \ell'(y f(w,x)) y \nabla_w f(w,x) \rangle_{x,y} \)
Neural Tangent Kernel \(\Theta(w, x, x') = \nabla_w f(w,x) \cdot \nabla_w f(w,x') \)
\(\dot{f}(w,x') = - \langle \ell'(y f(w,x)) y \Theta(w,x,x') \rangle_{x,y} \)
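A sketch checking the NTK form of the output dynamics: one small explicit Euler step of gradient flow on the averaged loss should change \(f(w,x')\) at rate \(-\langle \ell'(y f(w,x))\, y\, \Theta(w,x,x') \rangle_{x,y}\). The toy model \(f(w,x) = w_1 \tanh(w_0 x)\), the logistic loss, the data and the point \(x'\) are illustrative assumptions.

```python
import math

# Illustrative assumptions: toy model f(w, x) = w1 * tanh(w0 * x),
# logistic loss l(z) = log(1 + exp(-z)), and a made-up data set.
def f(w, x):
    return w[1] * math.tanh(w[0] * x)

def grad_f(w, x):
    t = math.tanh(w[0] * x)
    return [w[1] * x * (1 - t * t), t]

def dloss_l(z):
    return -1.0 / (1.0 + math.exp(z))

def ntk(w, x, xp):
    # Theta(w, x, x') = grad_w f(w, x) . grad_w f(w, x')
    return sum(a * b for a, b in zip(grad_f(w, x), grad_f(w, xp)))

def grad_loss(data, w):
    # gradient of the averaged loss < l(y f(w, x)) >_{x,y}
    g = [0.0] * len(w)
    for x, y in data:
        coef = dloss_l(y * f(w, x)) * y / len(data)
        for i, dfi in enumerate(grad_f(w, x)):
            g[i] += coef * dfi
    return g

def fdot_kernel(data, w, xp):
    # fdot(w, x') = -< l'(y f(w, x)) y Theta(w, x, x') >_{x,y}
    return -sum(dloss_l(y * f(w, x)) * y * ntk(w, x, xp) for x, y in data) / len(data)

data = [(0.5, 1), (-1.0, -1), (2.0, 1), (1.5, -1)]
w = [0.7, -0.3]
xp = 0.9      # a point x' that need not be in the data
dt = 1e-6     # one small explicit Euler step of gradient flow
w_next = [wi - dt * gi for wi, gi in zip(w, grad_loss(data, w))]
fdot_numeric = (f(w_next, xp) - f(w, xp)) / dt
print(fdot_numeric, fdot_kernel(data, w, xp))
```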
\(\alpha\) trick
\( \ell(y f(w,x)) \longrightarrow \frac{1}{\alpha} \ell(\alpha y F(w, x))\) where \(F(w,x) = f(w,x) - f(w_0,x)\)
\(\ell(z) = \operatorname{relu}(1 - z)\) (hinge loss)
\( \alpha \rightarrow \infty \Rightarrow F(w, x) \sim O(1 / \alpha)\) (the rescaled hinge loss vanishes as soon as \(yF \geq 1/\alpha\))
\( F(w,x) \approx F(w_0, x) + (w-w_0) \cdot \nabla_w F(w_0,x) = (w - w_0) \cdot \nabla_w f(w_0, x) \)
\( \| w - w_0 \| \sim 1/\alpha \)
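A sketch of how good the linearization is: for a displacement \(w - w_0 = \varepsilon d\), the gap between \(F(w,x)\) and \((w - w_0)\cdot\nabla_w f(w_0,x)\) should shrink like \(\varepsilon^2\). With \(\|w - w_0\| \sim 1/\alpha\) this suggests an error of order \(1/\alpha^2\) against \(F = O(1/\alpha)\). The toy model, \(w_0\), the direction \(d\) and the input \(x\) are arbitrary assumptions.

```python
import math

# Illustrative toy model (an assumption): f(w, x) = w1 * tanh(w0 * x).
def f(w, x):
    return w[1] * math.tanh(w[0] * x)

def grad_f(w, x):
    t = math.tanh(w[0] * x)
    return [w[1] * x * (1 - t * t), t]

w0 = [0.7, -0.3]
d = [1.0, 1.0]   # arbitrary direction of the displacement w - w0
x = 1.2

errors = []
for eps in [1e-1, 1e-2, 1e-3]:
    w = [a + eps * b for a, b in zip(w0, d)]
    F = f(w, x) - f(w0, x)                                              # exact F(w, x)
    F_lin = sum((a - b) * g for a, b, g in zip(w, w0, grad_f(w0, x)))   # linearized F
    errors.append(abs(F - F_lin))
    print(eps, F, F_lin, abs(F - F_lin))
```

Each tenfold reduction of \(\varepsilon\) should reduce the error roughly a hundredfold.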
\( w \in \mathbb{R}^N\)
\(\dot{f}(w,x') \approx - \langle \ell'(\alpha y F(w,x)) y \Theta(w_0,x,x') \rangle_{x,y} \)
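\( F(w(t),x') = \int_0^t ds\, \dot{f}(w(s),x') = \langle \left(-\int_0^t \ell'(\alpha y F(w(s),x))\, y\, ds \right)\Theta(w_0,x,x') \rangle_{x,y} \) (using \(F(w_0, x') = 0\))
\( F(w(t),x') = \langle c_x(t) \Theta(w_0, x, x') \rangle_{x,y} \), with \(c_x(t) = -\int_0^t \ell'(\alpha y F(w(s),x))\, y\, ds\)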
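A discretized sketch of the linearized dynamics. Assumptions: the frozen kernel \(\Theta(w_0,x,x')\) is replaced by a hypothetical RBF kernel \(e^{-(x-x')^2}\), the data are invented, and gradient flow is approximated by explicit Euler steps with the hinge loss. Accumulating \(c_x(t)\) alongside \(F\) reproduces the kernel-machine form at an unseen point \(x'\). The last lines also show that doubling \(\alpha\) while halving the training time leaves \(\alpha F\) unchanged, consistent with \(F = O(1/\alpha)\).

```python
import math

# Hypothetical stand-in for the frozen kernel Theta(w0, x, x'): an RBF kernel.
def theta0(x, xp):
    return math.exp(-(x - xp) ** 2)

def dhinge(z):
    # derivative of l(z) = relu(1 - z); taking 0 at the kink z = 1 is a convention
    return -1.0 if z < 1 else 0.0

def run(data, alpha, t_total, steps, x_test):
    """Euler steps of dF/dt = -< l'(alpha y F) y Theta0 > starting from F = 0."""
    P = len(data)
    dt = t_total / steps
    F = [0.0] * P      # F(w(t), x_k) at the training points
    F_test = 0.0       # F(w(t), x_test)
    c = [0.0] * P      # c_x(t) = -int_0^t l'(alpha y F(w(s), x)) y ds
    for _ in range(steps):
        push = [-dhinge(alpha * yk * F[k]) * yk for k, (xk, yk) in enumerate(data)]
        F = [F[k] + dt * sum(push[j] * theta0(data[j][0], data[k][0]) for j in range(P)) / P
             for k in range(P)]
        F_test += dt * sum(push[j] * theta0(data[j][0], x_test) for j in range(P)) / P
        c = [c[j] + dt * push[j] for j in range(P)]
    return F, F_test, c

data = [(-1.5, -1), (-0.5, -1), (0.4, 1), (1.3, 1)]
x_test = 0.1

F, F_test, c = run(data, alpha=10.0, t_total=2.0, steps=2000, x_test=x_test)
# kernel-machine form F(w(t), x') = < c_x(t) Theta0(x, x') >
F_kernel = sum(cj * theta0(x, x_test) for cj, (x, _) in zip(c, data)) / len(data)
print(F_test, F_kernel)

# doubling alpha and halving the time (same number of steps) gives the same alpha * F
F2, _, _ = run(data, alpha=20.0, t_total=1.0, steps=2000, x_test=x_test)
print([10.0 * a for a in F], [20.0 * b for b in F2])
```

The rescaling works because \(G = \alpha F\) obeys \(\dot G = -\alpha \langle \ell'(yG)\, y\, \Theta \rangle\), so \(G\) depends on \(\alpha\) and \(t\) only through \(\alpha t\).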
\(\frac{1}{\alpha} \operatorname{relu}(1 - \alpha yF) = \operatorname{relu}(\frac{1}{\alpha} - yF)\)
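The identity says that, with the rescaled hinge loss, a point stops contributing once \(yF \geq 1/\alpha\), i.e. the margin shrinks to \(1/\alpha\). A trivial numerical spot check, with arbitrary made-up values of \(\alpha\) and \(yF\):

```python
def relu(z):
    return max(z, 0.0)

checks = []
for alpha in [1.0, 4.0, 100.0]:
    for yF in [-0.5, 0.005, 0.3, 2.0]:
        lhs = relu(1 - alpha * yF) / alpha   # (1/alpha) relu(1 - alpha y F)
        rhs = relu(1 / alpha - yF)           # relu(1/alpha - y F)
        checks.append((alpha, yF, lhs, rhs))
        print(alpha, yF, lhs, rhs)
```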
tp4-2020
By Mario Geiger