# TP4

$$\mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \ell(yf(w, x))$$

$$\partial_i \mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \ell'(y f(w, x)) \, y \, \partial_i f(w, x)$$

Hessian

$$\partial_i \partial_j \mathcal{L}(\mathcal{D}, w) = \sum_{(x,y)\in \mathcal{D}} \left( \ell''(y f(w, x)) \, y^2 \, \partial_i f(w, x) \, \partial_j f(w, x) + \ell'(y f(w, x)) \, y \, \partial_i \partial_j f(w, x) \right)$$

$$= (\mathcal{H}_0)_{ij} + (\mathcal{H}_p)_{ij}$$

$$\mathcal{H}_0 = M M^T, \qquad M \in \mathbb{R}^{n \times p}$$ ($$n$$ parameters, $$p$$ data points)

$$M_{ik} = \sqrt{\ell''(y_k f(w, x_k))} \, y_k \, \partial_i f(w, x_k)$$

$$\mathrm{rk}(\mathcal{H}_0) \leq \min(n, p) \leq p$$
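As a quick numerical sanity check of this rank bound, the sketch below builds a Gram matrix $$M M^T$$ from a random $$M \in \mathbb{R}^{n \times p}$$ and counts pivots by Gaussian elimination. The random Gaussian entries only stand in for the true $$M_{ik}$$ (an assumption made for illustration), and the helper names `gram` and `rank` are hypothetical.

```python
import random


def gram(M):
    """Return M M^T for M given as a list of n rows, each of length p."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in M] for ri in M]


def rank(A, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]  # work on a copy
    rows, cols = len(A), len(A[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= factor * A[r][j]
        r += 1
    return r


random.seed(0)
n, p = 6, 3  # n parameters, p data points (illustrative sizes)
M = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
H0 = gram(M)  # n x n matrix, but its rank cannot exceed min(n, p)
print(rank(H0), "<=", min(n, p))
```

With $$n > p$$ the $$n \times n$$ matrix is necessarily singular, which is the regime used in the argument on local minima.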

Hypothesis: the number of negative eigenvalues of $$\mathcal{H}_p$$ is $$cn$$, with $$0<c<1$$.

$$\exists \text{ local minimum} \Rightarrow \mathrm{rk}(\mathcal{H}_0) \geq cn$$

$$\text{no local minima} \Leftarrow \mathrm{rk}(\mathcal{H}_0) < cn \Leftarrow p < cn$$
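The implication can be spelled out as follows, assuming a convex $$\ell$$ (so $$\ell'' \geq 0$$ and $$\mathcal{H}_0 \succeq 0$$); the symbols $$V_-$$ and $$v$$ are introduced here only for this argument. Let $$V_-$$ be the span of the eigenvectors of $$\mathcal{H}_p$$ with negative eigenvalues, so $$\dim V_- = cn$$. If $$\mathrm{rk}(\mathcal{H}_0) < cn$$, then

$$\dim \ker \mathcal{H}_0 + \dim V_- > (n - cn) + cn = n$$

so there is a nonzero $$v \in \ker \mathcal{H}_0 \cap V_-$$, and

$$v^T (\mathcal{H}_0 + \mathcal{H}_p) v = v^T \mathcal{H}_p v < 0$$

The full Hessian then has a direction of negative curvature, so $$w$$ cannot be a local minimum.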

$$f(w, x) \in \mathbb{R}, \qquad f(w, \cdot): \mathbb{R}^d \rightarrow \mathbb{R}$$

$$p$$ points, binary classification $$y = \pm1$$

Gradient flow $$\dot{w} = -\nabla_w \mathcal{L}(w)$$

$$\mathcal{L}(w) = \langle \ell(y f(w, x)) \rangle_{x,y}$$

$$\dot{f}(w,x) = \nabla_w f(w,x) \cdot \dot{w}$$

$$\dot{f}(w,x') = -\nabla_w f(w,x') \cdot \langle \ell'(y f(w,x)) y \nabla_w f(w,x) \rangle_{x,y}$$

Neural Tangent Kernel $$\Theta(w, x, x') = \nabla_w f(w,x) \cdot \nabla_w f(w,x')$$

$$\dot{f}(w,x') = - \langle \ell'(y f(w,x)) y \Theta(w,x,x') \rangle_{x,y}$$
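The kernel form of the dynamics can be checked numerically on a toy model. The sketch below is only an illustration under stated assumptions: the two-parameter model $$f(w,x) = w_2 \tanh(w_1 x)$$ with scalar $$x$$, the logistic loss $$\ell(z) = \log(1 + e^{-z})$$, and all function names are hypothetical choices, not taken from the notes. It compares the kernel prediction for $$\dot{f}(w,x')$$ with a finite difference over one small Euler step of gradient flow.

```python
import math


def f(w, x):
    """Toy model f(w, x) = w2 * tanh(w1 * x) (assumed for illustration)."""
    w1, w2 = w
    return w2 * math.tanh(w1 * x)


def grad_f(w, x):
    """Gradient of f with respect to w = (w1, w2)."""
    w1, w2 = w
    t = math.tanh(w1 * x)
    return (w2 * x * (1.0 - t * t), t)


def ntk(w, x, xp):
    """Theta(w, x, x') = grad_w f(w, x) . grad_w f(w, x')."""
    return sum(a * b for a, b in zip(grad_f(w, x), grad_f(w, xp)))


def dloss(z):
    """Derivative of the logistic loss log(1 + exp(-z)) (assumed loss)."""
    return -1.0 / (1.0 + math.exp(z))


def f_dot(w, data, xp):
    """Kernel prediction: -< l'(y f(w,x)) y Theta(w,x,x') >_{x,y}."""
    total = sum(dloss(y * f(w, x)) * y * ntk(w, x, xp) for x, y in data)
    return -total / len(data)


def euler_step(w, data, dt):
    """One explicit Euler step of gradient flow, w <- w - dt * grad L(w)."""
    grad = [0.0, 0.0]
    for x, y in data:
        coeff = dloss(y * f(w, x)) * y / len(data)
        g = grad_f(w, x)
        grad[0] += coeff * g[0]
        grad[1] += coeff * g[1]
    return (w[0] - dt * grad[0], w[1] - dt * grad[1])


w = (0.7, -1.3)
data = [(0.5, 1.0), (-1.2, -1.0), (2.0, 1.0)]  # (x, y) pairs with y = +-1
xp, dt = 0.9, 1e-6
finite_diff = (f(euler_step(w, data, dt), xp) - f(w, xp)) / dt
print(finite_diff, f_dot(w, data, xp))  # the two values should nearly agree
```

The agreement holds only to first order in `dt`; for a finite step the kernel itself changes with $$w$$.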

$$\alpha$$ trick

$$\ell(y f(w,x)) \longrightarrow \frac{1}{\alpha} \ell(\alpha y F(w, x))$$ where $$F(w,x) = f(w,x) - f(w_0,x)$$

$$\ell(z) = \mathrm{relu}(1 - z)$$

$$\alpha \rightarrow \infty \Rightarrow F(w, x) \sim O(1 / \alpha)$$

$$F(w,x) \approx F(w_0, x) + (w-w_0) \cdot \nabla_w F(w_0,x) = (w - w_0) \cdot \nabla_w f(w_0, x)$$

$$\| w - w_0 \| \sim 1/\alpha$$

$$w \in \mathbb{R}^N$$

$$\dot{f}(w,x') \approx - \langle \ell'(\alpha y F(w,x)) y \Theta(w_0,x,x') \rangle_{x,y}$$

$$F(w(t),x') = \int_0^t \dot{f}(w(t'),x') \, dt' \approx \left\langle \left(-\int_0^t \ell'(\alpha y F(w(t'),x)) \, y \, dt' \right)\Theta(w_0,x,x') \right\rangle_{x,y}$$

$$F(w(t),x') = \langle c_x(t) \Theta(w_0, x, x') \rangle_{x,y}$$
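A minimal sketch of this kernel expansion, under stated assumptions: the toy model $$f(w,x) = w_2 \tanh(w_1 x)$$, the data, the step size, and all names are hypothetical. The code integrates the linearized (lazy) dynamics with the hinge loss, keeping the gradients frozen at $$w_0$$, accumulates $$c_x(t) = -\int_0^t \ell'(\alpha y F) \, y \, dt'$$ for each training point, and then evaluates $$F$$ both directly and through the kernel.

```python
import math


def grad_f(w, x):
    """Gradient of the toy model f(w, x) = w2 * tanh(w1 * x) w.r.t. w."""
    w1, w2 = w
    t = math.tanh(w1 * x)
    return (w2 * x * (1.0 - t * t), t)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def hinge_prime(z):
    """(Sub)derivative of the hinge loss relu(1 - z)."""
    return -1.0 if z < 1.0 else 0.0


def lazy_flow(w0, data, alpha, dt, steps):
    """Euler integration of the linearized flow; returns (w - w0, c)."""
    grads = [grad_f(w0, x) for x, _ in data]  # frozen at initialization
    delta = [0.0] * len(w0)                   # w - w0
    c = [0.0] * len(data)                     # one coefficient per point
    for _ in range(steps):
        F = [dot(delta, g) for g in grads]    # F(w, x) ~ (w - w0) . grad f
        rates = [-hinge_prime(alpha * y * Fk) * y
                 for (_, y), Fk in zip(data, F)]
        for k, g in enumerate(grads):
            c[k] += dt * rates[k]
            for i in range(len(delta)):
                delta[i] += dt * rates[k] * g[i] / len(data)
    return delta, c


def F_linear(delta, w0, xp):
    """Direct evaluation: (w - w0) . grad_w f(w0, x')."""
    return dot(delta, grad_f(w0, xp))


def F_kernel(c, w0, data, xp):
    """Kernel evaluation: < c_x Theta(w0, x, x') >_{x,y}."""
    total = sum(ck * dot(grad_f(w0, x), grad_f(w0, xp))
                for ck, (x, _) in zip(c, data))
    return total / len(data)


w0 = (0.7, -1.3)
data = [(0.5, 1.0), (-1.2, -1.0), (2.0, 1.0)]
delta, c = lazy_flow(w0, data, alpha=10.0, dt=1e-3, steps=500)
print(F_linear(delta, w0, 0.9), F_kernel(c, w0, data, 0.9))
```

Both evaluations agree up to rounding because $$w - w_0$$ stays in the span of the training gradients $$\nabla_w f(w_0, x)$$, which is what makes the lazy regime a kernel method.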

$$\frac{1}{\alpha} \mathrm{relu}(1 - \alpha y f) = \mathrm{relu}\left(\frac{1}{\alpha} - y f\right)$$
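This identity is the positive homogeneity of relu and requires $$\alpha > 0$$. A small check, with hypothetical function names, evaluates both sides on a few values:

```python
def relu(z):
    return max(z, 0.0)


def scaled_hinge(alpha, y, f):
    """Left-hand side: (1 / alpha) * relu(1 - alpha * y * f)."""
    return relu(1.0 - alpha * y * f) / alpha


def shifted_hinge(alpha, y, f):
    """Right-hand side: relu(1 / alpha - y * f), a hinge with margin 1 / alpha."""
    return relu(1.0 / alpha - y * f)


for alpha in (0.5, 1.0, 10.0, 1000.0):
    for y in (-1.0, 1.0):
        for f in (-2.0, -0.01, 0.0, 0.003, 0.7):
            gap = abs(scaled_hinge(alpha, y, f) - shifted_hinge(alpha, y, f))
            assert gap < 1e-12
```

Read this way, the $$\alpha$$ trick with the hinge loss amounts to shrinking the margin to $$1/\alpha$$.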

By Mario Geiger
