# Feature vs Lazy Learning

Which regime is more suitable for which situation, and why?

• Feature Learning (FL) -> large weight changes
• Lazy Learning (LL) -> small weight changes

## Data Symmetries

Fully Connected network:

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \beta_i\: \max \left(0, \frac{1}{\sqrt{d}}\omega_i \cdot x + b_i\right)
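A minimal NumPy sketch of this one-hidden-layer ReLU network; the standard Gaussian initialization of ω_i, b_i, β_i is an illustrative assumption, not specified above:

```python
import numpy as np

def fc_net(x, w, beta, b):
    """f(x) = (1/h) sum_i beta_i * max(0, w_i . x / sqrt(d) + b_i)."""
    d = x.shape[-1]
    pre = w @ x / np.sqrt(d) + b                 # (h,) pre-activations
    return (beta * np.maximum(0.0, pre)).mean()  # (1/h) sum over hidden units

rng = np.random.default_rng(0)
d, h = 2, 100
w = rng.standard_normal((h, d))   # hidden weights omega_i
b = rng.standard_normal(h)        # biases b_i
beta = rng.standard_normal(h)     # readout weights beta_i
x = rng.standard_normal(d)
print(fc_net(x, w, beta, b))
```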

Stripe Model

Sphere Model

Goal: show that FL can benefit from choosing the proper architecture.

Fully Connected network:

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \beta_i\: \max \left(0, \frac{1}{\sqrt{d}}\omega_i \cdot x + b_i\right)

Convolutional Network:

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \frac{\beta_i}{d}\: \sum_\delta\: \max\left(0, \frac{1}{\sqrt{d}}\: t_\delta[\omega_i] \cdot x + b_i\right)
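A sketch of the convolutional readout, taking t_δ to be a cyclic shift of the filter by δ (the shift convention is an assumption):

```python
import numpy as np

def cnn_1d(x, w, beta, b):
    """f(x) = (1/h) sum_i (beta_i/d) sum_delta max(0, t_delta[w_i] . x / sqrt(d) + b_i),
    with t_delta a cyclic shift of the filter by delta."""
    d = x.shape[-1]
    # all d cyclic translations of each filter: shape (h, d, d)
    shifts = np.stack([np.roll(w, s, axis=-1) for s in range(d)], axis=1)
    pre = shifts @ x / np.sqrt(d) + b[:, None]        # (h, d)
    return (beta * np.maximum(0.0, pre).mean(axis=1)).mean()
```

Averaging over all d shifts makes f exactly invariant under cyclic translations of x.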

## Translation-Invariant Datasets

x(r) = a\cos r + b\sin r, \quad r \in [0, 2\pi]
a,b \sim \mathcal{N}(0,1)
y = \text{sign}(\sqrt{a^2 + b^2} - C_0)
• 2D problem for the FC
• 1D problem for the CNN
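The construction above can be sampled as follows; the grid size d and the threshold C_0 are free parameters (the values here are placeholders):

```python
import numpy as np

def make_dataset(n, d, C0=1.0, seed=0):
    """Signals x(r) = a cos r + b sin r on a d-point grid over [0, 2*pi),
    labeled by whether the amplitude sqrt(a^2 + b^2) exceeds C0."""
    rng = np.random.default_rng(seed)
    a, b = rng.standard_normal((2, n))                    # a, b ~ N(0, 1)
    r = np.linspace(0, 2 * np.pi, d, endpoint=False)
    x = a[:, None] * np.cos(r) + b[:, None] * np.sin(r)   # (n, d)
    y = np.sign(np.sqrt(a**2 + b**2) - C0)
    return x, y
```

Translating r only rotates (a, b), leaving the amplitude and hence the label unchanged; this is the invariance the CNN can exploit.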

Issues:

1. For the 2D sphere, we know FL > LL.
2. In the 1D stripe there are no other dimensions to compress, so weight orientation does not matter.

Figure: learning curves, 1 Fourier component.

## Translation-Invariant Dataset

x(r) = a\cos r + b\sin r + \mathcal{N}(0,\sigma^2), \qquad\sigma = 0.1
y = \text{sign}(\sqrt{a^2 + b^2} - C_0)
• Same as before + Gaussian Noise in Real Space
• In Fourier Space, this is equivalent to adding non-informative dimensions
• For the FC net this is a short cylinder
• For the CNN this is a stripe where non-informative directions have small variance
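The Fourier-space claim can be checked directly: the clean signal occupies only the k = 1 frequency, while the noise spreads small power over all frequencies (grid size and seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 64, 0.1
r = np.linspace(0, 2 * np.pi, d, endpoint=False)
a, b = rng.standard_normal(2)
clean = a * np.cos(r) + b * np.sin(r)
noisy = clean + sigma * rng.standard_normal(d)

F_clean = np.fft.rfft(clean) / d
F_noisy = np.fft.rfft(noisy) / d
print(np.abs(F_clean).round(3))  # power only at frequency k = 1
print(np.abs(F_noisy).round(3))  # small extra power at every frequency
```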

Figure: learning curves, 1 Fourier component + noise.

## Translation-Invariant Dataset

x(r_1, r_2) = \sum_{i \in \{1,2\}} \left(a_i\cos r_i + b_i\sin r_i\right)
y = \text{sign}\left(\sum_i \sqrt{a_i^2 + b_i^2} - C\right)
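A sampling sketch for this two-component version; the per-dimension grid size d and threshold C are placeholders:

```python
import numpy as np

def make_dataset_2d(n, d, C=2.0, seed=0):
    """x(r1, r2) = sum_i (a_i cos r_i + b_i sin r_i) on a d x d grid,
    labeled by sign(sum_i sqrt(a_i^2 + b_i^2) - C)."""
    rng = np.random.default_rng(seed)
    a, b = rng.standard_normal((2, n, 2))                       # (n, 2) each
    r = np.linspace(0, 2 * np.pi, d, endpoint=False)
    x1 = a[:, 0, None] * np.cos(r) + b[:, 0, None] * np.sin(r)  # varies with r1
    x2 = a[:, 1, None] * np.cos(r) + b[:, 1, None] * np.sin(r)  # varies with r2
    x = x1[:, :, None] + x2[:, None, :]                         # (n, d, d) images
    y = np.sign(np.sqrt(a**2 + b**2).sum(axis=-1) - C)
    return x, y
```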

2D image, 1 Fourier component per dimension.

2D CNN:

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \frac{\beta_i}{d^2}\: \sum_{(\delta_1, \delta_2)}\: \max\left(0, \frac{1}{d}\: t_{(\delta_1, \delta_2)}[\omega_i] \cdot x + b_i\right)
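A direct (unoptimized) sketch of the 2D translation average, again taking t_(δ1, δ2) to be a cyclic 2D shift; note the 1/d inner normalization, since the input now has d² pixels:

```python
import numpy as np

def cnn_2d(x, w, beta, b):
    """f(x) = (1/h) sum_i (beta_i/d^2) sum_{(d1,d2)} max(0, t_{(d1,d2)}[w_i] . x / d + b_i),
    where x and each filter w_i are (d, d) images."""
    d = x.shape[-1]
    pre = []
    for d1 in range(d):
        for d2 in range(d):
            shifted = np.roll(w, (d1, d2), axis=(-2, -1))   # t_{(d1,d2)}[w_i]
            pre.append((shifted * x).sum(axis=(-2, -1)) / d + b)
    pre = np.stack(pre, axis=-1)                            # (h, d*d)
    return (beta * np.maximum(0.0, pre).mean(axis=-1)).mean()
```

As in 1D, summing over all d² shifts makes f exactly invariant under cyclic translations along both image axes.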

## Translation-Invariant Dataset

2D Image

Motivation: a 1D CNN reduces the dimensionality of the problem by one (translations in 1D); a 2D CNN reduces it by two.

• FC -> 4D problem
• CNN -> 2D problem

#### FL vs LL - Network Symmetries

By Leonardo Petrini
