Feature vs Lazy Learning

What regime is more suitable for which situation? Why?

 

  • Feature Learning (FL) -> large weight changes
  • Lazy Learning (LL) -> small weight changes
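
One way to make the two regimes concrete is to measure how far the weights move relative to their initial norm. The sketch below is illustrative only: the helper name `relative_weight_change` and the example weight vectors are assumptions, not values from the experiments.

```python
import math

def relative_weight_change(w_init, w_final):
    """Return ||w_final - w_init|| / ||w_init|| for flat lists of weights."""
    diff = math.sqrt(sum((f - i) ** 2 for i, f in zip(w_init, w_final)))
    norm = math.sqrt(sum(i ** 2 for i in w_init))
    return diff / norm

w0 = [1.0, -2.0, 0.5]            # hypothetical initial weights
w_lazy = [1.01, -2.02, 0.5]      # weights barely move (lazy-like)
w_feature = [3.0, 0.5, -1.5]     # weights move by an amount comparable to their norm

print("lazy-like:   ", relative_weight_change(w0, w_lazy))
print("feature-like:", relative_weight_change(w0, w_feature))
```

A ratio much smaller than one corresponds to the LL picture, a ratio of order one to the FL picture.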

Data Symmetries

Fully Connected network:

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \beta_i\: \max \left(0, \frac{1}{\sqrt{d}}\omega_i \cdot x + b_i\right)

Stripe Model

Sphere Model

What about Network Symmetries?




Fully Connected network:

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \beta_i\: \max \left(0, \frac{1}{\sqrt{d}}\omega_i \cdot x + b_i\right)

Convolutional Network:

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \frac{\beta_i}{d}\: \sum_\delta\: \max\left(0, \frac{1}{\sqrt{d}}\: t_\delta[\omega_i] \cdot x + b_i\right)

Goal: show that FL can benefit from choosing the proper architecture
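
The two formulas can be sketched in plain Python to see what the sum over translations does. This is a minimal sketch under stated assumptions: `t_\delta` is taken to be a cyclic shift, and the sizes `d`, `h` and the random weights are arbitrary illustration values.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def shift(v, delta):
    """Cyclic translation t_delta (assumed periodic): entry k moves to k + delta."""
    delta %= len(v)
    return v[-delta:] + v[:-delta]

def fc_net(x, omega, beta, b):
    """f(x) = 1/h sum_i beta_i max(0, omega_i . x / sqrt(d) + b_i)."""
    d, h = len(x), len(omega)
    return sum(beta[i] * max(0.0, dot(omega[i], x) / math.sqrt(d) + b[i])
               for i in range(h)) / h

def cnn_net(x, omega, beta, b):
    """Same units, but each filter is applied at every shift and averaged."""
    d, h = len(x), len(omega)
    total = 0.0
    for i in range(h):
        pooled = sum(max(0.0, dot(shift(omega[i], delta), x) / math.sqrt(d) + b[i])
                     for delta in range(d))
        total += beta[i] / d * pooled
    return total / h

rng = random.Random(0)
d, h = 8, 16  # illustrative sizes
omega = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
beta = [rng.gauss(0, 1) for _ in range(h)]
b = [rng.gauss(0, 1) for _ in range(h)]
x = [rng.gauss(0, 1) for _ in range(d)]
x_shifted = shift(x, 3)

print("FC :", fc_net(x, omega, beta, b), fc_net(x_shifted, omega, beta, b))
print("CNN:", cnn_net(x, omega, beta, b), cnn_net(x_shifted, omega, beta, b))
```

The CNN output is the same for `x` and `x_shifted` up to floating-point rounding, because shifting the input only permutes the terms of the sum over `delta`. The FC output has no such constraint and in general differs.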

Translation-Invariant Datasets

 

x(r) = a\cos r + b\sin r, \quad r \in [0, 2\pi]
a,b \sim \mathcal{N}(0,1)
y = \text{sign}(\sqrt{a^2 + b^2} - C_0)
  • 2D problem for the FC
  • 1D problem for the CNN
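
A minimal sketch of how one sample of this dataset could be generated. The discretisation of `r` into `d` equally spaced points, the value of `C0` (chosen here as the median amplitude so that classes are balanced) and the helper names are assumptions for illustration.

```python
import math
import random

C0 = math.sqrt(2 * math.log(2))  # assumed threshold: median of sqrt(a^2 + b^2)

def sample_point(d, c0, rng):
    """Draw a, b ~ N(0, 1) and return the signal on d grid points, its label and (a, b)."""
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    x = [a * math.cos(2 * math.pi * k / d) + b * math.sin(2 * math.pi * k / d)
         for k in range(d)]
    y = 1 if math.hypot(a, b) > c0 else -1
    return x, y, (a, b)

def amplitude(x):
    """Recover sqrt(a^2 + b^2) from the signal alone (valid for d >= 3)."""
    return math.sqrt(2 * sum(v * v for v in x) / len(x))

rng = random.Random(0)
x, y, (a, b) = sample_point(16, C0, rng)
print("label:", y)
print("amplitude from (a, b):", math.hypot(a, b))
print("amplitude from x:     ", amplitude(x))
```

The label depends on `(a, b)` only through the amplitude, which is unchanged when the signal is cyclically shifted. This is the sense in which the FC network faces a 2D problem and the CNN a 1D one.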

Issues:

  1. For the 2D-sphere we already know that FL outperforms LL
  2. In the 1D-stripe there are no other dimensions to compress, so weight orientation does not matter

1 Fourier Component

Learning Curves

Translation-Invariant Dataset


 

x(r) = a\cos r + b\sin r + \mathcal{N}(0,\sigma^2), \qquad\sigma = 0.1
y = \text{sign}(\sqrt{a^2 + b^2} - C_0)
  • Same as before + Gaussian Noise in Real Space
  • In Fourier Space, this is equivalent to adding non-informative dimensions
  • For the FC net this is a short cylinder
  • For the CNN this is a stripe where non-informative directions have small variance
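
The statement about Fourier space can be checked numerically with a small discrete Fourier transform. In this sketch the grid size `d`, the fixed coefficients `a`, `b` and the helper `dft` are assumptions for illustration; only `sigma = 0.1` comes from the slide.

```python
import cmath
import math
import random

def dft(x):
    """Discrete Fourier transform, normalised by the number of samples."""
    d = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / d) for n in range(d)) / d
            for k in range(d)]

rng = random.Random(0)
d, sigma = 32, 0.1
a, b = 1.0, 0.5  # fixed here for illustration instead of sampled
grid = [2 * math.pi * n / d for n in range(d)]
clean = [a * math.cos(r) + b * math.sin(r) for r in grid]
noisy = [v + rng.gauss(0.0, sigma) for v in clean]

spec_clean = [abs(c) for c in dft(clean)]
spec_noisy = [abs(c) for c in dft(noisy)]

# The single Fourier component lives in modes 1 and d - 1; all others carry no label information.
signal_modes = {1, d - 1}
noise_clean = max(spec_clean[k] for k in range(d) if k not in signal_modes)
noise_noisy = max(spec_noisy[k] for k in range(d) if k not in signal_modes)

print("signal mode (clean, noisy):", spec_clean[1], spec_noisy[1])
print("largest other mode (clean, noisy):", noise_clean, noise_noisy)
```

Without noise, every mode other than the signal one is zero up to rounding. With noise, those modes become non-zero but stay small compared with the signal mode: they are extra directions with small variance that say nothing about the label.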

1 Fourier Component + Noise

Learning Curves

Translation-Invariant Dataset


 

x(r_1, r_2) = \sum_{i \in \{1,2\}} \left(a_i\cos r_i + b_i\sin r_i\right)
y = \text{sign}\left(\sum_{i \in \{1,2\}} \sqrt{a_i^2 + b_i^2} - C\right)

2D Image - 1 Fourier Component per dimension

f(x) = \frac{1}{h} \sum_{i = 1\dots h} \frac{\beta_i}{d^2}\: \sum_{(\delta_1, \delta_2)}\: \max\left(0, \frac{1}{d}\: t_{(\delta_1, \delta_2)}[\omega_i] \cdot x + b_i\right)

2D CNN
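
A minimal sketch of the 2D CNN formula applied to one image from the 2D dataset. The shifts are assumed to be cyclic along both axes, the image is sampled on a `d x d` grid, and the sizes and random weights are arbitrary illustration values; `bias` plays the role of `b_i` in the formula.

```python
import math
import random

def shift2d(w, d1, d2):
    """Cyclic translation of a square grid by d1 rows and d2 columns."""
    d = len(w)
    return [[w[(r - d1) % d][(c - d2) % d] for c in range(d)] for r in range(d)]

def dot2d(u, v):
    return sum(a * b for row_u, row_v in zip(u, v) for a, b in zip(row_u, row_v))

def cnn2d(x, omega, beta, bias):
    """f(x) = 1/h sum_i beta_i/d^2 sum_(d1,d2) max(0, t[omega_i] . x / d + bias_i)."""
    d, h = len(x), len(omega)
    total = 0.0
    for i in range(h):
        pooled = sum(
            max(0.0, dot2d(shift2d(omega[i], d1, d2), x) / d + bias[i])
            for d1 in range(d) for d2 in range(d)
        )
        total += beta[i] / d ** 2 * pooled
    return total / h

def image(a, b, d):
    """x(r1, r2) = sum_i (a_i cos r_i + b_i sin r_i) sampled on a d x d grid."""
    grid = [2 * math.pi * k / d for k in range(d)]
    return [[a[0] * math.cos(r1) + b[0] * math.sin(r1)
             + a[1] * math.cos(r2) + b[1] * math.sin(r2)
             for r2 in grid] for r1 in grid]

rng = random.Random(0)
d, h = 6, 4  # illustrative sizes
omega = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(h)]
beta = [rng.gauss(0, 1) for _ in range(h)]
bias = [rng.gauss(0, 1) for _ in range(h)]
a = [rng.gauss(0, 1), rng.gauss(0, 1)]
b = [rng.gauss(0, 1), rng.gauss(0, 1)]

x = image(a, b, d)
x_shifted = shift2d(x, 2, 5)
out = cnn2d(x, omega, beta, bias)
out_shifted = cnn2d(x_shifted, omega, beta, bias)
print(out, out_shifted)
```

The two printed values agree up to floating-point rounding: a shift of the image along either axis only reorders the terms in the double sum over translations.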

Translation-Invariant Dataset


 

2D Image

Motivation: a 1D CNN reduces the dimensionality of the problem by one (translations along one axis); a 2D CNN reduces it by two (translations along both axes).

 

  • FC -> 4D problem
  • CNN -> 2D problem
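
The dimension counts can be read off from how a translation acts on the Fourier coefficients. The short derivation below is a sketch; the symbols a_i', b_i' and \rho_i are introduced here for illustration only.

```latex
% Translating axis i by \delta_i rotates the pair (a_i, b_i):
\begin{aligned}
a_i\cos(r_i + \delta_i) + b_i\sin(r_i + \delta_i)
  &= a_i'\cos r_i + b_i'\sin r_i, \\
a_i' &= a_i\cos\delta_i + b_i\sin\delta_i, \\
b_i' &= b_i\cos\delta_i - a_i\sin\delta_i, \\
\rho_i^2 \equiv a_i'^2 + b_i'^2 &= a_i^2 + b_i^2 .
\end{aligned}
% The label depends only on the two invariant amplitudes:
y = \text{sign}\left(\rho_1 + \rho_2 - C\right)
```

The FC network has to work with all four coefficients (a_1, b_1, a_2, b_2), while a network that is invariant to translations along both axes only needs the two amplitudes \rho_1 and \rho_2.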

1 Fourier Component

1 Fourier Component + Noise

FL vs LL - Network Symmetries

By Leonardo Petrini
