Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart
spoc+idephics+pcsl group meeting
May 17, 2021
Learning in high-d implies that data is highly structured
\(P\): training set size
\(d\): data-space dimension
[Illustration: cats vs. dogs classification; embedding space: 3D, data manifold: 2D, task: 1D]
Common idea: nets can exploit data structure by becoming invariant to some aspects of the data.
Central question: which invariances are learnt by neural nets?
Some insights from the theory of shallow nets...
Shwartz-Ziv and Tishby (2017); Saxe et al. (2019)
Ansuini et al. (2019), Recanatesi et al. (2019)
Kopitkov and Indelman (2020); Oymak et al. (2019);
Paccolat et al. (2020)
Theory of over-parametrized fully connected (FC) nets:
two training regimes exist, depending on the initialization scale
feature learning (rich, hydrodynamic...): the neural representation evolves during training
vs.
lazy training: \(\sim\) a kernel method, the representation stays close to initialization
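The two regimes can be illustrated with a toy numpy sketch (my own construction, not the setup of the cited works): a two-layer ReLU net whose output is scaled by \(\alpha\), trained with the standard \(\mathrm{lr}/\alpha^2\) rescaling and output centered at initialization. At large \(\alpha\) the weights barely move (lazy, kernel-like regime); at small \(\alpha\) they move substantially (feature learning).

```python
import numpy as np

def relative_weight_change(alpha, steps=200, lr=0.1):
    """Train f_alpha(x) = alpha * (a . relu(W x) - f0(x)) on a toy regression
    task, training W only. Returns ||W - W0|| / ||W0|| after training."""
    rng = np.random.default_rng(1)
    d, h, P = 5, 200, 20
    X = rng.normal(size=(P, d))          # toy inputs
    y = rng.normal(size=P)               # toy targets
    W = rng.normal(size=(h, d)) / np.sqrt(d)
    a = rng.normal(size=h) / np.sqrt(h)  # frozen readout
    W0 = W.copy()
    base = np.maximum(X @ W0.T, 0.0) @ a  # f at init, subtracted so pred(0) = 0
    for _ in range(steps):
        z = X @ W.T                                   # (P, h) pre-activations
        pred = alpha * (np.maximum(z, 0.0) @ a - base)
        err = (pred - y) / P                          # grad of (1/2P)||pred-y||^2 wrt pred
        gW = (alpha * err[:, None] * a[None, :] * (z > 0)).T @ X
        W -= (lr / alpha ** 2) * gW                   # standard 1/alpha^2 lr rescaling
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

# weights move much less at large alpha (lazy) than at small alpha (feature)
print(relative_weight_change(0.5), relative_weight_change(50.0))
```

The \(1/\alpha^2\) learning-rate rescaling keeps the function-space dynamics comparable across \(\alpha\), so the remaining difference in weight motion isolates the regime change.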
Performance:
Jacot et al. (2018); Chizat et al. (2019); Bach (2018); Mei et al. (2018); Rotskoff and Vanden-Eijnden (2018)...
Bach (2017);
Chizat and Bach (2020); Ghorbani et al. (2019, 2020); Paccolat et al. (2020);
Refinetti et al. (2021);
Yehudai and Shamir (2019)...
Geiger et al. (2020a,b); Lee et al. (2020)
Which data invariances are CNNs learning?
e.g., a CIFAR10 data-point
Hypothesis: images can be classified because the task is invariant to smooth deformations of small magnitude and CNNs exploit such invariance with training.
Bruna and Mallat (2013); Mallat (2016)...
Is it true or not?
Can we test it?
"Hypothesis" means, informally:
\(\|f(x) - f(\tau x)\|^2\) is small if \(\| \nabla \tau\|\) is small.
\(f\) : network function
some notation:
\(x(s)\) input image intensity
\(s = (u, v)\in [0, 1]^2\) (continuous) pixel position
\(\tau\) smooth deformation
\(\tau x\) image deformed by \(\tau\)
such that \([\tau x](s)=x(s-\tau(s))\)
\(\tau(s) = (\tau_u(s),\tau_v(s))\) is a vector field
The deformation amplitude is measured by $$\| \nabla \tau\|^2=\int_{[0,1]^2}( (\nabla \tau_u)^2 + (\nabla \tau_v)^2 )dudv$$
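The definitions above can be made concrete with a short numpy sketch. The specific choices here are mine for illustration: \(\tau\) is built from a few low-frequency sine modes (so it vanishes at the boundary and is smooth), the image is sampled on an \(n\times n\) grid, \([\tau x](s)=x(s-\tau(s))\) uses nearest-pixel sampling, and \(\|\nabla\tau\|^2\) is discretized with finite differences.

```python
import numpy as np

def smooth_field(n, cut=3, T=1e-3, rng=None):
    """One component of tau: a sum of low-frequency sine modes on [0,1]^2."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.linspace(0.0, 1.0, n)
    U, V = np.meshgrid(u, u, indexing="ij")
    field = np.zeros((n, n))
    for i in range(1, cut + 1):
        for j in range(1, cut + 1):
            C = rng.normal(0.0, np.sqrt(T) / np.sqrt(i ** 2 + j ** 2))
            field += C * np.sin(np.pi * i * U) * np.sin(np.pi * j * V)
    return field

def deform(x, tau_u, tau_v):
    """[tau x](s) = x(s - tau(s)), nearest-pixel sampling on an n x n grid."""
    n = x.shape[0]
    U, V = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # convert the displacement (in units of [0,1]) to pixels, clip at the border
    su = np.clip(np.round(U - tau_u * n).astype(int), 0, n - 1)
    sv = np.clip(np.round(V - tau_v * n).astype(int), 0, n - 1)
    return x[su, sv]

def grad_norm2(tau_u, tau_v):
    """Discretization of ||grad tau||^2 = int (grad tau_u)^2 + (grad tau_v)^2 du dv."""
    n = tau_u.shape[0]
    total = 0.0
    for t in (tau_u, tau_v):
        gu, gv = np.gradient(t, 1.0 / n)      # partial derivatives wrt u, v
        total += (gu ** 2 + gv ** 2).mean()   # grid mean approximates the integral
    return total
```

With \(\tau = 0\), `deform` returns the image unchanged and `grad_norm2` is zero, matching the continuum definitions.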
Issue: previous empirical works show that even small shifts of the input can significantly change the net output.
Azulay and Weiss (2018); Dieleman et al. (2016); Zhang (2019)...
\(x\) input image
\(\tau\) smooth deformation
\(\eta\) isotropic noise with \(\|\eta\| = \langle\|\tau x - x\|\rangle\)
\(f\) network function
Goal: quantify how a deep net learns to become less sensitive
to diffeomorphisms than to generic data transformations
Relative stability:
$$R_f = \frac{\langle \|f(\tau x) - f(x)\|^2\rangle_{x, \tau}}{\langle \|f(x + \eta) - f(x)\|^2\rangle_{x, \eta}}$$
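A self-contained numpy sketch of estimating \(R_f\). Everything specific here is an assumption for illustration: \(f\) is a random-feature map standing in for a network, and the "diffeo" is a toy smooth deformation that shifts each row by a slowly varying number of pixels. The noise \(\eta\) is rescaled so \(\|\eta\| = \|\tau x - x\|\), as in the definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, feats = 32, 128
W = rng.normal(size=(feats, n * n)) / n  # random-feature "network" weights

def f(x):
    # stand-in for a trained net: a fixed random-feature ReLU map
    return np.maximum(W @ x.ravel(), 0.0)

def toy_diffeo(x, rng):
    # toy smooth deformation: shift row u by roughly sin(pi u) pixels
    shifts = np.round(
        3.0 * np.sin(np.pi * np.linspace(0.0, 1.0, n)) * rng.uniform(0.3, 1.0)
    ).astype(int)
    return np.stack([np.roll(row, s) for row, s in zip(x, shifts)])

num = den = 0.0
for _ in range(200):
    x = rng.normal(size=(n, n))
    tx = toy_diffeo(x, rng)
    eta = rng.normal(size=(n, n))
    eta *= np.linalg.norm(tx - x) / np.linalg.norm(eta)  # match ||eta|| = ||tau x - x||
    num += np.sum((f(tx) - f(x)) ** 2)
    den += np.sum((f(x + eta) - f(x)) ** 2)

R_f = num / den  # order 1 for this untrained map; no claim about trained nets
print(R_f)
```

For an untrained random-feature map, perturbations of equal norm are treated roughly alike, so \(R_f\) comes out of order one; the point of the observable is that training should push \(R_f\) well below this baseline.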
Results:
Deep nets learn to become stable to diffeomorphisms!
Relative stability can be rewritten as the ratio of diffeo stability to additive-noise stability.
[Plot: both stabilities as a function of the train set size \(P\)]
Stability \(D_f\) alone is not the right observable to characterize how deep nets learn diffeo invariance.
[Plots: relative stability as a function of the train set size \(P\), and stability across network depth]
Thanks!