Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart
\(P\): training set size
\(d\): data-space dimension
What is the structure of real data?
Bruna and Mallat (2013), Mallat (2016), ...
\(S = \frac{\|f(x) - f(\tau x)\|}{\|\nabla\tau\|}\)
\(f\): network function
Is it true or not?
Can we test it?
\(x\)-translations
\(y\)-translations
\(x\): input image
\(\tau\): smooth deformation
\(\eta\): isotropic noise with \(\|\eta\| = \langle\|\tau x - x\|\rangle\)
\(f\): network function
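The definitions above can be made concrete with a minimal sketch. As a simplifying assumption (a stand-in for the max-entropy diffeomorphisms used in the paper), the displacement field here is a sum of a few low-frequency sine modes that vanish at the image boundary, and deformed pixels are read off by nearest-neighbor lookup:

```python
import numpy as np

def smooth_deform(img, amplitude=2.0, cutoff=3, seed=0):
    """Apply a smooth random deformation tau to a square 2D image.
    The displacement field is a sum of low-frequency sine modes
    (a simple stand-in for a max-entropy diffeomorphism); deformed
    pixels are fetched by nearest-neighbor lookup."""
    rng = np.random.default_rng(seed)
    n = img.shape[0]
    yy, xx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    u = np.zeros((n, n))  # x-displacement
    v = np.zeros((n, n))  # y-displacement
    for i in range(1, cutoff + 1):
        for j in range(1, cutoff + 1):
            # random mode coefficients, damped at higher frequency
            c_u, c_v = rng.standard_normal(2) / (i * i + j * j)
            mode = (np.sin(np.pi * i * xx / (n - 1))
                    * np.sin(np.pi * j * yy / (n - 1)))
            u += c_u * mode
            v += c_v * mode
    # displacement vanishes at the boundary, so sources stay in frame
    src_y = np.clip(np.rint(yy + amplitude * v).astype(int), 0, n - 1)
    src_x = np.clip(np.rint(xx + amplitude * u).astype(int), 0, n - 1)
    return img[src_y, src_x]
```

The matched isotropic noise \(\eta\) is then just Gaussian noise rescaled so that \(\|\eta\|\) equals the average displacement norm \(\langle\|\tau x - x\|\rangle\).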
Relative stability: an observable that quantifies whether a deep net is less sensitive to diffeomorphisms than to generic data transformations:
$$R_f = \frac{\langle \|f(\tau x) - f(x)\|^2\rangle_{x, \tau}}{\langle \|f(x + \eta) - f(x)\|^2\rangle_{x, \eta}}$$
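A minimal Monte-Carlo estimator of \(R_f\), assuming deformed copies \(\tau x\) of the inputs are already available; the noise \(\eta\) is drawn isotropically and rescaled so its norm matches the mean deformation displacement, as in the definition above:

```python
import numpy as np

def relative_stability(f, x, x_def, seed=0):
    """Estimate R_f = <|f(tau x) - f(x)|^2> / <|f(x + eta) - f(x)|^2>,
    where eta is isotropic noise whose norm matches the average
    displacement |tau x - x| of the deformation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    disp = (x_def - x).reshape(n, -1)
    avg_norm = np.linalg.norm(disp, axis=1).mean()
    # isotropic noise, rescaled per-sample to norm <|tau x - x|>
    eta = rng.standard_normal(x.shape).reshape(n, -1)
    eta *= avg_norm / np.linalg.norm(eta, axis=1, keepdims=True)
    eta = eta.reshape(x.shape)
    num = np.mean(np.sum((f(x_def) - f(x)).reshape(n, -1) ** 2, axis=1))
    den = np.mean(np.sum((f(x + eta) - f(x)).reshape(n, -1) ** 2, axis=1))
    return num / den
```

\(R_f < 1\) means the function is relatively more stable to diffeomorphisms than to isotropic noise of the same magnitude; for the identity function and deformations of constant per-sample displacement norm, \(R_f = 1\).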
Results:
Deep nets learn to become stable to diffeomorphisms!
Relation with performance?
1. How to rationalize this behavior?
2. How do CNNs build diffeo stability and perform well?
Thanks!