### Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations

PhD Public Defense

Candidate: Leonardo Petrini

Thesis Advisor: Prof. Matthieu Wyart

Physics of Complex Systems Lab

November 17, 2023

## Making Predictions from Labelled Data

(supervised learning)

Example: predict person height based on age

(figure: scatter plot of height vs. age)

## Making Predictions from Data

(supervised learning)

Example: predict person height based on age

• Tune prediction to fit the data (training);
• Use the prediction on new test data;

(figure: training points, the fitted prediction curve, and a test point with its true and predicted values; axes: age vs. height)

Supervised learning aims for accurate predictions on test data
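The age-height example above can be sketched in a few lines. The numbers below are made up for illustration, and a simple polynomial fit stands in for the tuned prediction.

```python
import numpy as np

# Made-up training data: ages (years) and heights (cm)
ages = np.array([2, 5, 8, 11, 14, 17])
heights = np.array([85, 110, 128, 145, 163, 172])

# Training: tune the prediction (here a degree-2 polynomial) to fit the data
coeffs = np.polyfit(ages, heights, deg=2)
predict = np.poly1d(coeffs)

# Testing: use the prediction on a new, unseen age
print(predict(10.0))
```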


## Image Classification

Modern supervised learning can handle more complex tasks, such as recognizing images.

(figure: images of a cat and a dog fed into a model, which outputs "cat" or "dog")

What's in the black box?

## Opening the Black Box

Solving supervised learning tasks with deep neural networks.

(figure: a deep neural network: input, neurons arranged in layers (depth), output classes CAT / DOG)

## Training Neural Networks

(figure: training images are shown one by one, and the network's predicted label, CAT or DOG, is compared to the true one)

repeat for thousands of images

## Testing on Unseen Images

(figure: the trained network classifies a new, unseen image as CAT or DOG)

• It learned the task!
• Performance: the fraction of correct predictions on test images, which can be close to 100%!

Surprising, as this is a complicated task for computers to solve!


## An Image is a Table of Numbers

(figure: zooming into a single pixel of the dog image reveals a number, e.g. 81; the full image is a table of such numbers. Still a dog?)

• The computer just sees a table of numbers; the task looks harder now!
• How do we compare these tables?

One number can be represented as a point on a line from 0 to 100 (one dimension).

## Images as $$d-$$dimensional vectors

• Represent images as points in space:

(figure: the table of pixel values of an image, flattened into one long vector of numbers)

• The computer just sees a table of numbers; the task looks harder now!
• How do we compare these tables?

## Images as $$d-$$dimensional vectors

• Represent images as points in space:

(figure: two pixel values, (81, 81), plotted as a point in a square with sides from 0 to 100; two dimensions)

• Images can be made of a million pixels.
• How do we represent a one-million-dimensional space?

## Images as $$d-$$dimensional vectors

• Represent images as points in space:

(figure: three pixel values, (81, 81, 74), plotted as a point in a cube with sides from 0 to 100; three dimensions)

• We cannot visualize high-dimensional spaces;
• Still, we can study some of their properties.

## The Curse of Dimensionality

As the dimensionality increases, images are further and further apart.

• When testing generalization, the new image could be very far away from all the ones in the training set!
• For each new test point to be at distance $$\epsilon$$ from a training point, the number of training points $$P$$ must be exponential in the dimension:

$$P \propto (1/\epsilon)^d$$

• To be learnable, real data must be highly structured!
• How can we characterize this structure?
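The scaling $$P \propto (1/\epsilon)^d$$ can be made concrete with a back-of-the-envelope computation (a plain illustration of the formula, not code from the thesis):

```python
# Number of training points needed so that any test point lies within
# distance eps of a training point, following the scaling P ~ (1/eps)^d.
def required_points(eps: float, d: int) -> float:
    return (1.0 / eps) ** d

# Covering at resolution eps = 0.1 quickly becomes hopeless as d grows:
for d in (1, 2, 10, 100):
    print(d, required_points(0.1, d))
```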

## Understanding a Visual Scene

• We want to identify properties that give structure to the space of images
• How do we humans solve an image classification task?

Step 1: locate the object.

Usually, pixels at the border do not matter for recognizing the object.

Step 2: hierarchically recognize edges $$\rightarrow$$ parts $$\rightarrow$$ full object:

lines $$\rightarrow$$ textures $$\rightarrow$$ paws, eyes, etc. $$\rightarrow$$ head, etc. $$\rightarrow$$ dog


## Understanding a Visual Scene

• How do we humans solve an image classification task?
• New image: what now?

Still a dog, though many things changed:

• it moved;
• legs etc. are in different relative positions;
• ears are realized differently.

Bottom line: many details are irrelevant for solving the task.

## Invariances give structure to real data

• Irrelevant details are variations in the input that do not change the label: we call them invariances.

(figure: background pixels can take different colors, and the object's position can change, without affecting the class)

Bottom line: many details are irrelevant for solving the task.

• Do neural networks understand these invariances?
• How can we test this?

## Thesis Title and Outline

• Irrelevant pixels $$\rightarrow$$ linear invariance;
• Irrelevance of exact feature positions $$\rightarrow$$ deformation invariance;
• Hierarchical structure $$\rightarrow$$ synonymic invariance;

## Linear Invariance

• This first invariance can be modeled with Gaussian inputs $$\bm{x} \sim \mathcal{N}(0, 1)$$ and labels $$f^*(\bm x) = g(x_\parallel)$$, which depend only on the projection $$x_\parallel$$ of the input onto one relevant direction.
• In our work, we show that simple neural networks are able to learn this invariance:
• their weights align with the relevant direction;
• we can predict how many data points are needed to solve this kind of task.

Neural networks perform well because they can learn linear invariance.
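A minimal sketch of this data model (the names and the choice $$g = \mathrm{sign}$$ are my own, for illustration): Gaussian inputs whose label depends only on one relevant direction, so the remaining $$d-1$$ directions can be resampled freely without changing the label.

```python
import numpy as np

rng = np.random.default_rng(0)
d, P = 50, 1000

# The single relevant direction (first coordinate axis, for simplicity)
w_star = np.zeros(d)
w_star[0] = 1.0

x = rng.standard_normal((P, d))   # inputs x ~ N(0, 1)
x_par = x @ w_star                # projection onto the relevant direction
y = np.sign(x_par)                # label f*(x) = g(x_parallel), here g = sign

# The other d-1 directions are irrelevant: resampling them leaves y unchanged
x_perturbed = x.copy()
x_perturbed[:, 1:] = rng.standard_normal((P, d - 1))
print(np.all(np.sign(x_perturbed @ w_star) == y))
```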

## Deformation Invariance

• In real data, features only occupy a small portion of the image frame.
• Consequently, the frame can be deformed, and the relevant features moved, without altering the content (deformation invariance).
• Hypothesis: neural network performance is related to how good networks are at understanding this property.

Bruna and Mallat '13, Mallat '16

(figure: an image and a slightly deformed version of it are $$\approx$$ equivalent for classification)

Can we test this hypothesis?

## Measuring Deformation Invariance

We introduced a model to generate deformations $$\tau$$ of controlled magnitude. Given an input $$x$$ and its deformed version $$\tau x$$, we compare the network outputs $$f(x)$$ and $$f(\tau x)$$.

Invariance measure, the relative stability:

$$R_f \propto \langle \| f(x) - f(\tau x) \|^2 \rangle_{x, \tau}$$

(normalized such that $$R_f = 1$$ if the network has no particular stability to diffeomorphisms)
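A toy version of such a stability measure can be sketched as follows. The one-pixel shift standing in for a deformation, the stand-in functions, and the normalization by equally-sized random noise are assumptions of this sketch, not the construction used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_stability(f, xs, deform):
    """Output change under a deformation, normalized by the change under
    random noise of the same magnitude, so that a value of 1 means no
    preferential stability to deformations."""
    num, den = 0.0, 0.0
    for x in xs:
        dx = deform(x)
        eta = rng.standard_normal(x.shape)
        eta *= np.linalg.norm(dx - x) / np.linalg.norm(eta)  # match sizes
        num += np.sum((f(x) - f(dx)) ** 2)        # change under deformation
        den += np.sum((f(x) - f(x + eta)) ** 2)   # change under generic noise
    return num / den

xs = [rng.standard_normal(32) for _ in range(200)]
shift = lambda x: np.roll(x, 1)                  # toy "deformation"

f_invariant = lambda x: np.array([x.sum()])      # shift-invariant output
f_identity = lambda x: x                         # no invariance at all

print(relative_stability(f_invariant, xs, shift))  # ~0: fully invariant
print(relative_stability(f_identity, xs, shift))   # ~1: no diffeo stability
```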

## Correlation between Deformation Invariance and Performance?

• $$R_f \sim 1$$ at initialization for all architectures;
• Modern neural networks learn deformation invariance.

(figure: test error vs. relative stability $$R_f$$ for many architectures; more invariant networks, i.e. smaller $$R_f$$, are more performant; at initialization $$R_f \sim 1$$)

This suggests that understanding deformation invariance is crucial for solving image classification.

# Learning Hierarchical Tasks with Deep Neural Networks

(figure: a dog image decomposed hierarchically: edges $$\rightarrow$$ parts such as paws, eyes, nose, mouth, ears $$\rightarrow$$ face $$\rightarrow$$ dog)

It is hard to formally characterize the hierarchical structure of real tasks like images or text.

Physicist approach: introduce a simplified model of data.

## Simple Model of Hierarchical Data

(figure: a tree of production rules: each class maps to high-level features, each high-level feature maps to low-level features, and each rule is chosen with probability $$\tfrac{1}{2}$$; different feature combinations that map to the same higher-level feature are synonyms)

• The invariance of this model is with respect to exchanges of synonyms.

## Simple Model of Hierarchical Data

• Having defined the rules, we can sample data points: start from a class, expand it into intermediate representations, and finally into inputs.
1. Crucial to make the task non-trivial: low-level features are shared across classes.
2. Crucial to make the task solvable: there exist correlations between low-level features and labels (we gain information about the class by observing low-level features).
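The sampling procedure can be sketched with a toy generative model. This is my own minimal construction for illustration: the vocabulary size, tuple length, and uniform choice of rules are assumptions, not the exact model of the thesis.

```python
import random

random.seed(0)

# m synonyms per symbol, tuples of size s, L levels of the hierarchy
m, s, L = 2, 2, 3
vocab = list(range(8))

# Production rules: at each level, every symbol maps to m synonymous s-tuples
rules = {
    level: {sym: [tuple(random.choices(vocab, k=s)) for _ in range(m)]
            for sym in vocab}
    for level in range(L)
}

def sample(label):
    """Expand a class label level by level into an input of low-level features."""
    symbols = [label]
    for level in range(L):
        expanded = []
        for sym in symbols:
            expanded.extend(random.choice(rules[level][sym]))  # pick a synonym
        symbols = expanded
    return symbols  # length s**L

print(len(sample(0)))  # s**L = 8 low-level features
```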

## Results on Hierarchical Data

• Deep neural networks are able to solve hierarchical tasks without incurring the curse of dimensionality (their sample complexity is $$\text{poly}(d)$$);
• They do so by learning the hierarchical structure of the task: they recognize synonyms;
• They recognize synonyms by exploiting input-output correlations;
• This allows us to predict the number of training points needed to learn, depending on the hierarchical structure (number of classes, number of levels, number of features, etc.).

(figure: learning curves for different hierarchies collapse when the number of training points $$P$$ is rescaled by $$n_c m^L$$)
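The axis rescaling behind the collapse is just this arithmetic; the numbers below are made up for illustration.

```python
# Rescale the number of training points P by the predicted scale n_c * m**L
# (n_c classes, m synonyms per feature, L levels in the hierarchy).
def rescaled(P: int, n_c: int, m: int, L: int) -> float:
    return P / (n_c * m ** L)

print(rescaled(P=16000, n_c=2, m=2, L=3))  # 16000 / 16 = 1000.0
```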

## Conclusions

• How does deep learning solve high-dimensional tasks like image classification?
• We identified properties that give structure to real data. Invariance to:

1. irrelevant pixels;
2. exact feature positions;
3. exchange of synonyms.

(figure: a deep learning model mapping an image to the label "dog")

• We have shown that deep neural networks perform well because they can learn these invariances.

## Work done in collaboration with:

Matthieu Wyart, Francesco Cagnetta, Mario Geiger, Alessandro Favero, Umberto Tomasini, Jonas Paccolat, Eric Vanden-Eijnden, Kevin Tyloo

Thank you PCSL for these nice years :)

# Thank you! Grazie! Merci !

## Apero Logistic Info

(map: we are here on floor 0; apero in Cafeteria PH, Room A3 364, floor 3)
