**Emanuele Natale**

30 May 2023

- 2017 - PhD in CS, Sapienza University
- 2014/15 - IRIF, Paris
- 2016, 2018 - Simons Institute for the Theory of Computing
- 2017-2018 - Max-Planck Institute for Informatics
- 2019 - COATI, INRIA d'Université Côte d'Azur

Best PhD + Young Researcher Prizes by It. Ch. EATCS

Computational Dynamics

Collaboration with

/CRONOS

Assembly Calculus

Ideas are sculpted in the brain by sparsifying it.

- L. Valiant

Blalock et al. (2020): **iterated magnitude pruning** is still the SOTA compression technique.

[Diagram: alternating train and prune steps.]
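The train/prune loop can be sketched on a toy problem. The following is a minimal illustration, assuming a linear model trained by gradient descent on mean squared error; the function names, the pruning fraction, and the number of rounds are hypothetical choices for this example, not taken from Blalock et al.

```python
import numpy as np

def train(w, mask, X, y, lr=0.1, steps=200):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask
    return w * mask

def magnitude_prune(w, mask, fraction):
    """Remove the given fraction of surviving weights with the smallest |w|."""
    alive = np.flatnonzero(mask)
    n_drop = int(fraction * alive.size)
    drop = alive[np.argsort(np.abs(w[alive]))[:n_drop]]
    new_mask = mask.copy()
    new_mask[drop] = 0.0
    return new_mask

def iterated_magnitude_pruning(X, y, rounds=3, fraction=0.5, seed=0):
    """Train, then repeat (prune, retrain) for the given number of rounds."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    mask = np.ones_like(w)
    w = train(w, mask, X, y)
    for _ in range(rounds):
        mask = magnitude_prune(w, mask, fraction)
        w = train(w, mask, X, y)
    return w, mask
```

With 16 input features and the defaults above, the mask shrinks from 16 to 8 to 4 to 2 surviving weights, with a retraining pass after every pruning step.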

Frankle & Carbin (ICLR 2019):

Large random networks contain sub-networks that reach comparable accuracy when trained.

[Diagram: large random network → train&prune, ..., train&prune → sparse **good** network; rewind → sparse "ticket" network → train → sparse **good** network; sparse random network → train → sparse **bad** network.]

Ramanujan et al. (CVPR 2020) find a good subnetwork without changing weights (*train by pruning*!)
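The idea of training by pruning can be illustrated on a toy linear model. The sketch below is not the algorithm of Ramanujan et al.; it assumes a simple greedy search that removes one fixed random weight at a time as long as the mean squared error decreases. The names `loss` and `prune_greedily` are hypothetical.

```python
import numpy as np

def loss(w, mask, X, y):
    """Mean squared error of the masked linear model."""
    return float(np.mean((X @ (w * mask) - y) ** 2))

def prune_greedily(w, X, y):
    """Search for a 0/1 mask over fixed weights w; w itself is never updated."""
    mask = np.ones_like(w)
    current = loss(w, mask, X, y)
    while True:
        best_i, best_loss = None, current
        for i in np.flatnonzero(mask):
            trial = mask.copy()
            trial[i] = 0.0
            trial_loss = loss(w, trial, X, y)
            if trial_loss < best_loss:
                best_i, best_loss = i, trial_loss
        if best_i is None:
            # No single removal improves the loss: stop.
            return mask, current
        mask[best_i] = 0.0
        current = best_loss
```

The only learned object here is the mask, so any reduction in loss comes purely from choosing which random weights to keep.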

A network with random weights contains sub-networks that can approximate **any** given sufficiently-smaller neural network (without training)

Pensia et al. (NeurIPS 2020)

Find a combination of random weights \(w_1, \dots, w_n\) close to \(w\).

Malach et al. (ICML 2020)

Find a random weight among \(w_1, \dots, w_n\) close to \(w\).

Find a combination of random weights close to \(w\):

\[\sum_{i\in S\subseteq \{1,\dots,n\}} w_i \approx w\]

**RSSP**. For which \(n\) does the following hold?

Given \(X_1,...,X_n\) i.i.d. random variables, with prob. \(1-\epsilon\) for each \(z\in [-1,1]\) there is \(S\subseteq\{1,...,n\}\) such that \[z-\epsilon\leq\sum_{i\in S} X_i \leq z+\epsilon.\]

Lueker '98: \(n=O(\log \frac 1{\epsilon})\)
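For small \(n\), the statement can be explored numerically by enumerating all subset sums. The sketch below assumes the \(X_i\) are uniform on \([-1,1]\) (one distribution among those the result applies to) and compares the worst-case approximation error over a grid of targets \(z\) when using a single sample versus a subset sum. The helper names are hypothetical.

```python
import numpy as np

def subset_sums(xs):
    """All 2^n subset sums of xs (the empty subset contributes 0)."""
    sums = np.zeros(1)
    for x in xs:
        sums = np.concatenate([sums, sums + x])
    return sums

def worst_case_error(candidates, targets):
    """Largest distance from any target z to its nearest candidate value."""
    candidates = np.sort(candidates)
    idx = np.searchsorted(candidates, targets)
    lo = candidates[np.clip(idx - 1, 0, len(candidates) - 1)]
    hi = candidates[np.clip(idx, 0, len(candidates) - 1)]
    return float(np.max(np.minimum(np.abs(targets - lo), np.abs(targets - hi))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = np.linspace(-1.0, 1.0, 2001)
    for n in (4, 8, 12, 16):
        xs = rng.uniform(-1.0, 1.0, size=n)  # assumed distribution
        print(n,
              worst_case_error(xs, targets),
              worst_case_error(subset_sums(xs), targets))
```

Since every single sample is itself a subset sum, the subset-sum error can never exceed the single-sample error. The enumeration takes time and memory exponential in \(n\), so it is only a sanity check for small instances.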

Deep connection with integer linear programs [Dyer & Frieze '89, Borst et al. '22]

**Theorem (da Cunha et al., ICLR 2022).**

Given \(\epsilon,\delta>0\), any CNN with \(k\) parameters and \(\ell\) layers, and kernels with \(\ell_1\) norm at most 1, can be approximated within error \(\epsilon\) by pruning a random CNN with \(O\bigl(k\log \frac{k\ell}{\min\{\epsilon,\delta\}}\bigr)\) parameters and \(2\ell\) layers with probability at least \(1-\delta\).