Introduction to neural networks

QTM
Institute for Quantitative Theory and Methods

Image by Chris Benson



Deep neural networks (DNNs)
DNNs account for 95% of the neural network inference workload in Google's datacenters.
https://arxiv.org/ftp/arxiv/papers/1704/1704.04760.pdf
Table 1 of that paper appears on the next slide.

Neural networks
- Mathematical intuition
- Definition
Kolmogorov
Every continuous function of several variables defined on the unit cube can be represented as a superposition of continuous functions of one variable and the operation of addition (1957).
f(x_1, x_2, \ldots, x_n) = \sum\limits_{i=1}^{2n+1} f_i\left(\sum\limits_{j=1}^{n} \phi_{i,j}(x_j)\right)
Thus, it is as if there are no functions of several variables at all. There are only simple combinations of functions of one variable.
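For concreteness (a worked instance not on the original slide), taking n = 2 the superposition uses 2n + 1 = 5 outer functions:

f(x_1, x_2) = \sum\limits_{i=1}^{5} f_i\left(\phi_{i,1}(x_1) + \phi_{i,2}(x_2)\right)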


[Diagram: the superposition drawn as a network. Inputs x_1, x_2, …, x_n feed the inner functions \phi_{1,1}, …, \phi_{2n+1,n}; each outer function f_1, …, f_{2n+1} is applied to the corresponding inner sum, and the results are added to give f.]
Example: Feedforward neural network (one "hidden" layer and one node)

\Phi(x_1, x_2, \cdots, x_n) = \rho\left(\sum\limits_{i=1}^{n} a_i x_i + b\right), \quad \text{where } \rho(x) := \max(0, x)

[Diagram: inputs x_1, x_2, …, x_n, weighted by a_1, a_2, …, a_n, are summed with b and passed through \rho.]

\mathbb{R}^n \stackrel{\Phi}{\rightarrow} \mathbb{R}^1

- one "hidden layer"
- one "node"
- "activation" \rho
- "threshold" b
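A minimal NumPy sketch of this one-node network (the code and the specific values of a, b, and x are illustrative assumptions, not from the slides):

import numpy as np

def relu(x):
    # "activation" rho(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

def Phi(x, a, b):
    # single-node feedforward network: Phi(x) = rho(sum_i a_i * x_i + b)
    return relu(np.dot(a, x) + b)

# illustrative weights, threshold, and input (assumed values)
a = np.array([0.5, -1.0, 2.0])
b = -0.25
x = np.array([1.0, 0.5, 0.75])

print(Phi(x, a, b))  # a map R^3 -> R^1; prints 1.25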
Definition: Feedforward neural network (cited from this paper), a.k.a. MLP or ReLU neural network, with L-1 hidden layers.

L = 3,\quad N_0 = 4,\quad N_1 = 4,\quad N_2 = 7,\quad N_3 = 3
Here there are two hidden layers (ReLU maps) and three layers (affine maps); the ReLU maps are not displayed.
Image from https://www.doc.ic.ac.uk/~nuric/
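A minimal NumPy sketch of this architecture (an illustrative assumption, not code from the slides): three affine maps with widths N_0 = 4, N_1 = 4, N_2 = 7, N_3 = 3, with ReLU applied after each of the two hidden layers; the weights are random placeholders.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
widths = [4, 4, 7, 3]  # N_0, N_1, N_2, N_3

# one (W, b) pair per affine map: L = 3 affine maps in total
layers = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
          for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(x):
    # affine -> ReLU -> affine -> ReLU -> affine
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:  # ReLU on the two hidden layers only
            x = relu(x)
    return x

x = rng.standard_normal(widths[0])  # input in R^4
print(forward(x))                   # output in R^3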

Thank you!
By Jeremy Jacobson