Mohamad Amin Mohamadi
September 2022
Math of Information, Learning and Data (MILD)
Basic elements in neural network training:
Gradient Descent:
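In standard notation (assumed here): a network f_\theta with parameters \theta, training data \{(x_i, y_i)\}_{i=1}^{N}, and empirical loss L(\theta) = \sum_{i=1}^{N} \ell(f_\theta(x_i), y_i). Gradient descent then updates the parameters as

\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t),

with learning rate \eta.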
Idea: Study neural networks in the function space!
Gradient Flow:
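Letting the learning rate go to zero turns the discrete updates into a continuous-time ODE, the standard gradient flow (same assumed notation as above):

\frac{d\theta_t}{dt} = -\nabla_\theta L(\theta_t).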
Change in the function output:
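By the chain rule, the output of the network at any input x evolves under this flow as (a sketch in the assumed notation):

\frac{d f_{\theta_t}(x)}{dt} = \nabla_\theta f_{\theta_t}(x)^\top \frac{d\theta_t}{dt} = -\sum_{i=1}^{N} \big\langle \nabla_\theta f_{\theta_t}(x), \nabla_\theta f_{\theta_t}(x_i) \big\rangle \, \frac{\partial \ell}{\partial f_{\theta_t}(x_i)}.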
Hmm, looks like we have a kernel on the right hand side!
So the network outputs follow a kernel gradient flow,
where the kernel appearing on the right hand side
is called the Neural Tangent Kernel!
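Written out (a sketch, in the notation assumed above):

\Theta_{\theta}(x, x') = \big\langle \nabla_\theta f_{\theta}(x), \nabla_\theta f_{\theta}(x') \big\rangle = \sum_{p} \frac{\partial f_\theta(x)}{\partial \theta_p} \frac{\partial f_\theta(x')}{\partial \theta_p},
\qquad
\frac{d f_{\theta_t}(x)}{dt} = -\sum_{i=1}^{N} \Theta_{\theta_t}(x, x_i) \, \frac{\partial \ell}{\partial f_{\theta_t}(x_i)}.

The key result of Jacot et al. is that, in the infinite-width limit (under the NTK parameterization), this kernel converges to a deterministic limit at initialization and stays constant throughout training.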
Arthur Jacot, Franck Gabriel, Clément Hongler, "Neural Tangent Kernel: Convergence and Generalization in Neural Networks", NeurIPS 2018.
Thus, we can analytically characterize the behaviour of infinitely wide (and hence heavily overparameterized) neural networks using a simple kernel ridge regression formula!
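For squared loss \ell(f, y) = \tfrac{1}{2}(f - y)^2, the kernel gradient flow above can be solved in closed form; a sketch of the standard result, with \Theta the limiting NTK, X the training inputs and Y the labels:

f_t(X) = f_0(X) + \big(I - e^{-\Theta(X, X)\, t}\big)\big(Y - f_0(X)\big),
\qquad
f_\infty(x) = f_0(x) + \Theta(x, X)\, \Theta(X, X)^{-1} \big(Y - f_0(X)\big),

i.e., at convergence the prediction is (up to the initial-function terms) that of kernel regression with the NTK.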
As can be seen in the formula, convergence is faster along the top kernel principal components of the data (early stopping ;-) )
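To make this explicit, diagonalize the kernel (a sketch): writing \Theta(X, X) = \sum_k \lambda_k v_k v_k^\top, the training residual along each eigendirection decays at its own rate,

v_k^\top \big(f_t(X) - Y\big) = e^{-\lambda_k t} \, v_k^\top \big(f_0(X) - Y\big),

so directions with large eigenvalues (the top kernel principal components) are fit quickly, while small-eigenvalue directions are fit last; stopping training early therefore acts as an implicit regularizer.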
Lee et al. showed that the training dynamics of a linearized version of a neural network can be explained by kernel ridge regression, with the kernel being the empirical Neural Tangent Kernel of the network at initialization:
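The linearization in question is the first-order Taylor expansion of the network around its initialization \theta_0 (a sketch in the assumed notation):

f^{\mathrm{lin}}_{\theta}(x) = f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top (\theta - \theta_0),

whose training dynamics are governed by the empirical NTK \Theta_{\theta_0}(x, x') = \langle \nabla_\theta f_{\theta_0}(x), \nabla_\theta f_{\theta_0}(x') \rangle.

For a small network, this empirical NTK can be computed directly from Jacobians. A minimal sketch in JAX; the two-layer network, its initialization and the function names are my own illustration, not code from Lee et al.:

import jax
import jax.numpy as jnp

def init_params(key, d_in, width):
    # NTK-style parameterization: O(1) weights, 1/sqrt(fan_in) scaling in the forward pass
    k1, k2 = jax.random.split(key)
    return {"w1": jax.random.normal(k1, (d_in, width)),
            "w2": jax.random.normal(k2, (width, 1))}

def f(params, x):
    # simple two-layer network with scalar outputs; x has shape (n, d_in)
    h = jnp.tanh(x @ params["w1"] / jnp.sqrt(x.shape[-1]))
    return (h @ params["w2"] / jnp.sqrt(h.shape[-1])).squeeze(-1)

def empirical_ntk(params, x1, x2):
    # Jacobians of the outputs w.r.t. all parameters, one pytree per input batch
    j1 = jax.jacobian(f)(params, x1)  # leaves have shape (n1, *param_shape)
    j2 = jax.jacobian(f)(params, x2)  # leaves have shape (n2, *param_shape)
    # Theta(x1, x2)[i, j] = <grad_theta f(x1_i), grad_theta f(x2_j)>
    def contract(a, b):
        return jnp.tensordot(a, b, axes=(list(range(1, a.ndim)), list(range(1, b.ndim))))
    return sum(jax.tree_util.tree_leaves(jax.tree_util.tree_map(contract, j1, j2)))

key = jax.random.PRNGKey(0)
params = init_params(key, d_in=3, width=1024)
x = jax.random.normal(key, (5, 3))
theta = empirical_ntk(params, x, x)  # (5, 5) empirical NTK matrix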
More importantly, they provided new approximation bounds on the gap between the predictions of the finite-width neural network and its linearized version:
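The bound is roughly of the following form (a sketch of the flavour of the result, with constants and regularity conditions omitted): for a network of width n, with high probability over the random initialization,

\sup_{t \ge 0} \big\| f_{\theta_t}(x) - f^{\mathrm{lin}}_{\theta_t}(x) \big\|_2 = O\big(n^{-1/2}\big) \quad \text{as } n \to \infty,

so the wider the network, the closer its entire training trajectory stays to that of the linearized (kernel) model.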