B. Bordelon, A. Canatar, C. Pehlevan
Overview
Kernel (ridge) regression
$y_i = f^\star(x_i)$
$K_{ij} = K(x_i, x_j)$
$k_i(x) = K(x, x_i)$
Mercer decomposition: $K(x, x') = \sum_\rho \lambda_\rho\, \psi_\rho(x)\, \psi_\rho(x')$
Kernel regression in feature space
design matrix $\Psi_{\rho,i} = \psi_\rho(x_i)$
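An empirical analogue of the Mercer decomposition: the eigenvectors of the Gram matrix play the role of the eigenfunctions $\psi_\rho$ sampled at the data points. A sketch with a toy dot-product kernel (the kernel choice is illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # sample points on the sphere

U = X @ X.T
K = U + 0.5 * U**2                   # toy dot-product kernel (positive Taylor coefficients)

# Empirical Mercer decomposition of the Gram matrix
evals, evecs = np.linalg.eigh(K)
evals, evecs = evals[::-1], evecs[:, ::-1]       # sort descending

Psi = evecs.T                        # design matrix Psi_{rho,i} = psi_rho(x_i)
K_rebuilt = Psi.T @ np.diag(evals) @ Psi
reconstruction_ok = np.allclose(K, K_rebuilt)
```

The reconstruction $K = \Psi^\top \Lambda \Psi$ holds exactly for the sampled Gram matrix.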
e.g. Teacher = Gaussian:
Generalization error and spectral components
the target function enters only here!
the data points enter only here!
Approximation for $\langle G^2 \rangle$
PDE solution
Note: the same result is found with replica calculations!
Comments on the result
Small p:
Large p:
Dot-product kernels in d→∞
e.g. NTK
(everything I say next could be derived for translation-invariant kernels as well)
for $d \to \infty$, $N(d,k) \sim d^k$
and $\lambda_k \sim N(d,k)^{-1} \sim d^{-k}$
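The degeneracy $N(d,k)$ of degree-$k$ spherical harmonics on $S^{d-1}$ has a standard closed form, which can be checked against the $d^k$ scaling directly:

```python
from math import comb, factorial

def N(d, k):
    """Dimension of the space of degree-k spherical harmonics on S^{d-1}."""
    if k == 0:
        return 1
    second = comb(d + k - 3, k - 2) if k >= 2 else 0
    return comb(d + k - 1, k) - second

# N(d,k) ~ d^k / k! as d -> infinity; since the N(d,k) modes of degree k
# share an O(1) total weight, the per-mode eigenvalue scales as d^-k.
for k in (1, 2, 3):
    ratio = N(2000, k) * factorial(k) / 2000**k
    print(f"k={k}: N(2000,k) * k! / 2000^k = {ratio:.4f}")
```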
Numerical experiments
Three settings are considered:
kernel regression with $K_{\mathrm{NTK}}$
→ learn with NNs (4 layers, width $h = 500$; 2 layers, width $h = 10000$)
Note: this contains several spherical harmonics
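The kernel-regression setting can be sketched end to end with the closed-form NTK of an infinitely wide 2-layer ReLU network (standard arc-cosine form, up to overall normalization; the degree-2 target, sizes, and ridge below are illustrative, not the slide's exact configuration):

```python
import numpy as np

def ntk2(U):
    """2-layer ReLU NTK on the unit sphere (standard closed form, up to scale)."""
    U = np.clip(U, -1.0, 1.0)
    theta = np.arccos(U)
    k0 = (np.pi - theta) / np.pi                               # arc-cosine, degree 0
    k1 = (np.sqrt(1.0 - U**2) + (np.pi - theta) * U) / np.pi   # arc-cosine, degree 1
    return U * k0 + k1

def sphere(n, d, rng):
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d, lam = 10, 1e-3
f_star = lambda X: X[:, 0] * X[:, 1]      # toy degree-2 spherical-harmonic target
X_test = sphere(2000, d, rng)
y_test = f_star(X_test)

errs = []
for p in (50, 200, 800):                  # training-set sizes
    X_tr = sphere(p, d, rng)
    K = ntk2(X_tr @ X_tr.T)
    alpha = np.linalg.solve(K + lam * np.eye(p), f_star(X_tr))
    pred = ntk2(X_test @ X_tr.T) @ alpha
    errs.append(np.mean((pred - y_test) ** 2) / np.mean(y_test**2))
    print(f"p={p}: relative test error {errs[-1]:.3f}")
```

As the spectral theory predicts, the degree-2 mode is barely learned while $p$ is below the $\sim d^2$ degenerate modes of degree $\le 2$, and the error drops once $p$ exceeds that scale.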
Kernel regression with 4-layer NTK kernel
Panels: $d=10$, $\lambda=5$; $d=10$, ridgeless ($\lambda=0$); $d=100$, ridgeless ($\lambda=0$)
$E_k = \sum_{m=1}^{N(d,k)} E_{k,m} = N(d,k)\, E_{k,1}$ (by rotational symmetry, the $N(d,k)$ degenerate modes of degree $k$ contribute equally)
Pure $\lambda_k$ modes with NNs
2 layers, width 10000
4 layers, width 500
$f^\star$ contains only the degree-$k$ mode
d=30
Teacher-Student 2-layer NNs
d=25, width 8000
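The teacher-student setup can be sketched in numpy with plain gradient descent; the widths, sample size, and learning rate below are deliberately small for a quick demo (the slide's student is width 8000):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 25
m_teacher, m_student = 5, 256          # illustrative widths, far below paper scale

# Fixed random 2-layer ReLU teacher
Wt = rng.standard_normal((m_teacher, d)) / np.sqrt(d)
at = rng.standard_normal(m_teacher) / np.sqrt(m_teacher)
def teacher(X):
    return np.maximum(X @ Wt.T, 0.0) @ at

# Wider 2-layer ReLU student, trained by full-batch gradient descent on MSE
W = rng.standard_normal((m_student, d)) / np.sqrt(d)
a = rng.standard_normal(m_student) / np.sqrt(m_student)

n, lr, steps = 400, 0.01, 2000
X = rng.standard_normal((n, d))
y = teacher(X)

losses = []
for _ in range(steps):
    H = np.maximum(X @ W.T, 0.0)               # hidden activations, (n, m_student)
    r = H @ a - y                              # residuals
    losses.append(np.mean(r**2))
    mask = (X @ W.T > 0).astype(float)         # ReLU derivative
    grad_a = H.T @ r / n                       # dL/da
    grad_W = ((r[:, None] * mask) * a).T @ X / n   # dL/dW
    a -= lr * grad_a
    W -= lr * grad_W
```

The overparameterized student fits the teacher's training labels, and the training loss decays over the course of gradient descent.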