here d=3, h=5 and L=3
weights = orthogonal
bias = 0
But this is for tanh !!!!
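A minimal sketch of this initialization in PyTorch. The architecture is an assumption (fully connected, d inputs, L hidden layers of width h, one scalar output), and `make_net` is a hypothetical helper name. tanh is used only because the note above ties this initialization to tanh.

```python
import torch
import torch.nn as nn

def make_net(d: int, h: int, L: int) -> nn.Sequential:
    """Assumed architecture: d inputs, L hidden layers of width h, one output."""
    layers = []
    width_in = d
    for _ in range(L):
        linear = nn.Linear(width_in, h)
        nn.init.orthogonal_(linear.weight)  # weights = orthogonal
        nn.init.zeros_(linear.bias)         # bias = 0
        layers += [linear, nn.Tanh()]
        width_in = h
    readout = nn.Linear(width_in, 1)
    nn.init.orthogonal_(readout.weight)
    nn.init.zeros_(readout.bias)
    layers.append(readout)
    return nn.Sequential(*layers)

net = make_net(d=3, h=5, L=3)
```

For a non-square weight matrix, `nn.init.orthogonal_` produces a semi-orthogonal matrix (orthonormal rows or columns, whichever is fewer).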
ADAM
https://arxiv.org/abs/1806.07572
full batch
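A rough sketch of full-batch training with Adam, where every step uses all P points. The data, the small network, the squared loss, the learning rate and the step count are all stand-ins chosen for illustration; the notes do not specify them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in data: P points in d dimensions with +/-1 labels.
P, d = 64, 3
x = torch.randn(P, d)
y = torch.sign(x[:, 0])

model = nn.Sequential(nn.Linear(d, 5), nn.Tanh(), nn.Linear(5, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(200):
    optimizer.zero_grad()
    out = model(x).squeeze(1)       # full batch: the whole dataset in one pass
    loss = ((out - y) ** 2).mean()  # assumed loss, not stated in the notes
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```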
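import torch
x, y = get_mnist()  # helper not shown; assumed to return x of shape (P, 784), float, and integer labels y
m = x.mean(0)
cov = (x - m).t() @ (x - m) / len(x)
e, v = torch.linalg.eigh(cov)  # replaces the deprecated symeig; eigenvalues in ascending order
x = (x - m) @ v[:, -30:] / e[-30:].sqrt()  # PCA: keep the 30 largest components, whitened
y = y % 2  # parity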
L=5
P=10k : number of data points
N : number of parameters
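As an illustration of how N depends on the architecture, here is a small count for a fully connected network. The layout is an assumption (d inputs, L hidden layers of width h, one scalar output, biases included), and `count_parameters` is a hypothetical helper.

```python
def count_parameters(d: int, h: int, L: int) -> int:
    """N for an assumed fully connected net with biases and a scalar output."""
    first = (d + 1) * h              # input layer: d*h weights + h biases
    hidden = (L - 1) * (h + 1) * h   # L-1 hidden-to-hidden layers
    readout = h + 1                  # h weights + 1 bias
    return first + hidden + readout

N = count_parameters(d=3, h=5, L=3)  # 20 + 60 + 6 = 86
```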
at blackboard :
idea variance <--> generalization
for x in testset
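One possible reading of this idea, sketched under assumptions: train the same small network from several random initializations and, for each x in the test set, measure the variance of the output across those runs. The toy data, network size, loss and number of seeds are all hypothetical; the notes do not say which variance is meant.

```python
import torch
import torch.nn as nn

def train_one(seed, x_train, y_train, steps=100):
    torch.manual_seed(seed)  # different seed -> different initialization
    net = nn.Sequential(nn.Linear(x_train.shape[1], 8), nn.Tanh(), nn.Linear(8, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(x_train).squeeze(1) - y_train) ** 2).mean()
        loss.backward()
        opt.step()
    return net

torch.manual_seed(0)
x_train, x_test = torch.randn(64, 3), torch.randn(32, 3)  # stand-in data
y_train = torch.sign(x_train[:, 0])

nets = [train_one(seed, x_train, y_train) for seed in range(5)]
with torch.no_grad():
    # One row per seed, one column per test point.
    outputs = torch.stack([net(x_test).squeeze(1) for net in nets])

var_per_x = outputs.var(dim=0)          # variance over seeds, for each x in testset
mean_variance = var_per_x.mean().item()
```

Comparing `mean_variance` with the test error across network sizes would be one way to probe the link suggested above.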