Deep Learning Papers

The economics of Housing & Homelessness

Understanding Deep Learning Requires Rethinking Generalization

Zhang et al. (2017)

Point # 1

"Deep neural networks easily fit random labels."

Notice how the neural networks loss converges fastest on the true labels

But less easily than they fit the real data

Point # 2

"The ability to augment the data using known symmetries is significantly more powerful than just tuning weight decay or preventing low training error"

With Data Augmentation

Without Data Augmentation

A Neural Probabilistic Language model

Bengio et al. (2003)

The Abstract starts with the essentials of the estimation problem

"A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training"

The abstract highlights that every paper should state how it accounts for the curse of dimensionality

"We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences."