PwC Austria & Alpen Adria Universität Klagenfurt
Klagenfurt 2020
Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron. Deep Learning (Adaptive Computation and Machine Learning series) (Page 107). The MIT Press. Kindle Edition.
In our linear regression example, we trained the model by minimizing the training error \( MSE_{train} \)
Equally important is the test error \( MSE_{test} \)
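As a minimal sketch of these two quantities (the synthetic data and the linear model fit here are illustrative assumptions, not the course's actual example):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of the squared residuals
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 plus Gaussian noise (illustrative assumption)
x_train = rng.uniform(0, 10, 50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0, 1, 50)
x_test = rng.uniform(0, 10, 20)
y_test = 2.0 * x_test + 1.0 + rng.normal(0, 1, 20)

# Fit a line by least squares; polyfit returns [slope, intercept]
w, b = np.polyfit(x_train, y_train, deg=1)

mse_train = mse(y_train, w * x_train + b)  # MSE_train
mse_test = mse(y_test, w * x_test + b)     # MSE_test
```

For a well-generalizing model, the two values come out close to each other (and close to the noise variance).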
There are two goals of machine learning systems:
Minimize the training error
Keep the gap between the training error and the test error small
These two factors correspond to the two central challenges in machine learning: underfitting and overfitting
Underfitting occurs when the model is not able to obtain a sufficiently low error value on the training set.
Overfitting occurs when the gap between the training error and test error is too large.
A model's capacity is its ability to fit a wide variety of functions.
Models with low capacity may struggle to fit the training set.
Models with high capacity can overfit by memorizing properties of the training set that do not serve them well on the test set.
For the linear model, the error on the test set is very close to the error seen on the training set. In such cases, we say the model has generalized well to unseen data.
For the polynomial model, the error is very high (929.12)! This problem, where the model does very well on the training data but poorly on the test data, is called overfitting.
The linear model (low capacity) had a higher bias.
The polynomial model, on the other hand, suffers from a different problem. The model depends a lot on the choice of training data. If you change the data slightly, the shape of the curve will look very different, and the error will swing widely. Therefore, the model is said to have high variance.
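This contrast can be reproduced in a small sketch. Here the data, the degree-9 polynomial as the high-capacity model, and the random seed are all illustrative assumptions; the point is only the pattern: the high-degree fit drives the training error far below the linear fit's, which is the precondition for the overfitting described above.

```python
import warnings
import numpy as np

rng = np.random.default_rng(1)

# Ten noisy points from an underlying linear function (illustrative)
x_train = np.sort(rng.uniform(0, 10, 10))
y_train = 2.0 * x_train + 1.0 + rng.normal(0, 1, 10)
x_test = rng.uniform(0, 10, 20)
y_test = 2.0 * x_test + 1.0 + rng.normal(0, 1, 20)

def fit_and_errors(degree):
    # Least-squares polynomial fit; returns (train MSE, test MSE)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # high-degree fits can emit RankWarning
        coeffs = np.polyfit(x_train, y_train, deg=degree)
    err = lambda x, y: float(np.mean((y - np.polyval(coeffs, x)) ** 2))
    return err(x_train, y_train), err(x_test, y_test)

train_lin, test_lin = fit_and_errors(1)    # low capacity
train_poly, test_poly = fit_and_errors(9)  # high capacity: can pass through all 10 points
```

The degree-9 curve nearly interpolates the training points, so its training error is tiny, while its test error typically swings to very large values depending on the particular training sample, which is the high-variance behavior described above.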
Machine learning is not a pursuit of perfection (i.e. zero error); it is about seeking the best tradeoff between bias and variance.
Divide the data into three parts
Training set: The training set is typically 60% of the data. As the name suggests, this is used for training a machine learning model.
Validation set: The validation set is also called the development set. This is typically 20% of the data. This set is not used during training (it is not included in the training cost function). It is used to test the quality of the trained model: errors on the validation set guide the choice of model (e.g. how many neurons/layers). Even though this set is not used for training, the fact that it was used for model selection makes it a bad choice for reporting the final accuracy of the model.
Test set: This set is typically 20% of the data. Its only purpose is to report the accuracy of the final model.
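The 60/20/20 split described above can be sketched as follows (the shuffling seed and the array shapes are illustrative assumptions):

```python
import numpy as np

def train_val_test_split(X, y, seed=0):
    # Shuffle the indices, then slice into 60% train / 20% validation / 20% test
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.6 * len(X))
    n_val = int(0.2 * len(X))
    i_train = idx[:n_train]
    i_val = idx[n_train:n_train + n_val]
    i_test = idx[n_train + n_val:]
    return (X[i_train], y[i_train]), (X[i_val], y[i_val]), (X[i_test], y[i_test])

X = np.arange(100.0).reshape(50, 2)  # 50 examples with 2 features each
y = np.arange(50)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = train_val_test_split(X, y)
```

Shuffling before slicing matters: if the data is ordered (e.g. by class or by time of collection), a plain slice would give the three sets different distributions.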
Create a K-fold partition of the dataset
For each of the K experiments, use (K-1) folds (e.g. 3 when K = 4) for training and the remaining fold for testing
[Figure: K-fold cross-validation with K = 4. Experiments 1-4, each holding out a different fold as the test fold]
The true error is estimated as the average error rate over the K test folds.
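The procedure above can be sketched as a short helper; the linear model fit inside each experiment is an illustrative assumption:

```python
import numpy as np

def k_fold_error(x, y, K=4):
    # Partition the indices into K folds; in experiment k, train on the
    # other K-1 folds and test on fold k. The true error is estimated
    # as the average test MSE over the K experiments.
    folds = np.array_split(np.arange(len(x)), K)
    errors = []
    for k in range(K):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        w, b = np.polyfit(x[train_idx], y[train_idx], deg=1)
        pred = w * x[test_idx] + b
        errors.append(float(np.mean((y[test_idx] - pred) ** 2)))
    return float(np.mean(errors))

# On exactly linear data, each experiment fits the line perfectly,
# so the estimated true error is numerically zero
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0
```

Unlike a single train/test split, every example ends up in a test fold exactly once, which makes better use of small datasets at the cost of K training runs.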
In machine learning, the error made by your model is the sum of three kinds of errors:
Total Error = Bias + Variance + Irreducible Error
How to detect a high bias problem?
How to detect a high variance problem?
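A common rule of thumb answers both questions by comparing two numbers: a training error that is itself too high signals high bias (underfitting), while a validation error far above the training error signals high variance (overfitting). Sketched below; the `target_error` and `gap_tol` thresholds are illustrative assumptions, not fixed values.

```python
def diagnose(train_error, val_error, target_error=0.05, gap_tol=0.02):
    # Both thresholds are illustrative; in practice they depend on the
    # task and on an estimate of the irreducible error.
    issues = []
    if train_error > target_error:
        issues.append("high bias (underfitting): training error is too high")
    if val_error - train_error > gap_tol:
        issues.append("high variance (overfitting): large train/validation gap")
    return issues or ["no obvious bias/variance problem"]
```

Note that both problems can occur at once: a model can underfit the training data and still generalize poorly beyond it.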