Regularization
Git and GitHub
Implementations in Python
The hypothesis function:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

Cost function:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
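A minimal NumPy sketch of the hypothesis and cost function (the toy data and variable names are illustrative, not from the notes):

```python
import numpy as np

def hypothesis(theta, X):
    # h_theta(x) = theta_0 + theta_1 * x, written as a matrix product;
    # X is an (m, 2) design matrix whose first column is all ones
    return X @ theta

def cost(theta, X, y):
    # J(theta) = (1 / (2m)) * sum((h_theta(x_i) - y_i)^2)
    m = len(y)
    residuals = hypothesis(theta, X) - y
    return residuals @ residuals / (2 * m)

# Toy data: y = 1 + 2x is fit exactly by theta = [1, 2], so the cost is 0
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(cost(np.array([1.0, 2.0]), X, y))  # 0.0
```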
Now we need to estimate the parameters in the hypothesis function.
The gradient descent algorithm is:

repeat until convergence:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

where $j = 0, 1$ represents the feature index number, and both parameters are updated simultaneously.
Gradient Descent for Linear Regression (substituting the partial derivatives of the cost function):

repeat until convergence:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
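A sketch of batch gradient descent in NumPy (the step size, iteration count, and toy data are illustrative choices, not from the notes):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # Simultaneous update: theta := theta - alpha * (1/m) * X^T (X theta - y)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
    return theta

# Toy data generated from y = 1 + 2x; the estimate should approach [1, 2]
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = gradient_descent(X, y)
print(theta)  # approximately [1, 2]
```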
Normal Equation (solves for the optimal parameters in one step, with no iteration):

$$\theta = (X^T X)^{-1} X^T y$$
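The normal equation is one line in NumPy; solving the linear system is preferred over forming the explicit inverse (the toy data below is illustrative):

```python
import numpy as np

# Solve (X^T X) theta = X^T y instead of computing the inverse directly
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])  # generated from y = 1 + 2x
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [1. 2.]
```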
| Gradient Descent | Normal Equation |
|---|---|
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| Works well when n is large | Slow if n is very large |
For large datasets, we usually use stochastic gradient descent.
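Stochastic gradient descent updates the parameters one example at a time rather than summing over the whole dataset. A minimal sketch, assuming the same toy data as above (the step size, epoch count, and shuffling scheme are illustrative choices):

```python
import numpy as np

def sgd(X, y, alpha=0.05, epochs=200, seed=0):
    # Stochastic gradient descent: one update per training example,
    # visiting the examples in a shuffled order each epoch
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):
            gradient = (X[i] @ theta - y[i]) * X[i]
            theta -= alpha * gradient
    return theta

# Toy data generated from y = 1 + 2x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = sgd(X, y)
print(theta)  # approximately [1, 2]
```

Each update touches a single example, so the cost per step does not grow with the dataset size.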
If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.
Suppose we want our hypothesis to behave more like a quadratic; then we want to reduce the influence of the cubic and quartic terms.
Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:
In general, the regularized cost function adds a penalty on the parameter magnitudes:

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
Common penalty terms:

- L1 regularization (Lasso): $\lambda \sum_{j=1}^{n} |\theta_j|$
- L2 regularization (Ridge): $\lambda \sum_{j=1}^{n} \theta_j^2$
- L1+L2 regularizations (Elastic net): $\lambda_1 \sum_{j=1}^{n} |\theta_j| + \lambda_2 \sum_{j=1}^{n} \theta_j^2$
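Ridge regression has a closed form, $\theta = (X^T X + \lambda I)^{-1} X^T y$, so L2 regularization is easy to sketch in NumPy (the convention of not penalizing the intercept, and the toy data, are illustrative assumptions):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge regression: theta = (X^T X + lam * I)^(-1) X^T y.
    # The intercept column (index 0) is conventionally not penalized.
    n = X.shape[1]
    penalty = np.eye(n)
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

# Toy data generated from y = 1 + 2x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta_ols = ridge_fit(X, y, lam=0.0)    # lam = 0 recovers ordinary least squares
theta_ridge = ridge_fit(X, y, lam=1.0)  # lam > 0 shrinks the slope toward zero
print(theta_ols, theta_ridge)
```

Increasing `lam` shrinks the penalized coefficients, which is exactly the overfitting control described above.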
What?
- Git: software which keeps track of code changes
- GitHub: a popular server for storing repositories
Why?
- Keeps a full history of changes
- Allows multiple programmers to work on the same codebase
- Is efficient and lightweight (stores compressed snapshots and reuses unchanged file contents)
- Public repositories on GitHub can serve as a coding resume.
git config --global user.name "User Name"
git config --global user.email "user@email.com"
git clone https://github.com/benhhu/HDSMeetup.git
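After cloning, a typical edit-commit-push cycle looks like the following (the filename and branch name are hypothetical examples; the default branch may be `main` or `master` depending on the repository):

```shell
cd HDSMeetup
git status                         # see which files have changed
git add analysis.py                # stage a file (hypothetical filename)
git commit -m "Add analysis"       # record a snapshot with a message
git push origin main               # upload commits to GitHub
```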