Health Data Science Meetup

October 10, 2016

Regularization

 

Git and GitHub

 

Implementations in Python

Regularization

Linear Regression with One Variable

The hypothesis function: h(x) = θ0 + θ1·x

Cost function: J(θ0, θ1) = (1/2m) · Σ (h(x^(i)) - y^(i))^2, where the sum runs over the m training examples.

  • It measures the accuracy of our hypothesis function.
  • It takes the average of the squared differences between the hypothesis outputs for the x's and the actual y's.
  • This function is otherwise called the "squared error function" or "mean squared error".
  • The mean is halved as a convenience for the computation of gradient descent: differentiating the squared term produces a factor of 2 that cancels the 1/2.
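
A minimal NumPy sketch of this cost function (the function and variable names are illustrative, not from the talk):

import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Halved mean squared error J(theta0, theta1)."""
    m = len(y)                         # number of training examples
    predictions = theta0 + theta1 * x  # hypothesis h(x) for every example
    errors = predictions - y           # compare to the actual outputs
    return np.sum(errors ** 2) / (2 * m)

# Toy usage: the line y = 2x fits this data exactly, so the cost is 0
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(compute_cost(0.0, 2.0, x, y))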

Gradient Descent

Now we need to estimate the parameters in the hypothesis function.

 

  • The way we do this is by taking the derivative of our cost function.
  • The slope of the tangent line is the derivative at that point, and it gives us a direction to move in.
  • We make steps down the cost function in the direction with the steepest descent, and the size of each step is determined by the parameter α, which is called the learning rate.

Gradient Descent

The gradient descent algorithm is:

          repeat until convergence:

              θj := θj - α · ∂J(θ0, θ1)/∂θj

          where j = 0, 1 represents the feature index number.

Gradient Descent for Linear Regression:

          repeat until convergence:

              θ0 := θ0 - (α/m) · Σ (h(x^(i)) - y^(i))
              θ1 := θ1 - (α/m) · Σ (h(x^(i)) - y^(i)) · x^(i)
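
A short NumPy sketch of these updates (alpha and the iteration count are illustrative choices, not values from the talk):

import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for one-variable linear regression."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = (theta0 + theta1 * x) - y   # h(x^(i)) - y^(i) for every example
        # Simultaneous update of both parameters
        theta0 = theta0 - alpha * np.sum(errors) / m
        theta1 = theta1 - alpha * np.sum(errors * x) / m
    return theta0, theta1

# Toy usage: should print an intercept near 0 and a slope near 2
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(x, y))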

Why Gradient Descent?

Normal Equation: θ = (X^T X)^(-1) · X^T y

Gradient Descent:
  - Need to choose alpha
  - Needs many iterations
  - Works well when n is large

Normal Equation:
  - No need to choose alpha
  - No need to iterate
  - Slow if n is very large

For large datasets, we usually use stochastic gradient descent.
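
A NumPy sketch of the normal equation for comparison; np.linalg.solve is used rather than forming the matrix inverse explicitly:

import numpy as np

# Design matrix with a leading column of ones for the intercept term
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) theta = X^T y instead of inverting X^T X
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [0., 2.] for this toy data

(For the stochastic gradient descent mentioned above, scikit-learn's SGDRegressor is one common implementation.)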

Regularization

  • High bias or underfitting:
    - when the form of our hypothesis function h maps poorly to the trend of the data.
    - It is usually caused by a function that is too simple or uses too few features.
  • High variance or overfitting:
    - caused by a hypothesis function that fits the available data but does not generalize well to predict new data.
    - It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
  • Two main options to address overfitting:
    - Reduce the number of features (manually select which features to keep, or use a model selection algorithm)
    - Regularization (keep all the features, but reduce the magnitude of the parameters θ)

Regularization

If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.

Suppose we want to make our hypothesis more nearly quadratic.

Then we'll want to reduce the influence of the cubic and quartic terms.

Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function: for example, adding large penalty terms such as 1000·θ3^2 + 1000·θ4^2 forces θ3 and θ4 to be small, shrinking those terms toward zero.

In general: J(θ) = (1/2m) · [ Σ (h(x^(i)) - y^(i))^2 + λ · Σ θj^2 ]

L2 regularization (Ridge)
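
A sketch of this penalized cost in NumPy, under the usual convention that the intercept θ0 is not penalized (the names here are illustrative):

import numpy as np

def ridge_cost(theta, X, y, lam):
    """Squared-error cost plus an L2 penalty on every parameter except the intercept."""
    m = len(y)
    errors = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)   # theta[0] is the intercept and is left unpenalized
    return (np.sum(errors ** 2) + penalty) / (2 * m)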

Regularization

L1 regularization (Lasso): adds the penalty λ · Σ |θj| to the cost function

L2 regularization (Ridge): adds the penalty λ · Σ θj^2 to the cost function

L1+L2 regularizations (Elastic net): adds a weighted combination of both penalties, λ1 · Σ |θj| + λ2 · Σ θj^2
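
In scikit-learn these correspond to Ridge, Lasso, and ElasticNet; a minimal sketch with illustrative data and hyperparameter values:

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Toy data: 100 samples, 5 features, two of which are irrelevant
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.randn(100)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_)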

Comparisons

  • L1 regularization helps perform feature selection in sparse feature spaces
     
  • L1 rarely performs better than L2
    - when two predictors are highly correlated, the L1 regularizer will simply pick one of the two (see the sketch after this list)
    - in contrast, the L2 regularizer will keep both and jointly shrink the corresponding coefficients a little bit
     
  • Elastic net has proved to be (in theory and in practice) better than L1/Lasso
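
A small sketch of the correlated-predictors point: with two nearly identical features, Lasso will typically zero out one coefficient while Ridge shrinks both (the data and hyperparameters are illustrative):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
x1 = rng.randn(200)
x2 = x1 + 0.01 * rng.randn(200)          # x2 is almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.randn(200)

print("Lasso:", Lasso(alpha=0.1).fit(X, y).coef_)   # typically one coefficient near zero
print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_)   # both coefficients shrunk, roughly equal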

Git and GitHub

What?

- Git: software which keeps track of code changes

- GitHub: a popular server for storing repositories

 

Why?

- Keeps a full history of changes

- Allows multiple programmers to work on the same codebase

- Is efficient and lightweight (stores compact snapshots of your files, reusing unchanged content)

- Public repositories on GitHub can serve as a coding resume.

Install Git

Mac: brew install git

Ubuntu: apt-get install git

Windows: http://git-scm.com/downloads

Create a GitHub Account

Create a Pair of SSH Keys
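
For example (the email address is a placeholder; accept the default file location, then add the contents of the .pub file to your GitHub SSH settings):

ssh-keygen -t rsa -b 4096 -C "user@email.com"

cat ~/.ssh/id_rsa.pub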

Basic Git Configuration

git config --global user.name "User Name"

git config --global user.email "user@email.com"

Clone the Code

git clone https://github.com/benhhu/HDSMeetup.git
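
After cloning, a typical edit-commit-push cycle looks like the following (the file name and commit message are placeholders):

git status
git add example.py
git commit -m "Describe your change"
git push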

Python
