Linear Models
Quentin Fayet
This presentation aims to achieve several goals
However, (linear) regression is a vast subject, so this presentation won't introduce:
Regression attempts to analyze the links between two or more variables.
General problem:
As we have a single feature, this is Simple Linear Regression
(dataset from http://people.sc.fsu.edu/)
Inputs:
Outputs:
Supervised learning
Part. 1
9.00,3571,1976,0.5250,541
9.00,4092,1250,0.5720,524
9.00,3865,1586,0.5800,561
7.50,4870,2351,0.5290,414
8.00,4399,431,0.5440,410
10.00,5342,1333,0.5710,457
8.00,5319,11868,0.4510,344
8.00,5126,2138,0.5530,467
8.00,4447,8577,0.5290,464
7.00,4512,8507,0.5520,498
...
Structure: gas tax, avg income, paved highways, % driving licence, consumption
import pandas as pd

petrol = pd.read_csv('petrol.csv',
                     header=None,
                     names=['tax', 'avg income', 'paved highways',
                            '% drive licence', 'consumption'])
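As a quick sanity check (just a usage sketch), we can peek at the first rows:

# Confirm the parsing and the column names
print(petrol.head())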
Plotting the data in 2 dimensions:
import matplotlib.pyplot as plt

def plot():
    plt.scatter(petrol['% drive licence'], petrol['consumption'], color='black')
    plt.xlabel('% driving licences')
    plt.ylabel('consumption (M of gallons)')
    plt.title('Gas consumption as a function of % driving licences')
    plt.show()
Part. 2 (linear model)
For a set of $n$ feature variables and 1 output variable:

$\hat{y}$: explained variable (the predicted output)
$h_\theta$: hypothesis parameterized by $\theta$
$\theta_0, \dots, \theta_n$: parameters (what we seek to determine)
$x_1, \dots, x_n$: explanatory variables (the given inputs)

We have:

$\hat{y} = h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$

Finding $\theta$ would perform the regression.
When the number of features grows, scalar notation becomes inefficient
$\hat{y}$: explained variable (the predicted output)
$h_\theta$: hypothesis parameterized by the vector $\theta$
$\theta$: vector of parameters (what we seek to determine)
$x$: vector of explanatory variables (the given inputs), with the convention $x_0 = 1$
$\theta^T$: transpose of the vector of parameters

Considering a single input:

$\hat{y} = h_\theta(x) = \theta^T x$
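Expanding the product for, say, two features (using the convention $x_0 = 1$ for the intercept term) recovers the scalar form:

$\theta^T x = \begin{pmatrix} \theta_0 & \theta_1 & \theta_2 \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$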
This represents the prediction over the whole dataset and all features in a single equation. For $m$ inputs of $n$ features:

$\hat{y} = h_\theta(X) = X\theta$

$\hat{y}$: vector of explained variables (the predicted outputs)
$h_\theta$: hypothesis parameterized by the vector $\theta$
$\theta$: vector of parameters (what we seek to determine)
$X$: $m \times (n+1)$ matrix of explanatory variables
$X^T$: transpose of the matrix of explanatory variables
Using numpy, we can easily implement the hypothesis as a matrix product:
import numpy as np

def hypothesis(X, theta):
    return np.dot(X, theta)
If we'd like to batch-predict 30 gas consumptions:
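A minimal sketch of such a batch prediction; the feature values and $\theta$ below are made-up placeholders, not fitted parameters:

import numpy as np
from hypothesis import hypothesis

# 30 made-up '% driving licence' values, for illustration only
licences = np.random.uniform(0.4, 0.7, size=30)
# Design matrix: intercept column (x0 = 1) followed by the feature
X = np.column_stack([np.ones(30), licences])
theta = np.array([0.0, 1000.0])  # placeholder parameters
predictions = hypothesis(X, theta)  # vector of 30 predicted consumptions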
Part. 3
The goal is to minimize the sum of the squared errors
The error $\epsilon^{(i)}$ is actually the difference between the prediction and the actual value:

$\epsilon^{(i)} = \hat{y}^{(i)} - y^{(i)}$

Thus, as the prediction is the result of the hypothesis:

$\epsilon^{(i)} = h_\theta(x^{(i)}) - y^{(i)}$

Generalizing for $m$ entries, the total error is given by:

$\sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

Thus, the total squared error (the cost function) is:

$J(\theta) = \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Let's implement the error (a piece of cake thanks to numpy arrays):
import numpy as np

def error(actual, expected):
    return actual - expected
And the squared errors:
import numpy as np

def squared_error(actual, expected):
    return error(actual, expected) ** 2
Finally, we need to minimize the sum of squared errors:

$\min_\theta \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

This is called Ordinary Least Squares (OLS).

We may sometimes encounter another notation:

$\min_\theta \| X\theta - y \|_2^2$

which is the same, written with the l2-norm.
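A quick numerical check that the two notations agree (throwaway random data):

import numpy as np

X = np.random.rand(5, 2)
theta = np.random.rand(2)
y = np.random.rand(5)

sum_of_squares = np.sum((np.dot(X, theta) - y) ** 2)
l2_norm_squared = np.linalg.norm(np.dot(X, theta) - y) ** 2
assert np.isclose(sum_of_squares, l2_norm_squared)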
Plotting the cost function gives an insight into the minimization problem to be solved:
For Mean Least Squares, we take the average error over the dataset. Hence, for $m$ inputs, we have:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

The $\frac{1}{2}$ factor stands there to make the derivative simpler.
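This cost function maps directly to numpy (a small sketch; cost_function is our own helper name):

import numpy as np
from hypothesis import hypothesis
from squared_error import squared_error

def cost_function(X, y, theta):
    # Halved average of squared errors over the m entries
    m = len(X)
    return np.sum(squared_error(hypothesis(X, theta), y)) / (2 * m)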
Gradient descent is one of the possible ways to solve the optimization problem.
Repeat until convergence:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

$\alpha$: learning rate
$\frac{\partial}{\partial \theta_j} J(\theta)$: partial derivative of $J(\theta)$ over $\theta_j$
Once the partial derivative is applied, we obtain:

Repeat until convergence:

$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
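In matrix form (updating every $\theta_j$ at once), this is equivalent to the following update, which is what the implementation below computes:

$\theta := \theta - \frac{\alpha}{m} X^T \left( X\theta - y \right)$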
Let's implement the batch gradient descent:
import numpy as np
from hypothesis import hypothesis
from squared_error import squared_error, error

def gradient_descent(X, y, theta, alpha, iterations=10000):
    X_transpose = X.transpose()
    m = len(X)
    for i in range(0, iterations):
        hypotheses = hypothesis(X, theta)
        # Cost J(theta), tracked to monitor convergence
        cost = np.sum(squared_error(hypotheses, y)) / (2 * m)
        # Vectorized gradient: (1/m) * X^T (X*theta - y)
        gradient = np.dot(X_transpose, error(hypotheses, y)) / m
        theta = theta - alpha * gradient
        print("Iteration {} / Cost: {:.4f}".format(i, cost))
    return theta
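A sketch of running it on the petrol dataset (starting $\theta$ at zero; the learning rate is an arbitrary choice here):

import numpy as np

# Design matrix: intercept column plus the single feature
X = np.column_stack([np.ones(len(petrol)),
                     petrol['% drive licence'].values])
y = petrol['consumption'].values
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1)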
When running the gradient descent on the dataset, we obtain the following parameters:
Considering the following:
Gradient descent (batch or stochastic) is not the only way to solve the problem.
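For instance, the normal equation $\theta = (X^T X)^{-1} X^T y$ solves OLS in closed form, without iterating. A minimal numpy sketch (assuming $X^T X$ is invertible):

import numpy as np

def normal_equation(X, y):
    # Solve (X^T X) theta = X^T y rather than inverting X^T X explicitly
    return np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))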
Part. 4
Using scikit-learn, performing OLS regression is easy:
from sklearn import linear_model

# Dataset has been previously loaded as shown

# Instantiate the linear regression object
regression = linear_model.LinearRegression()

# Train the model (reshape the single feature into a column vector)
regression.fit(petrol['% drive licence'].values.reshape(-1, 1),
               petrol['consumption'])

coefficients = regression.coef_  # 1-D array in Simple Linear Regression
intercept = regression.intercept_
Training gives us the following coefficients:
print('Intercept term: \n', regression.intercept_)
print('Coefficients: \n', regression.coef_)
Plotting against our own implementation:
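A sketch of such a comparison plot (assuming regression is the fitted scikit-learn model and theta holds the parameters from our gradient descent run):

import matplotlib.pyplot as plt

x = petrol['% drive licence'].values
plt.scatter(x, petrol['consumption'], color='black')
# Line fitted by scikit-learn
plt.plot(x, regression.intercept_ + regression.coef_[0] * x, label='scikit-learn')
# Line from our own gradient descent (theta = [intercept, slope])
plt.plot(x, theta[0] + theta[1] * x, label='gradient descent')
plt.legend()
plt.show()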
Ridge regression penalizes the collinearity of the explanatory variables by introducing a shrinkage amount $\alpha$
Minimizes the "ridge" effect
The optimization problem to solve is slightly different:

$\min_\theta \| X\theta - y \|_2^2 + \alpha \| \theta \|_2^2$

Using the l2-norm on both the error and the parameters.

$\alpha$ is the shrinkage amount.
Ridge regression makes sense in multivariable regression rather than in Simple Linear Regression, as it penalizes multicollinearity.
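Like OLS, this penalized problem admits a closed-form solution (a standard result, with $I$ the identity matrix):

$\theta = \left( X^T X + \alpha I \right)^{-1} X^T y$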
from sklearn import linear_model

# Dataset has been previously loaded as shown

# Instantiate the ridge regression object with shrinkage amount alpha
regression = linear_model.Ridge(alpha=.01)

# Train the model (reshape the single feature into a column vector)
regression.fit(petrol['% drive licence'].values.reshape(-1, 1),
               petrol['consumption'])

coefficients = regression.coef_
intercept = regression.intercept_
Training gives us the following coefficients:
print('Intercept term: \n', regression.intercept_)
print('Coefficients: \n', regression.coef_)
Plotting against other methods:
Lasso Regression is particularly efficient on problems where the number of features is much larger than the number of entries in the training dataset.
It emphasizes some features and shrinks others toward zero (see the sketch after the code below).
The optimization problem to solve:

$\min_\theta \frac{1}{2m} \| X\theta - y \|_2^2 + \alpha \| \theta \|_1$

Using the l2-norm on the error and the l1-norm on the parameters.
Like Ridge, Lasso makes sense in multivariable regression rather than in Simple Linear Regression.
from sklearn import linear_model

# Dataset has been previously loaded as shown

# Instantiate the lasso regression object
regression = linear_model.Lasso(alpha=.1)

# Train the model (reshape the single feature into a column vector)
regression.fit(petrol['% drive licence'].values.reshape(-1, 1),
               petrol['consumption'])

coefficients = regression.coef_
intercept = regression.intercept_
Training gives us the following coefficients:
print('Intercept term: \n', regression.intercept_)
print('Coefficients: \n', regression.coef_)
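As a sketch of the sparsity effect mentioned above, we can fit Lasso on all four features at once (the alpha value here is an arbitrary assumption; the exact coefficients depend on the data):

from sklearn import linear_model

features = ['tax', 'avg income', 'paved highways', '% drive licence']
lasso = linear_model.Lasso(alpha=10)
lasso.fit(petrol[features], petrol['consumption'])
# Coefficients shrunk to exactly zero indicate features Lasso dropped
print(dict(zip(features, lasso.coef_)))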
Plotting against other methods:
Part. 5