GLM in Python

Logistic regression

 

Predicting 0 or 1

based on continuous variables

GLM

 

- Linear models

- Logistic regression

Coefficients

 

What are the coefficients of logistic regression? Each coefficient is the change in log(odds) for a one-unit increase in that feature.
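A minimal sketch of the log(odds) reading, assuming scikit-learn and a synthetic data set from make_classification (neither is named on the slide). It checks that the log(odds) recovered from the predicted probabilities equals the linear combination of features and coefficients, and exponentiates the coefficients to get odds ratios.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an assumption for illustration only).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

# log(odds) computed from the predicted probability of class 1.
p = clf.predict_proba(X)[:, 1]
log_odds = np.log(p / (1 - p))

# The same quantity computed directly from the coefficients.
linear_part = X @ clf.coef_.ravel() + clf.intercept_[0]

# exp(coefficient) is the multiplicative change in the odds per unit of the feature.
odds_ratios = np.exp(clf.coef_.ravel())
print(np.allclose(log_odds, linear_part, atol=1e-5))
```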

 

Back to the linear models...

Hyperparameter Tuning

 

Search for optimal alpha and lambda

Grid search
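A possible grid search in scikit-learn, as a sketch only. It assumes that glmnet's alpha corresponds to l1_ratio and that lambda corresponds roughly to 1/C; the grid values, the synthetic data, and the F1 scoring choice are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption); replace with the real training set.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Elastic net needs the saga solver; scaling helps saga converge.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                       max_iter=5000, random_state=0),
)

# Hypothetical grid: C plays the role of 1/lambda, l1_ratio the role of alpha.
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
    "logisticregression__l1_ratio": [0.0, 0.5, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_)
```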


Maximum iteration 

Iteration

 

One iteration is one pass over the entire data set.
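A small sketch of how the iteration cap shows up in scikit-learn, assuming the saga solver and synthetic data. The fitted attribute n_iter_ reports how many iterations were actually used, so it can be compared with max_iter to see whether the solver stopped early or ran out of budget.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

with warnings.catch_warnings():
    # A ConvergenceWarning is expected when max_iter is too small.
    warnings.simplefilter("ignore")
    short = LogisticRegression(solver="saga", max_iter=3, random_state=0).fit(X, y)
    long = LogisticRegression(solver="saga", max_iter=1000, random_state=0).fit(X, y)

short_iters = int(np.max(short.n_iter_))
long_iters = int(np.max(long.n_iter_))
print(short_iters, long_iters)
```

If the reported count equals max_iter, the fit most likely hit the cap before converging, and raising max_iter or scaling the features is worth trying.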

Solvers

Sag, Saga

 

  • sag — Stochastic Average Gradient descent. A variation of gradient descent and incremental aggregated gradient approaches that uses a random sample of previous gradient values. Fast for big datasets.

 

  • saga — Extension of sag that also allows for L1 regularization. Should generally train faster than sag.
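To illustrate the L1 support mentioned for saga, here is a sketch on synthetic data with mostly uninformative features. The value C=0.05 and the data set are assumptions chosen so that the L1 penalty visibly sets some coefficients to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# 20 features, only 3 of them informative (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)  # sag and saga converge faster on scaled data

# saga accepts the L1 penalty; a strong penalty (small C) gives a sparse model.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.05,
                         max_iter=5000, random_state=0).fit(X, y)

n_zero = int(np.sum(clf.coef_ == 0))
print(f"{n_zero} of {clf.coef_.size} coefficients are exactly zero")
```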

Stochastic gradient descent

SAG (Stochastic average gradient)

Scoring

 

F1 (Good for imbalanced class)
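A toy illustration of why F1 is preferred over accuracy on imbalanced classes. The labels are made up: 95 negatives, 5 positives, and a model that always predicts the majority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Made-up imbalanced labels: 95% negative, 5% positive.
y_true = np.array([0] * 95 + [1] * 5)

# A useless model that never predicts the positive class.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
# zero_division=0 avoids a warning when there are no predicted positives.
f1 = f1_score(y_true, y_pred, zero_division=0)
print(acc, f1)
```

Accuracy looks high because the negatives dominate, while F1 drops to zero because no positive case is ever found.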

Check threshold

from sklearn import metrics
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_score)  # y_score: predicted probability of class 1, not hard 0/1 labels
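One way to check the threshold, sketched under several assumptions: synthetic imbalanced data, a plain LogisticRegression, and Youden's J statistic (tpr - fpr) as the selection rule, which is not named on the slide. Note that roc_curve is given predicted probabilities here, since hard 0/1 predictions would yield only a single useful threshold.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): about 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = metrics.roc_curve(y_test, y_score)

# Youden's J: pick the threshold with the largest gap between tpr and fpr.
best = int(np.argmax(tpr - fpr))
best_threshold = thresholds[best]

# Apply the chosen threshold instead of the default 0.5.
y_pred = (y_score >= best_threshold).astype(int)
print(best_threshold)
```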

Cost - False negative

predicted there is no event, but there is actually one

 

 

False positive

 

predicted there is an event, but there is none

 

-> leads to a higher calculated cost
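A sketch of turning false negatives and false positives into a single cost figure. The unit costs COST_FN and COST_FP, the helper total_cost, and the toy labels and scores are all hypothetical; real costs would come from the business problem.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

COST_FN = 10.0  # hypothetical cost of missing a real event
COST_FP = 1.0   # hypothetical cost of a false alarm

def total_cost(y_true, y_score, threshold):
    """Total cost of the errors made when cutting scores at `threshold`."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return COST_FN * fn + COST_FP * fp

# Toy labels and predicted probabilities (made up for illustration).
y_true = np.array([0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.6, 0.35, 0.9])

costs = {t: total_cost(y_true, y_score, t) for t in (0.3, 0.5, 0.7)}
print(costs)  # {0.3: 3.0, 0.5: 11.0, 0.7: 10.0}
```

With these made-up costs, the low threshold is cheapest because it trades three false alarms for zero missed events.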


Elastic net - R

- Regularisation strength: in glmnet, a higher lambda means more shrinkage. By default glmnet tries a sequence of 100 lambda values (nlambda = 100)

- Standardisation: the glmnet argument standardize standardises the features before fitting; the default is TRUE

- CV: k-fold

Python

- Regularisation strength: in scikit-learn, C is the "inverse of regularization strength" (roughly 1/lambda), so smaller values specify stronger regularization. LogisticRegressionCV defaults to Cs=10, a grid of 10 values of C spaced logarithmically between 1e-4 and 1e4

- Standardisation: no default scaling.

- CV: stratified k-fold (StratifiedKFold) by default
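The three Python points can be put together as follows. This is a sketch on synthetic data, with the scaler added explicitly because scikit-learn does not scale by default, and with the stratified folds spelled out rather than left to the default.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Cs=10 asks for 10 candidate C values on a log scale from 1e-4 to 1e4.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=cv, max_iter=1000),
)
model.fit(X, y)

lr = model.named_steps["logisticregressioncv"]
best_C = float(np.ravel(lr.C_)[0])
implied_lambda = 1.0 / best_C  # larger lambda means stronger regularisation
print(lr.Cs_, best_C, implied_lambda)
```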

Standardisation

for linear regression

Needed for linear regression

 

Standardizing the features makes the convergence faster

Normally not needed for logistic regression

 

But regularization makes the predictor dependent on the scale of the features.
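The scale dependence can be demonstrated with a small experiment, sketched here on simulated data. The same feature is expressed in two different units; the regularised model gives different predictions for each unit unless the features are standardised first. The helper fit_predict and the value C=0.1 are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data: the outcome depends strongly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
logit = 2.0 * X[:, 0] + 0.5 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Same information, different unit (e.g. metres expressed as kilometres).
X_rescaled = X.copy()
X_rescaled[:, 0] *= 0.001

def fit_predict(model, features):
    """Fit on the given features and return P(class 1) for the same rows."""
    return model.fit(features, y).predict_proba(features)[:, 1]

# Regularised model on raw features: predictions change with the unit.
raw_gap = np.max(np.abs(
    fit_predict(LogisticRegression(C=0.1), X)
    - fit_predict(LogisticRegression(C=0.1), X_rescaled)))

# Standardising first removes the dependence on the unit.
scaled_gap = np.max(np.abs(
    fit_predict(make_pipeline(StandardScaler(), LogisticRegression(C=0.1)), X)
    - fit_predict(make_pipeline(StandardScaler(), LogisticRegression(C=0.1)), X_rescaled)))

print(raw_gap, scaled_gap)
```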

Standardisation

for logistic regression

Common considerations for classification

  • Check class balance
  • Experiment with different scaling methods

Standard scaler

(Z-score standardisation)

 

transforms each feature so that its distribution has a mean of 0 and a standard deviation of 1

 

Values not within [0, 1]

 

sensitive to outliers
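A short check of these properties with scikit-learn's StandardScaler, using a made-up single feature that contains one outlier.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature with one outlier (100).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

z = StandardScaler().fit_transform(X)

z_mean = float(z.mean())
z_std = float(z.std())  # population standard deviation, as used by StandardScaler
print(z.ravel())
print(z_mean, z_std)
```

The scaled values fall outside [0, 1], and the outlier inflates the standard deviation so much that the four ordinary points end up squeezed close together.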

MinMax Scaler

(Min max normalisation)

 

rescales the data set such that all feature values are in the range [0, 1]

 

sensitive to outliers
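The same made-up feature can be passed through MinMaxScaler to see both the [0, 1] range and the effect of a single outlier. Comparing with the outlier removed is an illustrative choice, not a recommendation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature with one outlier (100), and the same feature without it.
with_outlier = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
without_outlier = with_outlier[:-1]

scaled_with = MinMaxScaler().fit_transform(with_outlier).ravel()
scaled_without = MinMaxScaler().fit_transform(without_outlier).ravel()

print(scaled_with)     # ordinary points are compressed near 0
print(scaled_without)  # ordinary points spread evenly over [0, 1]
```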

GLM in Python

By yuy ai
