Machine Learning Techniques
Class: sklearn.linear_model.RidgeClassifier
Some Parameters:
decision_function(X) | Predict confidence scores for samples. |
densify() | Convert coefficient matrix to dense array format. |
fit(X, y[, coef_init, intercept_init, …]) | Fit linear model with Stochastic Gradient Descent. |
get_params([deep]) | Get parameters for this estimator. |
partial_fit(X, y[, classes, sample_weight]) | Perform one epoch of stochastic gradient descent on given samples. |
predict(X) | Predict class labels for samples in X. |
predict_log_proba(X) | Log of probability estimates. |
predict_proba(X) | Probability estimates. |
score(X, y[, sample_weight]) | Return the mean accuracy on the given test data and labels. |
set_params(**params) | Set the parameters of this estimator. |
sparsify() | Convert coefficient matrix to sparse format. |
Some common methods for all classifiers
Class: sklearn.linear_model.LogisticRegression
Some Parameters:
penalty (default = 'l2')
'none' - no penalty is added
'l2' - adds an L2 penalty term; this is the default choice
'l1' - adds an L1 penalty term
'elasticnet' - adds both L1 and L2 penalty terms
solver (default = 'lbfgs')
'liblinear' - uses a coordinate descent (CD) algorithm
'lbfgs' - an optimizer in the family of quasi-Newton methods.
'newton-cg', 'sag', 'saga'
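A minimal sketch of how penalty and solver combine, assuming a synthetic dataset from make_classification (the data and all variable names are illustrative, not part of the notes). The L1 model uses 'liblinear' because the default 'lbfgs' solver does not support an L1 penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (illustrative data, not from the notes)
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

# Default configuration: L2 penalty with the 'lbfgs' solver
clf_l2 = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)

# An L1 penalty needs a solver that supports it, e.g. 'liblinear' or 'saga'
clf_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

# L1 tends to drive some coefficients exactly to zero; L2 only shrinks them
n_zero_l2 = int(np.sum(clf_l2.coef_ == 0))
n_zero_l1 = int(np.sum(clf_l1.coef_ == 0))
print("zero coefficients | l2:", n_zero_l2, "| l1:", n_zero_l1)
print("train accuracy    | l2:", clf_l2.score(X, y), "| l1:", clf_l1.score(X, y))
```

Recent scikit-learn releases may emit deprecation warnings for some penalty spellings, so the exact arguments are worth checking against the installed version.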
Linear Support vector machine
Class: sklearn.linear_model.SGDClassifier
This estimator implements regularized linear models with SGD.
The gradient of the loss is estimated one sample at a time, and the model is updated along the way with a decreasing learning rate.
Some parameters
penalty - 'l2', 'l1', 'elasticnet' (default = 'l2')
loss (default = 'hinge')
'hinge' - (soft-margin) linear Support Vector Machine,
'modified_huber' - a smoothed hinge loss that brings tolerance to outliers as well as probability estimates
'log' - logistic regression (named 'log_loss' from scikit-learn 1.1 onward)
'squared_hinge' - like hinge but is quadratically penalized
'perceptron' - linear loss used by the perceptron algorithm
regression losses - ‘squared_error’, ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’
alpha (default = 0.0001)
constant that multiplies the regularization term.
fit_intercept (default = True)
If False, the data is assumed to be already centered.
max_iter (default = 1000)
maximum number of passes over the training data (aka epochs).
learning_rate (default = 'optimal')
'constant': eta = eta0
(default eta0 = 0.0, initial learning rate)
'optimal': eta = 1.0 / (alpha * (t + t0))
where t0 is chosen by a heuristic proposed by Leon Bottou.
'invscaling': eta = eta0 / pow(t, power_t)
'adaptive': eta = eta0, as long as the training loss keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if early_stopping is True, the current learning rate is divided by 5.
tol (default = 1e-3)
stopping criterion.
If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs.
Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter.
early_stopping (default = False)
whether to terminate training when the validation score is not improving.
If set to True, it automatically sets aside a stratified fraction of the training data as a validation set and terminates training when the validation score returned by the score method does not improve by at least tol for n_iter_no_change consecutive epochs.
validation_fraction (default = 0.1)
proportion of training data to set aside as a validation set for early stopping. Must be between 0 and 1.
Only used if early_stopping is True.
n_iter_no_change (default = 5)
Number of iterations with no improvement to wait before stopping fitting.
Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter.
class_weight (default = None) {class_label: weight} or 'balanced'
Preset for the class_weight fit parameter. Weights associated with classes. If not given, all classes are supposed to have weight one.
The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
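The parameters above can be tried out with a short sketch. It assumes an imbalanced synthetic dataset (make_classification, the scaling step, and all variable names are illustrative choices, not from the notes). It first reproduces the 'balanced' weight formula by hand, then trains a hinge-loss model with early stopping.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced synthetic data: roughly 90% class 0 and 10% class 1 (illustrative)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.9, 0.1],
                           random_state=0)
X = StandardScaler().fit_transform(X)  # SGD is sensitive to feature scaling

# 'balanced' weights computed by hand: n_samples / (n_classes * np.bincount(y))
manual_weights = len(y) / (len(np.unique(y)) * np.bincount(y))
sklearn_weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
print("manual :", manual_weights)
print("sklearn:", sklearn_weights)

# Linear SVM (hinge loss) trained with SGD; 10% of the training data is held
# out as a validation set and training stops after 5 epochs without improvement
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4,
                    learning_rate="optimal", max_iter=1000, tol=1e-3,
                    early_stopping=True, validation_fraction=0.1,
                    n_iter_no_change=5, class_weight="balanced",
                    random_state=0)
clf.fit(X, y)
print("epochs run:", clf.n_iter_)
print("train accuracy:", clf.score(X, y))
```

The minority class receives the larger weight, so its misclassifications cost more during training.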
It is a simple classification algorithm suitable for large-scale learning.
Class: sklearn.linear_model.Perceptron
Some Parameters:
penalty - 'l2', 'l1', 'elasticnet' (default = None)
alpha - (default = 0.0001)
l1_ratio - (default = 0.15)
fit_intercept - (default = True)
max_iter - (default = 1000)
tol - (default = 1e-3)
eta0 - (default = 1)
early_stopping - (default = False)
validation_fraction - (default = 0.1)
n_iter_no_change - (default = 5)
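A brief usage sketch for the parameters listed above, assuming synthetic data from make_classification (illustrative only). Every argument is passed explicitly so the example does not depend on library defaults.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# Synthetic binary problem (illustrative data, not from the notes)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# No regularization, unit learning rate, stop once the loss stops improving
clf = Perceptron(penalty=None, alpha=0.0001, fit_intercept=True,
                 max_iter=1000, tol=1e-3, eta0=1.0, early_stopping=False,
                 random_state=0)
clf.fit(X, y)

# decision_function returns signed scores; predict thresholds them at zero
scores = clf.decision_function(X[:3])
labels = clf.predict(X[:3])
print("epochs run:", clf.n_iter_)
print("scores:", scores, "labels:", labels)
print("train accuracy:", clf.score(X, y))
```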
scikit-learn implements two different nearest neighbors classifiers.
KNeighborsClassifier | RadiusNeighborsClassifier |
implements learning based on the k nearest neighbors of each query point, where k is an integer value specified by the user. | implements learning based on the number of neighbors within a fixed radius r of each training point, where r is a floating-point value specified by the user. |
most commonly used technique | used in cases where the data is not uniformly sampled |
choice of the value k is highly data-dependent; a larger k suppresses the effects of noise, but makes the classification boundaries less distinct. | user specifies a fixed radius r, such that points in sparser neighborhoods use fewer nearest neighbors for the classification |
Class: sklearn.neighbors.KNeighborsClassifier
Some Parameters
Class: sklearn.neighbors.RadiusNeighborsClassifier
Some Parameters
radius (default = 1.0)
Range of parameter space to use by default for radius_neighbors queries
weights ('uniform', 'distance', [callable], default = 'uniform')
algorithm ('ball_tree', 'kd_tree', 'brute', 'auto', default = 'auto')
leaf_size (default = 30)
p (default = 2)
metric (default = 'minkowski')
Distance metric to use for the tree.
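Both neighbors classifiers can be compared side by side. The sketch below assumes the iris dataset and a 70/30 split (illustrative choices). The outlier_label argument is added so that query points with no training neighbor inside the radius still receive a prediction.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# k nearest neighbors: every query point votes with exactly k=5 neighbors
knn = KNeighborsClassifier(n_neighbors=5, weights="uniform", algorithm="auto",
                           leaf_size=30, p=2, metric="minkowski")
knn.fit(X_train, y_train)

# Radius neighbors: every training point within radius 1.0 votes, so points
# in sparse regions use fewer neighbors; closer neighbors weigh more here
rnn = RadiusNeighborsClassifier(radius=1.0, weights="distance",
                                outlier_label="most_frequent")
rnn.fit(X_train, y_train)

knn_acc = knn.score(X_test, y_test)
rnn_acc = rnn.score(X_test, y_test)
print("KNeighborsClassifier accuracy     :", knn_acc)
print("RadiusNeighborsClassifier accuracy:", rnn_acc)
```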
Multiclass classification
Multilabel classification
problem types
type_of_target determines the type of data indicated by the target.
Parameters : y (array-like), Returns : target_type (string)
target_type | y |
'continuous' | array-like of floats that are not all integers and is 1d or a column vector. |
'continuous-multioutput' | 2d array of floats that are not all integers, and both dimensions are of size > 1. |
'binary' | contains <= 2 discrete values and is 1d or a column vector. |
'multiclass' | contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector. |
'multiclass-multioutput' | 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1. |
'unknown' | array-like but none of the above, such as a 3d array, sequence of sequences, or an array of non-sequence objects. |
>>> from sklearn.utils.multiclass import type_of_target
>>> import numpy as np
>>> type_of_target([0.1, 0.6])
'continuous'
>>> type_of_target([1, -1, -1, 1])
'binary'
>>> type_of_target(['a', 'b', 'a'])
'binary'
>>> type_of_target([1.0, 2.0])
'binary'
>>> type_of_target([1, 0, 2])
'multiclass'
>>> type_of_target([1.0, 0.0, 3.0])
'multiclass'
>>> type_of_target(['a', 'b', 'c'])
'multiclass'
>>> type_of_target(np.array([[1, 2], [3, 1]]))
'multiclass-multioutput'
>>> type_of_target(np.array([[1.5, 2.0], [3.0, 1.6]]))
'continuous-multioutput'
One-vs-one (OneVsOneClassifier) constructs one classifier per pair of classes.
At prediction time, the class which received the most votes is selected. In the event of a tie, it selects the class with the highest aggregate classification confidence by summing over the pair-wise classification confidence levels computed by the underlying binary classifiers.
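classifiers needed = n_classes * (n_classes - 1) / 2
slower than one-vs-the-rest, due to its O(n_classes^2) complexity.
advantage: suits kernel algorithms, which don't scale well with n_samples, because each pairwise learning problem involves only a small subset of the data.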
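The classifier count can be checked directly. This sketch assumes a synthetic 5-class problem and LinearSVC as the underlying binary estimator (both are illustrative choices). It compares the number of fitted estimators under one-vs-one and one-vs-the-rest.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic 5-class problem (illustrative data, not from the notes)
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=5, random_state=0)
n_classes = len(set(y))

ovo = OneVsOneClassifier(LinearSVC(random_state=0, max_iter=10000)).fit(X, y)
ovr = OneVsRestClassifier(LinearSVC(random_state=0, max_iter=10000)).fit(X, y)

# One-vs-one fits one model per pair of classes: n_classes * (n_classes - 1) / 2
n_ovo = len(ovo.estimators_)
# One-vs-the-rest fits one model per class
n_ovr = len(ovr.estimators_)
print("one-vs-one estimators     :", n_ovo)
print("one-vs-the-rest estimators:", n_ovr)
```

With 5 classes this gives 10 pairwise models against 5 one-vs-the-rest models, and the gap grows quadratically with the number of classes.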
target_type | y |
‘multilabel-indicator’ | label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values. |
>>> type_of_target(np.array([[0, 1], [1, 1]]))
'multilabel-indicator'
>>> type_of_target([[1, 2]])
'multilabel-indicator'
MultiOutputClassifier | ClassifierChain |
Strategy consists of fitting one classifier per target. | Way of combining a number of binary classifiers into a single multi-label model that is capable of exploiting correlations among targets. |
Allows multiple target variable classifications. Able to estimate a series of target functions that are trained on a single predictor matrix to predict a series of responses. | For a multi-label classification problem with N classes, N binary classifiers are assigned an integer between 0 and N-1. These integers define the order of models in the chain. |
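A small sketch contrasting the two strategies, assuming a synthetic 3-label problem from make_multilabel_classification and LogisticRegression as the base estimator (both illustrative). The base estimator is passed positionally, and the chain order is fixed by hand for reproducibility.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Y is a label indicator matrix of shape (n_samples, 3)
X, Y = make_multilabel_classification(n_samples=200, n_features=10,
                                      n_classes=3, random_state=0)

# One independent classifier per target column
multi = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# Chain: the model for label i also sees the labels earlier in the order
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order=[0, 1, 2], random_state=0).fit(X, Y)

multi_pred = multi.predict(X[:2])
chain_pred = chain.predict(X[:2])
print("MultiOutputClassifier:", multi_pred)
print("ClassifierChain      :", chain_pred)
print("chain order          :", chain.order_)
```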
Calibration curves
compare how well the probabilistic predictions of a binary classifier are calibrated.
plots the true frequency of the positive label against its predicted probability, for binned predictions.
x axis : average predicted probability in each bin
y axis : fraction of positives, i.e. the proportion of samples whose class is the positive class (in each bin).
LogisticRegression returns well calibrated predictions by default as it directly optimizes Log loss.
GaussianNB tends to push probabilities to 0 or 1.
RandomForestClassifier peaks at approximately 0.2 and 0.9 probability, while probabilities close to 0 or 1 are very rare.
LinearSVC focuses on difficult-to-classify samples that are close to the decision boundary (the support vectors).
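The two axes of a calibration curve can be computed with sklearn.calibration.calibration_curve. The sketch below assumes a synthetic dataset, a LogisticRegression model, and 5 bins (all illustrative choices), and prints the points rather than plotting them.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary problem (illustrative data, not from the notes)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# prob_pred: average predicted probability in each bin (x axis)
# prob_true: fraction of positives in each bin (y axis)
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=5)
for mean_pred, frac_pos in zip(prob_pred, prob_true):
    print("mean predicted = {:.2f} | fraction of positives = {:.2f}".format(
        mean_pred, frac_pos))
```

For a well-calibrated model the two columns stay close to each other, which corresponds to points near the diagonal of the plot.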
ensemble = True
ensemble = False
Model selection for classification
Class: sklearn.model_selection.StratifiedKFold
Some Parameters:
n_splits (default = 5)
Number of folds. Must be at least 2.
shuffle (default = False)
Whether to shuffle each class's samples before splitting into batches.
Note that the samples within each split will not be shuffled.
random_state (int, RandomState instance or None, default = None)
Set random_state when shuffle = True, because it affects the ordering of the indices, which controls the randomness of each fold for each class.
Class: sklearn.model_selection.StratifiedShuffleSplit
Some Parameters:
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit
import numpy as np

# 50 samples with an imbalanced target: 45 of class 0 and 5 of class 1
X, y = np.random.randint(1, 50, 50), np.hstack(([0] * 45, [1] * 5))

# StratifiedKFold: non-overlapping test folds that preserve the class ratio
skf = StratifiedKFold(n_splits=3)
for count, (train, test) in enumerate(skf.split(X, y), start=1):
    print('Split', count)
    print('train - {} | test - {}'.format(np.bincount(y[train]), np.bincount(y[test])))

# StratifiedShuffleSplit: random stratified splits, so test sets may overlap
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
for count, (train_index, test_index) in enumerate(sss.split(X, y), start=1):
    print('Split', count)
    print('train - {} | test - {}'.format(np.bincount(y[train_index]), np.bincount(y[test_index])))
Example to compare StratifiedKFold and StratifiedShuffleSplit
Class: sklearn.linear_model.LogisticRegressionCV
Some Parameters:
Cs (default = 10)
Each of the values in Cs describes the inverse of regularization strength.
If an int, a grid of Cs values is chosen on a logarithmic scale between 1e-4 and 1e4.
cv (default = None)
The default cross-validation generator used is Stratified K-Folds.
If an integer is provided, then it is the number of folds used.
scoring (default = None)
A string or a scorer callable with the signature scorer(estimator, X, y). The default scoring option is 'accuracy'.
penalty ('l1', 'l2', 'elasticnet', default = 'l2')
refit (default = True)
If set to True, the scores are averaged across all folds, the coefs and the C that correspond to the best score are taken, and a final refit is done using these parameters.
Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.
l1_ratios (list of float, default = None)
The list of Elastic-Net mixing parameters, with 0 <= l1_ratio <= 1.
Only used if penalty='elasticnet'.
A value of 0 is equivalent to using penalty='l2', while 1 is equivalent to using penalty='l1'.
For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
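A short sketch of the Cs grid and the cross-validated choice of C, assuming synthetic data and 3 folds (illustrative values). With Cs=5 the grid should contain 5 values spaced logarithmically between 1e-4 and 1e4.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic binary problem (illustrative data, not from the notes)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Integer Cs builds a log-spaced grid; integer cv means stratified 3-fold CV
clf = LogisticRegressionCV(Cs=5, cv=3, scoring="accuracy", refit=True,
                           max_iter=1000).fit(X, y)

print("grid of C values:", clf.Cs_)
print("selected C      :", clf.C_)
print("train accuracy  :", clf.score(X, y))
```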
Multiclass classification
Note: The multiclass and multilabel metrics also work for binary classification.
1. sklearn.metrics.precision_recall_curve
2. sklearn.metrics.roc_curve
3. sklearn.metrics.det_curve
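The three curve functions share the same calling pattern: true binary labels plus continuous scores. The toy labels and scores below are illustrative; roc_auc_score is included only to summarize the ROC curve in one number.

```python
import numpy as np
from sklearn.metrics import (det_curve, precision_recall_curve, roc_auc_score,
                             roc_curve)

# Toy binary labels and classifier scores (illustrative values)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each function sweeps a decision threshold over the scores
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
det_fpr, det_fnr, det_thresholds = det_curve(y_true, y_score)

# 3 of the 4 (positive, negative) pairs are ranked correctly, so AUC = 0.75
auc_value = roc_auc_score(y_true, y_score)
print("precision:", precision, "recall:", recall)
print("fpr:", fpr, "tpr:", tpr)
print("DET fpr:", det_fpr, "DET fnr:", det_fnr)
print("ROC AUC:", auc_value)
```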
'macro' - calculates the mean of the binary metrics, giving equal weight to each class.
'weighted' - computes the average of binary metrics in which each class's score is weighted by its presence in the true data sample.
'micro' - gives each sample-class pair an equal contribution to the overall metric (except as a result of sample-weight). (preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.)
'samples' - calculates the metric over the true and predicted classes for each sample in the evaluation data, and returns their (sample_weight-weighted) average.
average=None - returns an array with the score for each class.
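The averaging strategies described above correspond to values of the average argument of metrics such as f1_score. The sketch below uses a toy 3-class example (illustrative labels); 'samples' is omitted because it applies only to multilabel targets.

```python
from sklearn.metrics import f1_score

# Toy 3-class example (illustrative labels); only class 0 is ever predicted
# correctly, giving per-class F1 scores of 0.8, 0.0 and 0.0
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

per_class = f1_score(y_true, y_pred, average=None)     # one score per class
macro = f1_score(y_true, y_pred, average="macro")      # unweighted class mean
weighted = f1_score(y_true, y_pred, average="weighted")  # weighted by support
micro = f1_score(y_true, y_pred, average="micro")      # pooled over all pairs

print("per class:", per_class)
print("macro    :", macro)
print("weighted :", weighted)
print("micro    :", micro)
```

Here macro and weighted agree because every class has the same support, while micro equals the overall accuracy (2 correct out of 6).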
1. sklearn.metrics.confusion_matrix
2. sklearn.metrics.balanced_accuracy_score
3. sklearn.metrics.cohen_kappa_score
4. sklearn.metrics.hinge_loss
5. sklearn.metrics.matthews_corrcoef
6. sklearn.metrics.roc_auc_score
7. sklearn.metrics.top_k_accuracy_score
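Several of the multiclass metrics listed above can be computed from the same pair of label vectors. The toy labels below are illustrative.

```python
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             confusion_matrix, matthews_corrcoef)

# Toy 3-class labels (illustrative values)
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

# Mean of per-class recall: (2/2 + 0/1 + 2/3) / 3
balanced_acc = balanced_accuracy_score(y_true, y_pred)

# Agreement statistics that correct for chance; both lie in [-1, 1]
kappa = cohen_kappa_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)

print(cm)
print("balanced accuracy:", balanced_acc)
print("Cohen's kappa    :", kappa)
print("Matthews corrcoef:", mcc)
```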
1. sklearn.metrics.accuracy_score
2. sklearn.metrics.multilabel_confusion_matrix
from sklearn.metrics import multilabel_confusion_matrix
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
multilabel_confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
3. sklearn.metrics.classification_report
4. sklearn.metrics.zero_one_loss
If the normalize parameter is True, this function returns the fraction of misclassifications (float), else it returns the number of misclassifications (int).
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
4. sklearn.metrics.hamming_loss
5. sklearn.metrics.log_loss
6. sklearn.metrics.jaccard_score
7. Precision, recall and F-measures
Note: Best value is 1 and the worst value is 0 for these scores.
sklearn.metrics.precision_score | computes precision, which is intuitively the ability of the classifier not to label as positive a sample that is negative. |
sklearn.metrics.recall_score | computes recall, which is intuitively the ability of the classifier to find all the positive samples. |
sklearn.metrics.f1_score | computes the harmonic mean of precision and recall |
sklearn.metrics.fbeta_score | computes the weighted harmonic mean of precision and recall |
sklearn.metrics.average_precision_score | computes the average precision from prediction scores (this score does not support multiclass targets) |
8. sklearn.metrics.precision_recall_fscore_support
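A worked example of the precision, recall and F-measure functions on toy binary labels (illustrative values): 3 true positives, 0 false positives and 1 false negative.

```python
from sklearn.metrics import (f1_score, fbeta_score, precision_score,
                             precision_recall_fscore_support, recall_score)

# Toy binary labels (illustrative): TP = 3, FP = 0, FN = 1
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # 2PR / (P + R) = 6 / 7
# beta = 2 weights recall more heavily: 5PR / (4P + R) = 15 / 19
f2 = fbeta_score(y_true, y_pred, beta=2)

# The same quantities in a single call; support is None when average is set
p, r, f, support = precision_recall_fscore_support(y_true, y_pred,
                                                   average="binary")
print("precision:", precision, "recall:", recall)
print("f1:", f1, "f2:", f2)
print("combined call:", (p, r, f, support))
```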
