Classification functions in scikit-learn

Dr. Ashish Tendulkar

Machine Learning Techniques

IIT Madras

  1. Ridge classification
  2. Logistic regression
  3. SGD classifier
  4. Perceptron
  5. Extending linear models with polynomial features
  6. Nearest neighbor classifier
  7. Naive Bayes (NB) classifier
  8. Multi-label and multi-class classification
  9. Probability calibration
  10. Model selection for classification
    • Cross-validation (CV)
    • Hyperparameter tuning (HPT)
  11. Classification metrics

Contents

  • RidgeClassifier is a classifier variant of the Ridge regressor.
  • Binary classification:
    • classifier first converts binary targets to {-1, 1} and then treats the problem as a regression task, optimizing the objective of regressor:
      • minimize a penalized residual sum of squares
        • $\min\limits_{\mathbf w} ||\mathbf{Xw}-\mathbf{y}||_2^2 + \alpha ||\mathbf w||_2^2$
    • predicted class corresponds to the sign of the regressor’s prediction
  • Multiclass classification:
    • treated as multi-output regression
    • predicted class corresponds to the output with the highest value

Ridge classification

Class: sklearn.linear_model.RidgeClassifier  

Some Parameters:

  • alpha - Regularization strength; must be a positive float (default = 1.0).
  • solver - used in the computational routines (default = 'auto'):
    • 'auto' - chooses the solver automatically based on the type of data
    • 'svd' - uses Singular Value Decomposition
    • 'cholesky' - uses the scipy.linalg.solve function to obtain a closed-form solution
    • 'lsqr' - uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr (fastest; uses an iterative procedure)
    • 'sparse_cg' - uses the conjugate gradient solver of scipy.sparse.linalg.cg
    • 'sag' - uses a Stochastic Average Gradient descent iterative procedure
    • 'saga' - an unbiased and more flexible version of 'sag'
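
A minimal usage sketch (illustrative, not from the slides): the synthetic data from make_classification and the explicitly written-out parameter values are assumptions.

from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# illustrative two-class data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = RidgeClassifier(alpha=1.0, solver='auto')   # defaults written out explicitly
clf.fit(X, y)
print(clf.predict(X[:5]))             # predicted class labels
print(clf.decision_function(X[:5]))   # signed scores; predicted class = sign of the score
print(clf.score(X, y))                # mean accuracy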

Ridge classification

  • decision_function(X) - Predict confidence scores for samples.
  • densify() - Convert coefficient matrix to dense array format.
  • fit(X, y[, coef_init, intercept_init, …]) - Fit linear model with Stochastic Gradient Descent.
  • get_params([deep]) - Get parameters for this estimator.
  • partial_fit(X, y[, classes, sample_weight]) - Perform one epoch of stochastic gradient descent on given samples.
  • predict(X) - Predict class labels for samples in X.
  • predict_log_proba(X) - Log of probability estimates.
  • predict_proba(X) - Probability estimates.
  • score(X, y[, sample_weight]) - Return the mean accuracy on the given test data and labels.
  • set_params(**params) - Set the parameters of this estimator.
  • sparsify() - Convert coefficient matrix to sparse format.

Some common methods for all classifiers

  • LogisticRegression is a linear model for classification rather than regression.
    • also known as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier.
  • This implementation can fit
    • binary
    • multiclass - One-vs-Rest
    • multinomial logistic regression with optional $\ell_1$, $\ell_2$ or Elastic-Net regularization

Logistic regression

Class: sklearn.linear_model.LogisticRegression 

Some Parameters:

  • penalty (default = 'l2')
    • 'none' - no penalty is added

    • 'l2' - adds an L2 penalty term (the default choice)

    • 'l1' - adds an L1 penalty term

    • 'elasticnet' - both L1 and L2 penalty terms are added

  • solver (default = 'lbfgs')

    • 'liblinear' - uses a coordinate descent (CD) algorithm

    • 'lbfgs' - an optimizer in the family of quasi-Newton methods.

    • 'newton-cg', 'sag', 'saga'

Logistic regression

  • For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
  • For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle the multinomial loss.
  • 'liblinear' is limited to one-versus-rest schemes.
  • max_iter (default = 100)
    • maximum number of iterations taken for the solvers to converge
  • multi_class (default = 'auto')
    • 'ovr' - a binary problem is fit for each label (one-vs-rest)
    • 'multinomial' - uses the cross-entropy loss
    • 'auto' - selects 'ovr' if the data is binary or if solver='liblinear', and otherwise selects 'multinomial'.
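
A minimal sketch of these parameters in use (illustrative; the synthetic 3-class data and the specific parameter values are assumptions):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# illustrative 3-class data
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# multinomial logistic regression with an L2 penalty; 'lbfgs' supports the multinomial loss
clf = LogisticRegression(penalty='l2', solver='lbfgs',
                         multi_class='multinomial', max_iter=200)
clf.fit(X, y)
print(clf.predict(X[:5]))                  # predicted class labels
print(clf.predict_proba(X[:5]).round(3))   # per-class probabilities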

Logistic regression

  • Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions
  • It is an optimization technique and does not correspond to a specific family of machine learning models.
  • It is only a way to train a model.
  • easily scales to problems with more than $10^5$ training examples and more than $10^5$ features
  • SGD has to be fitted with two arrays:
    • training samples - X, an array of shape (n_samples, n_features)
    • target values (class labels) - y, an array of shape (n_samples,)

SGDClassifier(loss='log') fits a logistic regression model with SGD.

(Note: LogisticRegression itself does not offer an 'sgd' solver; SGDClassifier is how a logistic regression model is trained with SGD in scikit-learn.)

SGD classifier

SGDClassifier(loss='hinge') fits a linear Support Vector Machine with SGD.

SGD classifier

  • Class: sklearn.linear_model.SGDClassifier 

    • This estimator implements regularized linear models with SGD.

    • The gradient of the loss is estimated one sample at a time, and the model is updated along the way with a decreasing learning rate.

  • Some parameters

    • penalty - 'l2', 'l1', 'elasticnet' (default = 'l2')

    • loss (default = 'hinge')

      • 'hinge' - (soft-margin) linear Support Vector Machine,

      • 'modified_huber' - smoothed hinge loss brings tolerance to outliers as well as probability estimates

      • 'log' - logistic regression

      • 'squared_hinge' - like hinge but is quadratically penalized

      • 'perceptron' - linear loss used by the perceptron algorithm

      • regression losses - ‘squared_error’, ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’
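
A minimal sketch of how the loss parameter changes what the estimator fits (the data and settings are illustrative assumptions; recent scikit-learn releases spell the logistic loss 'log_loss', while the slides use the older name 'log'):

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# hinge loss -> a linear SVM-style classifier
svm_like = SGDClassifier(loss='hinge', penalty='l2', alpha=1e-4,
                         max_iter=1000, tol=1e-3, random_state=0).fit(X, y)

# logistic loss -> logistic regression trained with SGD ('log' in older versions)
logreg_like = SGDClassifier(loss='log_loss', random_state=0).fit(X, y)

print(svm_like.predict(X[:5]))
print(logreg_like.predict_proba(X[:5]).round(3))   # probabilities available with the logistic loss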

SGD classifier

  • alpha (default = 0.0001)

    • constant that multiplies the regularization term.

  • fit_intercept (default = True)

    • If False, the data is assumed to be already centered.

  • max_iter (default = 1000)

    • maximum number of passes over the training data (aka epochs).

  • learning_rate (default = 'optimal')

    • 'constant': eta = eta0 (eta0 is the initial learning rate; default eta0 = 0.0)

    • 'optimal': eta = 1.0 / (alpha * (t + t0)), where t0 is chosen by a heuristic proposed by Leon Bottou.

    • 'invscaling': eta = eta0 / pow(t, power_t)

    • 'adaptive': eta = eta0, as long as the training loss keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if early_stopping is True, the current learning rate is divided by 5.

SGD classifier

  • tol (default = 1e-3)

    • stopping criterion.

    • If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs.

    • Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter.

  • early_stopping (default = False)

    • to terminate training when validation score is not improving.

    • If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score returned by the score method is not improving by at least tol for n_iter_no_change consecutive epochs

SGD classifier

  • validation_fraction (default = 0.1)

    • proportion of training data to set aside as validation set for early stopping . Must be between 0 and 1.

    • Only used if early_stopping  is True.

  • n_iter_no_change (default = 5)

    • Number of iterations with no improvement to wait before stopping fitting.

    • Convergence is checked against the training loss or the validation loss depending on the early_stopping  parameter.

  • class_weight ({class_label: weight} or 'balanced', default = None)

    • Preset for the class_weight fit parameter: weights associated with classes. If not given, all classes are supposed to have weight one.

    • The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).
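
A minimal sketch combining early stopping and class weighting on an imbalanced toy problem (the data and parameter values are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# imbalanced toy data: roughly 90% of samples in class 0
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

clf = SGDClassifier(loss='hinge',
                    class_weight='balanced',    # reweight classes inversely to their frequency
                    early_stopping=True,        # hold out a stratified validation fraction
                    validation_fraction=0.1,
                    n_iter_no_change=5,
                    tol=1e-3,
                    random_state=0)
clf.fit(X, y)
print(clf.n_iter_)   # number of epochs actually run before stopping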

The Perceptron is a simple classification algorithm suitable for large-scale learning.

Class: sklearn.linear_model.Perceptron 

Some Parameters:

  • penalty - 'l2', 'l1', 'elasticnet' (default = None; no regularization by default)

  • alpha - (default = 0.0001)

  • l1_ratio - (default = 0.15)

  • fit_intercept - (default = True)

  • max_iter - (default = 1000)

  • tol - (default = 1e-3)

  • eta0 - (default = 1)

  • early_stopping - (default = False)

  • validation_fraction - (default = 0.1)

  • n_iter_no_change - (default = 5)
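
A minimal usage sketch (the digits dataset is an illustrative choice, not from the slides). Note that Perceptron is equivalent to SGDClassifier(loss='perceptron', eta0=1, learning_rate='constant', penalty=None).

from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron

X, y = load_digits(return_X_y=True)

clf = Perceptron(alpha=1e-4, max_iter=1000, tol=1e-3,
                 eta0=1.0, early_stopping=False, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))   # mean accuracy on the training data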

     

  • Type of instance-based learning or non-generalizing learning
    • it does not attempt to construct a general internal model, but simply stores instances of the training data.
  • Classification is computed from a simple majority vote of the nearest neighbors of each point.
  • scikit-learn  implements two different nearest neighbors classifiers.

Nearest neighbor classifier

KNeighborsClassifier:
  • implements learning based on the k nearest neighbors of each query point, where k is an integer value specified by the user
  • the most commonly used technique
  • the choice of the value k is highly data-dependent
  • larger k suppresses the effects of noise, but makes the classification boundaries less distinct

RadiusNeighborsClassifier:
  • implements learning based on the number of neighbors within a fixed radius r of each training point, where r is a floating-point value specified by the user
  • used in cases where the data is not uniformly sampled
  • the user specifies a fixed radius r, such that points in sparser neighborhoods use fewer nearest neighbors for the classification

Class: sklearn.neighbors.KNeighborsClassifier 

Some Parameters

  • n_neighbors (default = 5)
    • Number of neighbors to use by default for k neighbors queries.
  • weights (used in prediction) (default = 'uniform')
    • 'uniform' : All points in each neighborhood are weighted equally.
    • 'distance' : weight points by the inverse of their distance; closer neighbors of a query point have a greater influence than neighbors which are further away.
    • [callable] : a user-defined function which accepts an array of distances and returns an array of the same shape containing the weights.

Nearest neighbor classifier

  • algorithm (default = 'auto')
    • Algorithm used to compute the nearest neighbors: 'ball_tree', 'kd_tree', 'brute', or 'auto'.

  • leaf_size (default = 30)
    • Leaf size passed to BallTree or KDTree. 
  • p (default = 2)
    • Power parameter for the Minkowski metric.
    • When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2.
    • For arbitrary p, minkowski_distance (l_p) is used.
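
A minimal sketch of these parameters (the iris dataset and the weights='distance' choice are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5, weights='distance',
                           algorithm='auto', leaf_size=30, p=2)
knn.fit(X, y)
print(knn.predict(X[:5]))
print(knn.predict_proba(X[:5]))   # neighbor vote shares per class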

Nearest neighbor classifier

Class: sklearn.neighbors.RadiusNeighborsClassifier 

Some Parameters

  • radius (default = 1.0)
    • Range of parameter space to use by default for radius_neighbors queries

  • weights ('uniform', 'distance', or [callable], default = 'uniform')

  • algorithm ('ball_tree', 'kd_tree', 'brute', 'auto', default = 'auto')

  • leaf_size (default = 30)

  • p (default = 2)

  • metric (default = 'minkowski')

    • Distance metric to use for the tree.
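
A minimal sketch (the tiny 1-d dataset, the radius, and the outlier_label value are illustrative assumptions; outlier_label is the label assigned to queries with no training neighbor inside the radius):

import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

X = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

rnc = RadiusNeighborsClassifier(radius=1.0, weights='uniform', outlier_label=-1)
rnc.fit(X, y)
# the last query point has no training neighbor within radius 1.0, so it gets the outlier label
print(rnc.predict([[0.2], [5.2], [3.0]]))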

Nearest neighbor classifier

  • We shall discuss two modules: sklearn.multiclass and sklearn.multioutput.
  • Multiclass classification is a classification task with more than two classes. Each sample can only be labeled as one class.
  • Multilabel classification is a classification task labeling each sample with m labels from n_classes possible classes, where m can range from 0 to n_classes.

Multiclass and multilabel classification

Problem types and their meta-estimators:

Multiclass classification (sklearn.multiclass):

  • OneVsOneClassifier

  • OneVsRestClassifier

  • OutputCodeClassifier

Multilabel classification (sklearn.multioutput):

  • MultiOutputClassifier

  • ClassifierChain

sklearn.utils.multiclass.type_of_target 

  • determines the type of data indicated by the target.

  • Parameters : y (array-like), Returns : target_type (string)

Multiclass classification - Target format

target_type : y

  • 'continuous' : array-like of floats that are not all integers, and is 1d or a column vector.
  • 'continuous-multioutput' : 2d array of floats that are not all integers, and both dimensions are of size > 1.
  • 'binary' : contains <= 2 discrete values and is 1d or a column vector.
  • 'multiclass' : contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
  • 'multiclass-multioutput' : 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
  • 'unknown' : array-like but none of the above, such as a 3d array, a sequence of sequences, or an array of non-sequence objects.

Examples

>>> from sklearn.utils.multiclass import type_of_target
>>> import numpy as np
>>> type_of_target([0.1, 0.6])
'continuous'
>>> type_of_target([1, -1, -1, 1])
'binary'
>>> type_of_target(['a', 'b', 'a'])
'binary'
>>> type_of_target([1.0, 2.0])
'binary'
>>> type_of_target([1, 0, 2])
'multiclass'
>>> type_of_target([1.0, 0.0, 3.0])
'multiclass'
>>> type_of_target(['a', 'b', 'c'])
'multiclass'
>>> type_of_target(np.array([[1, 2], [3, 1]]))
'multiclass-multioutput'
>>> type_of_target(np.array([[1.5, 2.0], [3.0, 1.6]]))
'continuous-multioutput'


  • OneVsRestClassifier
    • Fitting one classifier per class. For each classifier, the class is fitted against all the other classes.
    • classifiers needed = $n_{classes}$
    • advantage : interpretability.
    • most commonly used strategy and is a fair default choice.
  • OneVsOneClassifier 

    • Constructs one classifier per pair of classes.

    • At prediction time, the class which received the most votes is selected. In the event of a tie, it selects the class with the highest aggregate classification confidence by summing over the pair-wise classification confidence levels computed by the underlying binary classifiers.

    • classifiers needed = $\dfrac{n_{classes} \times (n_{classes} - 1)}{2}$

    • slower than one-vs-the-rest, due to its $O(n_{classes}^2)$ complexity.

    • advantage : suited to kernel algorithms which don't scale well with the number of samples
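
A minimal sketch of both meta-estimators (the iris data and the LinearSVC base estimator are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y)
ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y)

# iris has 3 classes: OvR fits 3 classifiers, OvO fits 3*(3-1)/2 = 3
print(len(ovr.estimators_), len(ovo.estimators_))
print(ovr.predict(X[:3]), ovo.predict(X[:3]))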

Multiclass classification

  •  OutputCodeClassifier
    • Error-Correcting Output Code-based strategy
      • each class is represented in a Euclidean space, where each dimension can only be 0 or 1 (binary code)
    • the code_size attribute allows the user to control the number of classifiers which will be used. It is a percentage of the total number of classes.

Multiclass classification

Multilabel classification - Target format

target_type : y

  • 'multilabel-indicator' : label indicator matrix, an array of two dimensions with at least two columns and at most 2 unique values.

>>> type_of_target(np.array([[0, 1], [1, 1]]))
'multilabel-indicator'
>>> type_of_target([[1, 2]])
'multilabel-indicator'


Multilabel classification

MultiOutputClassifier:
  • Strategy consists of fitting one classifier per target.
  • Allows multiple target variable classifications.
  • Able to estimate a series of target functions that are trained on a single predictor matrix to predict a series of responses.

ClassifierChain:
  • Way of combining a number of binary classifiers into a single multi-label model that is capable of exploiting correlations among targets.
  • For a multi-label classification problem with N classes, N binary classifiers are assigned an integer between 0 and N-1.
  • These integers define the order of models in the chain.
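
A minimal sketch of both strategies on synthetic multilabel data (the data, the LogisticRegression base estimator, and the random chain order are illustrative assumptions):

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# each sample can carry any subset of 3 labels
X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)

base = LogisticRegression(max_iter=500)
independent = MultiOutputClassifier(base).fit(X, Y)                        # one classifier per label
chain = ClassifierChain(base, order='random', random_state=0).fit(X, Y)    # labels fed forward along the chain

print(independent.predict(X[:3]))
print(chain.predict(X[:3]))
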
  • The calibration module allows you to better calibrate the probabilities of a given model, or to add support for probability prediction.
  • The goal is to obtain a probability of the respective label for each prediction.

Probability calibration

Calibration curves:

  • compare how well the probabilistic predictions of a binary classifier are calibrated.

  • plot the true frequency of the positive label against its predicted probability, for binned predictions.

  • x axis : average predicted probability in each bin.

  • y axis : fraction of positives, i.e. the proportion of samples whose class is the positive class (in each bin).

Probability calibration - Example

  • A calibration curve plot is created with CalibrationDisplay.from_estimator, which uses calibration_curve to calculate the per-bin average predicted probabilities and fraction of positives.

Probability calibration - Example

LogisticRegression returns well calibrated predictions by default as it directly optimizes Log loss.

GaussianNB tends to push probabilities to 0 or 1.

RandomForestClassifier peaks at approximately 0.2 and 0.9 probability, while probabilities close to 0 or 1 are very rare.

LinearSVC focuses on samples that are difficult to classify, i.e. those close to the decision boundary (the support vectors).

  • Fitting a regressor (called a calibrator) that maps the output of the classifier (as given by decision_function or predict_proba) to a calibrated probability in [0, 1].
  • CalibratedClassifierCV class is used to calibrate a classifier.
    • estimates the parameters of a classifier and subsequently calibrates it.
    • ensemble = True 

      • for each cv split it fits a copy of the base estimator to the training subset, and calibrates it using the testing subset.
      • For prediction, predicted probabilities are averaged across these individual calibrated classifiers.
    • ensemble = False 

      • cv is used to obtain unbiased predictions, which are then used for calibration.
      • For prediction, the base estimator, trained using all the data, is used.
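
A minimal sketch (illustrative data; LinearSVC is chosen because it has no predict_proba of its own, and the sigmoid method and cv=5 are assumptions):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# wrap the base estimator; ensemble=True fits and calibrates one copy per CV split
calibrated = CalibratedClassifierCV(LinearSVC(random_state=0), method='sigmoid',
                                    cv=5, ensemble=True)
calibrated.fit(X_train, y_train)
print(calibrated.predict_proba(X_test[:5]).round(3))   # calibrated probabilities in [0, 1]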

Calibrating a classifier

  • Cross-validation iterators with stratification based on class labels:
    • There is a chance for a large imbalance in the distribution of the target classes for some classification problems.
    • Recommendation: use stratified splitters such as StratifiedKFold and StratifiedShuffleSplit.
    • These two methods ensure that, in each train and validation fold, relative class frequencies are approximately preserved.

 

  • Cross-validation estimator:
    • Cross-validation estimators are named EstimatorCV and tend to be roughly equivalent to GridSearchCV(Estimator(), ...).
    • Example of cross-validation estimator is LogisticRegressionCV.

Model selection for classification

  • This cross-validation object returns stratified folds by preserving the percentage of samples for each class.

Class: sklearn.model_selection.StratifiedKFold  

Some Parameters:

  • n_splits (default = 5)

    • Number of folds. Must be at least 2.

  • shuffle (default = False)

    • whether to shuffle each class's samples before splitting into batches.

    • Note that the samples within each split will not be shuffled.

  • random_state (int, RandomState instance or None, default = None)

    • set random_state when shuffle = True, because it affects the ordering of the indices, which controls the randomness of each fold for each class.

  • RepeatedStratifiedKFold: Repeats Stratified K-Fold n times.

1. Stratified k-fold

  • Splits preserve the same percentage for each target class as in the complete set but ​do not guarantee that all folds will be different.

Class: sklearn.model_selection.StratifiedShuffleSplit  

Some Parameters:

  • n_splits (default = 10)
  • random_state (RandomState instance or None, default = None)
  • test_size (default = None)
    • float value - proportion of the test dataset split (between 0.0 and 1.0)
    • int value - represents the absolute number of test samples
    • None - set to the complement of the train size; if train_size is also None, it is set to 0.1
  • train_size (default = None)
    • float value - proportion of the train dataset split (between 0.0 and 1.0)
    • int value - absolute number of train samples.
    • None - complement of the test size.

2. StratifiedShuffleSplit

from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit
import numpy as np

# 50 random values as stand-in features, and an imbalanced target
# with 45 samples of class 0 and 5 samples of class 1
X, y = np.random.randint(1, 50, 50), np.hstack(([0] * 45, [1] * 5))
print('X', X)
print('y', y)

# StratifiedKFold: each fold preserves the 45:5 class ratio
skf = StratifiedKFold(n_splits=3)
print('StratifiedKFold')
count = 1
for train, test in skf.split(X, y):
    print('Split', count)
    print('train -  {}   |   test -  {}'.format(np.bincount(y[train]), np.bincount(y[test])))
    print('train', X[train])
    print('test', X[test])
    count += 1

# StratifiedShuffleSplit: random splits that also preserve the class ratio,
# but the folds are not guaranteed to be disjoint
print('StratifiedShuffleSplit')
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
count = 1
for train_index, test_index in sss.split(X, y):
    print('Split', count)
    print('train -  {}   |   test -  {}'.format(np.bincount(y[train_index]), np.bincount(y[test_index])))
    print('train', X[train_index])
    print('test', X[test_index])
    count += 1

Example to compare StratifiedKFold and StratifiedShuffleSplit


Comparison

3. Logistic Regression CV

  • Logistic regression with tuning the hyperparameter Cs  values and l1_ratios values.

Class: sklearn.linear_model.LogisticRegressionCV    

Some Parameters:

  • 'Cs' (default = 10)

    • Each of the values in Cs describes the inverse of regularization strength.

    • If an int, then a grid of Cs values is chosen on a logarithmic scale between $10^{-4}$ and $10^{4}$.

  • 'cv' (default = None)

    • The default cross-validation generator used is Stratified K-Folds.

    • If an integer is provided, then it is the number of folds used. 

  • scoring (default = None)

    • A string or a callable scorer(estimator, X, y); the default scoring option used is 'accuracy'.

3. Logistic Regression CV

  • penalty (‘l1’, ‘l2’, ‘elasticnet’, default=‘l2’)

  • refit (default = True)

    • If set to True, the scores are averaged across all folds, the coefs and the C that correspond to the best score are taken, and a final refit is done using these parameters.

    • Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.

  • l1_ratios list of float, (default = None)

    • The list of Elastic-Net mixing parameters, with 0 <= l1_ratio <= 1.

    • Only used if penalty='elasticnet'.

    • A value of 0 is equivalent to using penalty='l2', while 1 is equivalent to using penalty='l1'.

    • For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
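
A minimal sketch of these parameters (illustrative synthetic data; the choices of 10 Cs values, 5 folds and accuracy scoring are assumptions):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = LogisticRegressionCV(Cs=10, cv=5, penalty='l2', scoring='accuracy',
                           max_iter=1000, refit=True)
clf.fit(X, y)
print(clf.C_)          # best regularization strength found per class
print(clf.score(X, y))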

  • sklearn.metrics module implements loss, score, and utility functions to measure classification performance.

Classification metrics

Note: The multiclass and multilabel metrics also work for binary classification.

1. sklearn.metrics.precision_recall_curve 

  • The precision-recall curve shows the tradeoff between precision and recall for different thresholds.

1. Binary Classification metrics

2. sklearn.metrics.roc_curve 

  • Receiver Operating Characteristic (ROC) curves typically feature true positive rate on the Y axis, and false positive rate on the X axis.
  • The top left corner of the plot is the “ideal” point - a false positive rate of zero, and a true positive rate of one.

1. Binary Classification metrics

3. sklearn.metrics.det_curve 

  • A detection error tradeoff (DET) graph plots false reject rate vs. false accept rate. 
  • DET curves are a variation of ROC curves where False Negative Rate is plotted on the y-axis instead of True Positive Rate.
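
A minimal sketch of the three curve functions on a tiny example (the four-sample arrays are illustrative assumptions; y_score would normally come from decision_function or predict_proba):

import numpy as np
from sklearn.metrics import det_curve, precision_recall_curve, roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
det_fpr, det_fnr, det_thresholds = det_curve(y_true, y_score)

print(fpr, tpr)          # one ROC point per threshold
print(det_fpr, det_fnr)  # DET curve: false positive rate vs. false negative rate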

1. Binary Classification metrics

  • Data is treated as a collection of binary problems, one for each class.
  • Binary metric calculations across the set of classes are then averaged; this is controlled through the average parameter.
    • "macro" - calculates the mean of the binary metrics, giving equal weight to each class.

    • "weighted" - computes the average of binary metrics in which each class's score is weighted by its presence in the true data sample.

    • "micro" - gives each sample-class pair an equal contribution to the overall metric (except as a result of sample_weight). Preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.

    • "samples" - calculates the metric over the true and predicted classes for each sample in the evaluation data, and returns their (sample_weight-weighted) average.

    • None - returns an array with the score for each class.
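
A small sketch of how the average parameter changes a score (the toy labels are illustrative assumptions):

from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 2, 2, 1]

print(precision_score(y_true, y_pred, average='macro'))     # unweighted mean over classes
print(precision_score(y_true, y_pred, average='weighted'))  # weighted by each class's support
print(precision_score(y_true, y_pred, average='micro'))     # global TP / (TP + FP)
print(precision_score(y_true, y_pred, average=None))        # per-class array of scores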

From binary to multiclass and multilabel

1. sklearn.metrics.confusion_matrix 

  • This function evaluates classification accuracy by computing the confusion matrix with each row corresponding to the true class.
  • By definition, entry $i, j$ in a confusion matrix is the number of observations actually in group $i$, but predicted to be in group $j$.
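
A small illustrative example (the toy labels are assumptions):

from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
print(confusion_matrix(y_true, y_pred))
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]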

2. Multiclass Classification metrics

2. sklearn.metrics.balanced_accuracy_score 

  • Balanced accuracy in binary and multiclass classification problems is defined as the average of recall obtained on each class.

​3. sklearn.metrics.cohen_kappa_score 

  • Cohen’s kappa is a statistic that measures inter-annotator agreement (a score that expresses the level of agreement between two annotators on a classification problem)

​4. sklearn.metrics.hinge_loss 

  • The hinge_loss function computes the average distance between the model and the data using hinge loss, a one-sided metric that considers only prediction errors.

5. sklearn.metrics.matthews_corrcoef 

  • It is a correlation coefficient value between -1 and +1.
  • +1 represents a perfect prediction, 0 an average random prediction and -1 an inverse prediction.

2. Multiclass Classification metrics

6. sklearn.metrics.roc_auc_score 

  • The roc_auc_score function used in multiclass classification supports two strategies:
    1. one-vs-one algorithm computes the average of the pairwise ROC AUC scores
    2. one-vs-rest algorithm computes the average of the ROC AUC scores for each class against all other classes.

​7. sklearn.metrics.top_k_accuracy_score 

  • Computes the number of times where the correct label is among the top k labels predicted (ranked by predicted scores).
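
A small sketch of top_k_accuracy_score (the score matrix, with one column of predicted scores per class, is an illustrative assumption):

import numpy as np
from sklearn.metrics import top_k_accuracy_score

y_true = np.array([0, 1, 2, 2])
y_score = np.array([[0.6, 0.3, 0.1],    # class 0 ranked first  -> correct
                    [0.3, 0.5, 0.2],    # class 1 ranked first  -> correct
                    [0.2, 0.5, 0.3],    # class 2 ranked second -> counts for k=2
                    [0.7, 0.2, 0.1]])   # class 2 ranked third  -> wrong even for k=2
print(top_k_accuracy_score(y_true, y_score, k=2))   # 0.75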

2. Multiclass Classification metrics

1. sklearn.metrics.accuracy_score 

  • In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

2. sklearn.metrics.multilabel_confusion_matrix 

  • Calculates class-wise or sample-wise multilabel confusion matrices.
  • When calculating the class-wise multilabel confusion matrix $C$, the count of true negatives for class $i$ is $C_{i,0,0}$, false negatives is $C_{i,1,0}$, true positives is $C_{i,1,1}$ and false positives is $C_{i,0,1}$.
  • Example:

3. Multilabel Classification metrics

from sklearn.metrics import multilabel_confusion_matrix
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
multilabel_confusion_matrix(y_true, y_pred,labels=["ant", "bird", "cat"])

3. sklearn.metrics.classification_report 

  • This function builds a text report showing the main classification metrics.
  • Example: see the classification_report code listing below.

4. sklearn.metrics.zero_one_loss 

  • If the normalize parameter is True, this function returns the fraction of misclassifications (float); otherwise it returns the number of misclassifications (int).

3. Multilabel Classification metrics

from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

4. sklearn.metrics.hamming_loss 

  • Hamming loss is the fraction of labels that are incorrectly predicted.
  • This function computes the average Hamming loss or Hamming distance between two sets of samples.

5. sklearn.metrics.log_loss 

  • Log loss, also called logistic regression loss or cross-entropy loss, is defined on probability estimates.
  • This function computes log loss given a list of ground-truth labels and a probability matrix, as returned by an estimator’s predict_proba  method.

6. sklearn.metrics.jaccard_score 

  • The Jaccard index, or Jaccard similarity coefficient, defined as the size of the intersection divided by the size of the union of two label sets, is used to compare the set of predicted labels for a sample to the corresponding set of labels in y_true.
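
A small sketch contrasting subset accuracy, Hamming loss and the Jaccard score on multilabel indicator targets (the toy matrices are illustrative assumptions):

import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, jaccard_score

# 2 samples, 3 labels each
y_true = np.array([[1, 0, 1],
                   [0, 1, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1]])

print(accuracy_score(y_true, y_pred))                    # subset accuracy: 0.5 (first row is not an exact match)
print(hamming_loss(y_true, y_pred))                      # 1 wrong label out of 6 -> about 0.167
print(jaccard_score(y_true, y_pred, average='samples'))  # per-sample intersection/union, averaged -> 0.75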

3. Multilabel Classification metrics

7. Precision, recall and F-measures

For binary classification (with TP = true positives, FP = false positives, FN = false negatives), these metrics are defined as:

$\text{precision} = \dfrac{TP}{TP + FP}$

$\text{recall} = \dfrac{TP}{TP + FN}$

$F_1 = \dfrac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$

$F_\beta = \dfrac{(1 + \beta^2) \times \text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}$

Note: Best value is 1 and the worst value is 0 for these scores.

3. Multilabel Classification metrics

  • sklearn.metrics.precision_score - computes precision, which is intuitively the ability of the classifier not to label as positive a sample that is negative.
  • sklearn.metrics.recall_score - computes recall, which is intuitively the ability of the classifier to find all the positive samples.
  • sklearn.metrics.f1_score - computes the harmonic mean of precision and recall.
  • sklearn.metrics.fbeta_score - computes the weighted harmonic mean of precision and recall.
  • sklearn.metrics.average_precision_score - computes the average precision from prediction scores (this score does not support multiclass).

8. sklearn.metrics.precision_recall_fscore_support 

  • Compute precision, recall, F-measure and support for each class.
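
A small sketch (the toy labels are illustrative assumptions); with average=None the function returns one value per class:

from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 1, 2, 1, 0]
precision, recall, fscore, support = precision_recall_fscore_support(
    y_true, y_pred, average=None)
print(precision)   # per-class precision
print(recall)      # per-class recall
print(fscore)      # per-class F1
print(support)     # number of true samples per class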
