Amrutha
Course Content Developer for the Deep Learning course by Professor Mitesh Khapra, offered by the IIT Madras Online Degree in Programming and Data Science.
Machine Learning Techniques
Class: sklearn.linear_model.RidgeClassifier
Some common methods for all classifiers:
Method | Description |
---|---|
decision_function(X) | Predict confidence scores for samples. |
densify() | Convert coefficient matrix to dense array format. |
fit(X, y[, coef_init, intercept_init, …]) | Fit linear model with Stochastic Gradient Descent. |
get_params([deep]) | Get parameters for this estimator. |
partial_fit(X, y[, classes, sample_weight]) | Perform one epoch of stochastic gradient descent on given samples. |
predict(X) | Predict class labels for samples in X. |
predict_log_proba(X) | Log of probability estimates. |
predict_proba(X) | Probability estimates. |
score(X, y[, sample_weight]) | Return the mean accuracy on the given test data and labels. |
set_params(**params) | Set the parameters of this estimator. |
sparsify() | Convert coefficient matrix to sparse format. |
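A minimal sketch (not part of the original slides) showing these common methods on a RidgeClassifier; the make_classification data and its parameters are arbitrary choices for illustration.

from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# toy data: 100 samples, 4 features, 2 classes
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

clf = RidgeClassifier()
clf.fit(X, y)                        # learn the coefficients
print(clf.get_params()['alpha'])     # inspect a hyperparameter
print(clf.predict(X[:5]))            # class labels for the first 5 samples
print(clf.decision_function(X[:5]))  # confidence scores (signed distances)
print(clf.score(X, y))               # mean accuracy on the given data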
Class: sklearn.linear_model.LogisticRegression
Some Parameters:
penalty (default = 'l2')
'none' - no penalty is added
'l2' - add an L2 penalty term (the default choice)
'l1' - add an L1 penalty term
'elasticnet' - both L1 and L2 penalty terms are added
solver (default = 'lbfgs')
'liblinear' - uses a coordinate descent (CD) algorithm
'lbfgs' - an optimizer in the family of quasi-Newton methods.
'newton-cg', 'sag', 'saga'
SGDClassifier(loss='log') - fits a logistic regression model with SGD (note: LogisticRegression has no 'sgd' solver; SGDClassifier is the way to train it with SGD)
SGDClassifier(loss='hinge') - fits a linear Support Vector Machine with SGD
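A quick sketch (not part of the original slides) contrasting the two estimators on arbitrary make_classification data; note that recent scikit-learn versions spell the SGD log loss 'log_loss' instead of 'log'.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=200, random_state=0)

lr = LogisticRegression(penalty='l2', solver='lbfgs').fit(X, y)    # batch solver
sgd_lr = SGDClassifier(loss='log', random_state=0).fit(X, y)       # logistic regression via SGD
sgd_svm = SGDClassifier(loss='hinge', random_state=0).fit(X, y)    # linear SVM via SGD

print(lr.score(X, y), sgd_lr.score(X, y), sgd_svm.score(X, y))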
Class: sklearn.linear_model.SGDClassifier
This estimator implements regularized linear models with SGD.
The gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing learning rate.
Some Parameters:
penalty - 'l2', 'l1', 'elasticnet' (default = 'l2')
loss (default = 'hinge')
'hinge' - (soft-margin) linear Support Vector Machine
'modified_huber' - smoothed hinge loss; brings tolerance to outliers as well as probability estimates
'log' - logistic regression
'squared_hinge' - like hinge but is quadratically penalized
'perceptron' - linear loss used by the perceptron algorithm
regression losses - 'squared_error', 'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'
alpha (default = 0.0001)
constant that multiplies the regularization term.
fit_intercept (default = True)
If False, the data is assumed to be already centered.
max_iter (default = 1000)
maximum number of passes over the training data (aka epochs).
learning_rate (default = 'optimal')
'constant': eta = eta0 (eta0 is the initial learning rate, default eta0 = 0.0)
'optimal': eta = 1.0 / (alpha * (t + t0)), where t0 is chosen by a heuristic proposed by Leon Bottou
'invscaling': eta = eta0 / pow(t, power_t)
'adaptive': eta = eta0, as long as the training keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if early_stopping is True, the current learning rate is divided by 5.
tol (default = 1e-3)
stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs. Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter.
early_stopping (default = False)
whether to terminate training when the validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when the validation score returned by the score method is not improving by at least tol for n_iter_no_change consecutive epochs.
validation_fraction (default = 0.1)
proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.
n_iter_no_change (default = 5)
number of iterations with no improvement to wait before stopping fitting. Convergence is checked against the training loss or the validation loss depending on the early_stopping parameter.
class_weight (default = None), {class_label: weight} or "balanced"
preset for the class_weight fit parameter; weights associated with classes. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).
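An illustrative sketch (not part of the original slides) wiring the parameters above together; the dataset and the particular values are arbitrary.

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, random_state=0)

clf = SGDClassifier(
    loss='hinge',              # soft-margin linear SVM
    penalty='l2', alpha=1e-4,  # regularization strength
    learning_rate='optimal',   # eta = 1.0 / (alpha * (t + t0))
    max_iter=1000, tol=1e-3,
    early_stopping=True,       # hold out a stratified validation set...
    validation_fraction=0.1,   # ...of 10% of the training data
    n_iter_no_change=5,
    random_state=0,
)
clf.fit(X, y)
print(clf.n_iter_)             # epochs actually run before stopping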
It is a simple classification algorithm suitable for large-scale learning.
Class: sklearn.linear_model.Perceptron
Some Parameters:
penalty - 'l2', 'l1', 'elasticnet' (default = None)
alpha - (default = 0.0001)
l1_ratio - (default = 0.15)
fit_intercept - (default = True)
max_iter - (default = 1000)
tol - (default = 1e-3)
eta0 - (default = 1)
early_stopping - (default = False)
validation_fraction - (default = 0.1)
n_iter_no_change - (default = 5)
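A minimal Perceptron sketch (not part of the original slides) using the parameters above; the data and values are arbitrary.

from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

X, y = make_classification(n_samples=200, random_state=0)

clf = Perceptron(penalty='l2', alpha=1e-4, eta0=1.0, max_iter=1000,
                 tol=1e-3, early_stopping=False, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))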
scikit-learn implements two different nearest neighbors classifiers:
KNeighborsClassifier | RadiusNeighborsClassifier |
---|---|
implements learning based on the k nearest neighbors of each query point, where k is an integer value specified by the user. | implements learning based on the number of neighbors within a fixed radius r of each training point, where r is a floating-point value specified by the user. |
most commonly used technique; the choice of the value k is highly data-dependent. | used in cases where the data is not uniformly sampled. |
larger k suppresses the effects of noise, but makes the classification boundaries less distinct. | the user specifies a fixed radius r, such that points in sparser neighborhoods use fewer nearest neighbors for the classification. |
Class: sklearn.neighbors.KNeighborsClassifier
Some Parameters
n_neighbors (default = 5)
Number of neighbors to use by default for kneighbors queries.
Class: sklearn.neighbors.RadiusNeighborsClassifier
Some Parameters
radius (default = 1.0)
Range of parameter space to use by default for radius_neighbors queries.
weights ('uniform', 'distance', [callable], default = 'uniform')
algorithm ('ball_tree', 'kd_tree', 'brute', 'auto', default = 'auto')
leaf_size (default = 30)
p (default = 2)
metric (default = ’minkowski’)
Distance metric to use for the tree.
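A toy sketch (not part of the original slides) contrasting the two classifiers; the data, n_neighbors, and radius values are arbitrary.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

X = np.array([[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)
rnn = RadiusNeighborsClassifier(radius=1.0, weights='distance').fit(X, y)

print(knn.predict([[0.8], [3.2]]))   # vote among the 3 nearest training points
print(rnn.predict([[0.8], [3.2]]))   # vote among all training points within radius 1.0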
Problem types and meta-estimators:
Multiclass classification (sklearn.multiclass): OneVsOneClassifier, OneVsRestClassifier, OutputCodeClassifier
Multilabel classification (sklearn.multioutput): MultiOutputClassifier, ClassifierChain
sklearn.utils.multiclass.type_of_target
determines the type of data indicated by the target.
Parameters : y (array-like), Returns : target_type (string)
target_type | y |
---|---|
'continuous' | array-like of floats that are not all integers and is 1d or a column vector. |
'continuous-multioutput' | 2d array of floats that are not all integers, and both dimensions are of size > 1. |
‘binary’ | contains <= 2 discrete values and is 1d or a column vector. |
‘multiclass’ | contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector. |
‘multiclass-multioutput’ | 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1. |
‘unknown’ | array-like but none of the above, such as a 3d array, sequence of sequences, or an array of non-sequence objects. |
>>> from sklearn.utils.multiclass import type_of_target
>>> import numpy as np
>>> type_of_target([0.1, 0.6])
'continuous'
>>> type_of_target([1, -1, -1, 1])
'binary'
>>> type_of_target(['a', 'b', 'a'])
'binary'
>>> type_of_target([1.0, 2.0])
'binary'
>>> type_of_target([1, 0, 2])
'multiclass'
>>> type_of_target([1.0, 0.0, 3.0])
'multiclass'
>>> type_of_target(['a', 'b', 'c'])
'multiclass'
>>> type_of_target(np.array([[1, 2], [3, 1]]))
'multiclass-multioutput'
>>> type_of_target(np.array([[1.5, 2.0], [3.0, 1.6]]))
'continuous-multioutput'
OneVsOneClassifier
Constructs one classifier per pair of classes.
At prediction time, the class which received the most votes is selected. In the event of a tie, it selects the class with the highest aggregate classification confidence by summing over the pair-wise classification confidence levels computed by the underlying binary classifiers.
classifiers needed = n_classes × (n_classes − 1) / 2
Slower than one-vs-the-rest, due to its O(n_classes²) complexity.
Advantage: suited to kernel algorithms which don't scale well with n_samples, since each pairwise problem involves only a small subset of the data.
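A small sketch (not part of the original slides) of the one-vs-one strategy on the 3-class iris data; the LinearSVC base estimator is an arbitrary choice.

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y)
print(len(ovo.estimators_))   # 3 * (3 - 1) / 2 = 3 pairwise classifiers
print(ovo.predict(X[:3]))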
target_type | y |
---|---|
‘multilabel-indicator’ | label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values. |
>>> type_of_target(np.array([[0, 1], [1, 1]]))
'multilabel-indicator'
>>> type_of_target([[1, 2]])
'multilabel-indicator'
MultiOutputClassifier | ClassifierChain |
---|---|
Strategy consists of fitting one classifier per target. | Way of combining a number of binary classifiers into a single multi-label model that is capable of exploiting correlations among targets. |
Allows multiple target variable classifications; estimates a series of target functions that are trained on a single predictor matrix to predict a series of responses. | For a multi-label classification problem with N classes, N binary classifiers are assigned an integer between 0 and N-1; these integers define the order of models in the chain. |
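A sketch (not part of the original slides) comparing the two meta-estimators on synthetic multilabel data; the LogisticRegression base estimator and the chain order are arbitrary choices.

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain

X, Y = make_multilabel_classification(n_samples=100, n_classes=3, random_state=0)

# one independent classifier per target column
moc = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# a chain of binary classifiers; each one also sees the previous predictions
chain = ClassifierChain(LogisticRegression(max_iter=1000), order=[0, 1, 2],
                        random_state=0).fit(X, Y)

print(moc.predict(X[:2]))
print(chain.predict(X[:2]))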
Calibration curves
compare how well the probabilistic predictions of a binary classifier are calibrated.
plots the true frequency of the positive label against its predicted probability, for binned predictions.
x axis : average predicted probability in each bin
y axis : fraction of positives, i.e. the proportion of samples whose class is the positive class (in each bin).
Image Source: https://scikit-learn.org/stable/modules/calibration.html
LogisticRegression returns well calibrated predictions by default as it directly optimizes Log loss.
GaussianNB tends to push probabilities to 0 or 1.
RandomForestClassifier peaks at approximately 0.2 and 0.9 probability, while probabilities close to 0 or 1 are very rare.
LinearSVC focuses on difficult-to-classify samples that are close to the decision boundary (the support vectors).
(Figure: calibration curves compared with ensemble = True and ensemble = False.)
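A hedged sketch (not part of the original slides) of how the plotted values can be computed with sklearn.calibration.calibration_curve; the classifier and n_bins are arbitrary choices.

from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
prob_pos = clf.predict_proba(X_test)[:, 1]   # predicted P(y = 1)

# prob_true: fraction of positives per bin (y axis)
# prob_pred: mean predicted probability per bin (x axis)
prob_true, prob_pred = calibration_curve(y_test, prob_pos, n_bins=10)
print(prob_true)
print(prob_pred)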
Model selection for classification
Class: sklearn.model_selection.StratifiedKFold
Some Parameters:
n_splits (default = 5)
Number of folds. Must be at least 2.
shuffle (default = False)
whether to shuffle each class's samples before splitting into batches. Note that the samples within each split will not be shuffled.
random_state (RandomState instance or None, default = None)
set random_state when shuffle = True; it affects the ordering of the indices, which controls the randomness of each fold for each class.
Class: sklearn.model_selection.StratifiedShuffleSplit
Some Parameters:
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit
import numpy as np
X, y = np.random.randint(1, 50, 50), np.hstack(([0] * 45, [1] * 5))  # 50 random feature values; imbalanced labels: 45 of class 0, 5 of class 1
print('X',X)
print('y',y)
skf = StratifiedKFold(n_splits=3)
print('StratifiedKFold')
count = 1
for train, test in skf.split(X, y):
print('Split', count)
print('train - {} | test - {}'.format(np.bincount(y[train]), np.bincount(y[test])))
print('train',X[train])
print('test',X[test])
count+=1
print('StratifiedShuffleSplit')
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
count = 1
for train_index, test_index in sss.split(X, y):
print('Split', count)
print('train - {} | test - {}'.format(np.bincount(y[train_index]), np.bincount(y[test_index])))
print('train',X[train_index])
print('test',X[test_index])
count+=1
Example to compare StratifiedKFold and StratifiedShuffleSplit
Class: sklearn.linear_model.LogisticRegressionCV
Some Parameters:
Cs (default = 10)
Each of the values in Cs describes the inverse of regularization strength. If an int is given, a grid of Cs values is chosen on a logarithmic scale between 1e-4 and 1e4.
cv (default = None)
The default cross-validation generator used is Stratified K-Folds. If an integer is provided, it is the number of folds used.
scoring (default = None)
A string or a callable scorer(estimator, X, y). The default scoring option used is 'accuracy'.
penalty (‘l1’, ‘l2’, ‘elasticnet’, default=‘l2’)
refit (default = True)
If set to True, the scores are averaged across all folds, and the coefs and the C that correspond to the best score are taken, and a final refit is done using these parameters.
Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.
l1_ratios (list of float, default = None)
The list of Elastic-Net mixing parameters, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. A value of 0 is equivalent to using penalty='l2', while 1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
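A sketch (not part of the original slides) of LogisticRegressionCV with the parameters discussed above; the data and the particular values are arbitrary.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, random_state=0)

clf = LogisticRegressionCV(Cs=10,             # 10 C values on a log scale from 1e-4 to 1e4
                           cv=5,               # 5-fold Stratified K-Folds
                           scoring='accuracy',
                           penalty='l2',
                           refit=True,
                           max_iter=1000).fit(X, y)
print(clf.C_)          # best C found (per class)
print(clf.score(X, y))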
Multiclass classification
Note: The multiclass and multilabel metrics also work for binary classification.
Metrics restricted to the binary case:
1. sklearn.metrics.precision_recall_curve
2. sklearn.metrics.roc_curve
3. sklearn.metrics.det_curve
Binary metrics are extended to multiclass and multilabel problems via the average parameter:
"macro" - calculates the mean of the binary metrics, giving equal weight to each class.
"weighted" - computes the average of binary metrics in which each class's score is weighted by its presence in the true data sample.
"micro" - gives each sample-class pair an equal contribution to the overall metric (except as a result of sample_weight). Preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.
"samples" - calculates the metric over the true and predicted classes for each sample in the evaluation data, and returns their (sample_weight-weighted) average.
None - returns an array with the score for each class.
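A toy sketch (not part of the original slides) showing how the average parameter changes a multiclass precision score.

from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2, 0, 1]
y_pred = [0, 2, 2, 2, 0, 1]

print(precision_score(y_true, y_pred, average='macro'))     # unweighted mean over classes
print(precision_score(y_true, y_pred, average='weighted'))  # weighted by class support
print(precision_score(y_true, y_pred, average='micro'))     # global count over all sample-class pairs
print(precision_score(y_true, y_pred, average=None))        # per-class array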
1. sklearn.metrics.confusion_matrix
2. sklearn.metrics.balanced_accuracy_score
3. sklearn.metrics.cohen_kappa_score
4. sklearn.metrics.hinge_loss
5. sklearn.metrics.matthews_corrcoef
6. sklearn.metrics.roc_auc_score
7. sklearn.metrics.top_k_accuracy_score
1. sklearn.metrics.accuracy_score
2. sklearn.metrics.multilabel_confusion_matrix
from sklearn.metrics import multilabel_confusion_matrix
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
multilabel_confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
3. sklearn.metrics.classification_report
4. sklearn.metrics.zero_one_loss
If the normalize parameter is True, this function returns the fraction of misclassifications (float); otherwise it returns the number of misclassifications (int).
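A tiny sketch (not part of the original slides) of the normalize parameter, using the same labels as the classification_report example below.

from sklearn.metrics import zero_one_loss

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]

print(zero_one_loss(y_true, y_pred))                   # 0.4 -> fraction misclassified
print(zero_one_loss(y_true, y_pred, normalize=False))  # 2   -> count misclassified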
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
5. sklearn.metrics.hamming_loss
6. sklearn.metrics.log_loss
defined on probability estimates, as returned by a classifier's predict_proba method.
7. sklearn.metrics.jaccard_score
compares the set of predicted labels for a sample with the corresponding set of labels in y_true.
8. Precision, recall and F-measures
Note: Best value is 1 and the worst value is 0 for these scores.
Function | Description |
---|---|
sklearn.metrics.precision_score | computes precision, which is intuitively the ability of the classifier not to label as positive a sample that is negative. |
sklearn.metrics.recall_score | computes recall, which is intuitively the ability of the classifier to find all the positive samples. |
sklearn.metrics.f1_score | computes the harmonic mean of precision and recall. |
sklearn.metrics.fbeta_score | computes the weighted harmonic mean of precision and recall. |
sklearn.metrics.average_precision_score | computes the average precision from prediction scores (this score does not support multiclass). |
9. sklearn.metrics.precision_recall_fscore_support
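A toy sketch (not part of the original slides) exercising the scorers listed above on a small binary problem.

from sklearn.metrics import (precision_score, recall_score, f1_score,
                             fbeta_score, precision_recall_fscore_support)

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

print(precision_score(y_true, y_pred))        # TP / (TP + FP)
print(recall_score(y_true, y_pred))           # TP / (TP + FN)
print(f1_score(y_true, y_pred))               # harmonic mean of precision and recall
print(fbeta_score(y_true, y_pred, beta=0.5))  # beta < 1 weights precision more than recall
print(precision_recall_fscore_support(y_true, y_pred))  # per-class arrays plus support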
By Amrutha