Classification

c17hawke

- Sunny Chandra

In machine learning and statistics, classification is -

the problem of identifying to which of a set of categories (sub-populations) a new observation belongs,

on the basis of a training set of data containing observations (or instances) whose category membership is known.

 

-Wikipedia

Our Story

Confusion Matrix

c17hawke

T | N

T | P

F | N

F | P

0 | 0'

1 | 1'

1 | 0'

0 | 1'

0'

0

1

1'

 

Actual class

Predicted or Guessed

c17hawke

Actual Guessed Interpretation
TN T N Guessed (-ve) and True
FP F P Guessed (+ve) but False
FN F N Guessed (-ve) but False
TP T P Guessed (+ve) and True

c17hawke

Any Questions ??

Understanding Classification Graphically

Toy dataset

x y label
0 0 -
1 0 -
1 1 -
3 3 +
3 2 +
2 3 +

x

y

x y label
0 0 -
1 0 -
1 1 -
3 3 +
3 2 +
2 3 +
- +
- 3 0
+ 0 3

Confusion

Matrix

y

x

PERFECT CLASSIFIER

Graphical analysis

y

x

y = -x + 3\ \Rightarrow f(x,y) = 0
f(x,y) >= 0
f(x,y)<0

Any Questions ??

x y label
0 0 -
1 0 -
1 1 +
3 3 +
3 2 -
2 3 +

Classification Errors

y

x

x y label
0 0 -
1 0 -
1 1 +
3 3 +
3 2 -
2 3 +
- +
- 2 1
+ 1 2

Confusion

Matrix

FN

FP

y

x

Any Questions ??

More Example ->

x y label
0 0 -
1 0 -
1 1 -
3 3 +
-3 -3 +
3 -3 +
- +
- 3 0
+ 0 3

Confusion

Matrix

y

x

y

x

Let's do some calculations

Any Questions ??

More examples -

x y z label
0 0 1 -
1 0 2 -
1 1 1 -
3 3 4 +
-3 -3 3 +
3 -3 2 +

dataset with 3 features

3D-View

So what we have learned till now?

Classification

In classification, models are nothing but graphs ( line, curve, surface, etc) that separates two or more classes.

x

y

Regression

In Regression, models are nothing but graphs ( line, curve, surface, etc) that fits through the datapoints.

x

y

I hope that was fun?:D

Example

MNIST dataset

The MNIST database (Modified National Institute of Standards and Technology database)

is a large database of handwritten digits.

c17hawke

Example 1

MNIST dataset-

let's say we have 10,000 data point (handwritten text) in the Training set

Our task -

detect whether a no. is 5 or not

  1. hence Positive => detected 5
  2. and Negative => not a 5

 

Note:

Total no. of 5s are 1000 => other digits are 9000 in total

c17hawke

Evaluation or Performance Measures

c17hawke

1. Accuracy

accuracy =

TP + TN

TP + TN + FP + FN

c17hawke

2. Precision

Precision  =

How many are actually 5 ?

Out of nos. classified as 5

=

TP

TP + FP

c17hawke

3. Recall

Recall    =

How many are classified as 5 ?

Out of nos. that are actually 5

=

TP

TP + FN

c17hawke

Case 1

when no. of 5s (i.e. P's ) = 1000

no. of other digits are (i.e. N's) = 9000

T | N

T | P

F | N

F | P

0 | 0'

1 | 1'

1 | 0'

0 | 1'

0'

0

1

1'

 

Actual data

Predicted or Guessed

9000

1000

0

0

Precision = ?

Accuracy = ?

Recall = ?

FIND

c17hawke

Precision  =

TP

TP + FP

Recall  =

TP

TP + FN

accuracy =

TP + TN

TP + TN + FP + FN

c17hawke

Case 2

when no. of 5s (i.e. P's ) = 1000

no. of other digits are (i.e. N's) = 9000

T | N

T | P

F | N

F | P

0 | 0'

1 | 1'

1 | 0'

0 | 1'

0'

0

1

1'

 

Actual data

Predicted or Guessed

0

0

9000

1000

Precision = ?

Accuracy = ?

Recall = ?

FIND

c17hawke

Precision  =

TP

TP + FP

Recall  =

TP

TP + FN

accuracy =

TP + TN

TP + TN + FP + FN

c17hawke

Observation -

  1. Accuracy can be misleading in case of classification
  2. It's not always great to calculate Precision and Recall every time. So we can go for -->

 

c17hawke

4. F1 Score

  1. Its harmonic mean of Precision and Recall.
  2. It gives equal importance to both values.
  3. It can only result in high score if both are high.

F1 score  =

2

1

Precision

+

Recall

1

c17hawke

It's not always Ok to use F1 score

We need to choose our Performance measurers as per our Problem statement

Examples ->

c17hawke

Example 1

Classify safe videos for kids =>​

  • P = Animation movie
  • N = Adult movie or Offensive content

AIM -

(as low as possible) FP =>

  • guessed = animated
  • actually= adult
 

(doesn't affect much) FN =>

  • guessed = adult
  • actually= animated

=>

Precision >

Recall

Preference -

c17hawke

Precision=

TP

TP + FP

Recall  =

TP

TP + FN

Example 2

Shoplifting detection using camera feed at a shopping mall =>​

  • P = Thief
  • N = Not Thief

Precision=

TP

TP + FP

Recall  =

TP

TP + FN

AIM -

(doesn't affect much) FP =>

  • guessed = Thief
  • actually= Not thief

(as low as possible) FN =>

  • guessed = Not thief
  • actually= Thief

=>

Precision <

Recall

Preference -

c17hawke

Example 3

COVID-19 testing =>​

  • P = infected
  • N = healthy

Precision=

TP

TP + FP

Recall  =

TP

TP + FN

AIM -

(as low as possible) FP =>

  • guessed = infected
  • actually= healthy

(as low as possible) FN =>

  • guessed =  healthy
  • actually= infected

=>

both should be high

 

Preference -

c17hawke

go for high f1 score

 

USUALLY -

when precision is high => Recall will be low

and vice a versa

This is nothing but Precision-Recall Tradeoff

c17hawke

F scores

F    =

x

β

x

precision

+

recall

α ∈ [0, 1]

F    =

1

1 

x

precision

+

1 

x

recall

(1 - α)

α

α

precision

recall

(1 + β  )

β

2

2

x

β ∈ [0, +]

F    =

2

1 

precision

+

1 

recall

1

More Stuff

Precision | Positive Predictive value(PPV)

Recall | Sensitivity |Hit rate |

True Positive rate (TPR)

specificity|selectivity |

True negative rate (TNR)

False negative rate (FNR)

False positive rate (FPR)

P = TP + FN

N = TN + FP

TN FP
FN TP

P(1)

N(0)

Actual

Predicted

Precision and Recall vs Decision threshold curve

Precision vs Recall curve

ROC (Receiver Operating Characteristics) curve

Comparing ROC curve of two models

Multi-class Classification aka Multinomial Classification

  • Algos that can handle multi-class -
    • SGD, Random Forest, Naive Bias, etc.
  • Algos which are strictly Binary classifiers -
    • Logistic Regression, SVM classifier.

Strategies to make binary classifier work for Multi classification

  • OvR (One-vs-Rest) Strategy aka One-vs-All
    • based on the highest decision score
  • OvO (One-vs-One) Strategy
    • make the pairwise binary classifier
    • for N classes you'll train            = N x (N-1) classifiers
    • It is faster as you need to train only a chunk of data at a time.

References -

Classification v2.0

By Sunny

Classification v2.0

This is my presentation on Classification topic in ML

  • 755