Supervised Learning:
Support Vector Machine (SVM)
Speaker: Joanne Tseng
2014/11/15
Outline
- Review (supervised / semi-supervised / unsupervised learning)
- The general supervised learning process
- Support Vector Machine (linearly separable problem)
  - Cost Function
  - Decision Boundary
- SVM with kernel (non-linearly separable problem)
  - Non-linear Decision Boundary
  - Some choices of Kernel
Review
- supervised learning: training data with labels
- semi-supervised learning: training data with both labeled and unlabeled examples
- unsupervised learning: training data without labels
Labeled training data: \{(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{m},y_{m})\}
The general Supervised Learning Process
- Unknown Target Function: f:X\rightarrow Y
- Training Examples: \{(X_{1},y_{1}),(X_{2},y_{2}),\ldots,(X_{m},y_{m})\}
- Hypothesis Set: H=\{h_{1},h_{2},\ldots,h_{M}\}
- Learning Algorithm: A
- Final Hypothesis: g, want g(x)\approx f(x)
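As a minimal illustration of this loop (my own sketch in Python, not from the original slides): treat the hypothesis set H as a short list of candidate classifiers and let the learning algorithm A pick the hypothesis g with the lowest training error.

import numpy as np

# Hypothetical training examples {(X_i, y_i)}: 1-D inputs with binary labels.
X = np.array([-2.0, -1.0, 0.5, 1.0, 2.0])
y = np.array([0, 0, 1, 1, 1])

# Hypothesis set H = {h_1, ..., h_M}: simple threshold classifiers.
H = [lambda x, t=t: (x >= t).astype(int) for t in (-1.5, 0.0, 1.5)]

# Learning algorithm A: pick the hypothesis with the lowest training error.
errors = [np.mean(h(X) != y) for h in H]
g = H[int(np.argmin(errors))]          # final hypothesis g, hoping g(x) ≈ f(x)
print(errors, g(np.array([0.7])))      # the middle threshold fits this toy data best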
Support Vector Machine
- Cost Function
- Linear Decision Boundary
First, let's start from logistic regression.
[REVIEW]
Cost Function(I)
Idea of the cost function:
choose \theta_{1},\theta_{2},\ldots so that h_{\theta}(x) is close to y for our training examples \{(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{m},y_{m})\}
Cost Function(II)
Logistic Regression:
J(\theta)=-\big(y\log h_{\theta}+(1-y)\log(1-h_{\theta})\big)
where h_{\theta}=\frac{1}{1+e^{-\theta^{T}X}}
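A minimal numpy sketch of this cost (my own illustration; all variable names are assumptions, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    # Average logistic-regression cost J(theta); X is the (m, n) design matrix,
    # y holds labels in {0, 1}.
    h = sigmoid(X @ theta)                     # h_theta(x) for every example
    return np.mean(-(y * np.log(h) + (1 - y) * np.log(1 - h)))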
Cost Function(III)
Support Vector Machine:
J(\theta)=y\,Cost_{1}(\theta^{T}X)+(1-y)\,Cost_{0}(\theta^{T}X)
[Figure: plots of Cost_{1}(z) and Cost_{0}(z), piecewise-linear approximations of the two logistic cost terms]
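The slides show Cost_1(z) and Cost_0(z) only as plots; one common concrete choice (the hinge-style form, assumed here rather than taken from the slides) is:

import numpy as np

def cost_1(z):
    # Cost when y = 1: zero once z = theta^T x is at least 1, linear below that.
    return np.maximum(0.0, 1.0 - z)

def cost_0(z):
    # Cost when y = 0: zero once z = theta^T x is at most -1, linear above that.
    return np.maximum(0.0, 1.0 + z)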
Optimization Objective(I)
Logistic Regression:
\min_{\theta}\Big\{\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\big(-\log h_{\theta}(x^{(i)})\big)+(1-y^{(i)})\big(-\log(1-h_{\theta}(x^{(i)}))\big)\Big]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}\Big\}
Two parts:
- first term: cost on the training examples (A)
- second term: regularization term (B)
- \lambda : regularization parameter
Support Vector Machine:
\min_{\theta}\{CA+B\}
where C=\frac{1}{\lambda}
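One step the slide leaves implicit (my own note): writing A=\sum_{i=1}^{m}\big[y^{(i)}(-\log h_{\theta}(x^{(i)}))+(1-y^{(i)})(-\log(1-h_{\theta}(x^{(i)})))\big] and B=\frac{1}{2}\sum_{j=1}^{n}\theta_{j}^{2}, dropping the constant 1/m and rescaling by 1/\lambda leaves the minimizer unchanged:

\min_{\theta}\Big\{\frac{1}{m}A+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}\Big\}
\;\Longleftrightarrow\;
\min_{\theta}\Big\{\frac{1}{\lambda}A+\frac{1}{2}\sum_{j=1}^{n}\theta_{j}^{2}\Big\}
=\min_{\theta}\{CA+B\},\qquad C=\frac{1}{\lambda}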
Optimization Objective(II)
Support Vector Machine:
\min_{\theta}\Big\{C\sum_{i=1}^{m}\Big[y^{(i)}\,Cost_{1}(\theta^{T}x^{(i)})+(1-y^{(i)})\,Cost_{0}(\theta^{T}x^{(i)})\Big]+\frac{1}{2}\sum_{j=1}^{n}\theta_{j}^{2}\Big\}
NOTE: When C is very large, minimizing this objective pushes the first term (A) to (or very close to) zero.
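A minimal numpy sketch of this objective (my own illustration, using the hinge-style Cost_1/Cost_0 from above; the names and the convention that the first column of X is all ones are assumptions):

import numpy as np

def svm_objective(theta, X, y, C):
    # Unconstrained SVM objective: C * (summed per-example cost) + regularizer.
    z = X @ theta                               # theta^T x^(i) for every example
    cost1 = np.maximum(0.0, 1.0 - z)            # hinge-style Cost_1 (used when y = 1)
    cost0 = np.maximum(0.0, 1.0 + z)            # hinge-style Cost_0 (used when y = 0)
    data_term = np.sum(y * cost1 + (1 - y) * cost0)
    reg_term = 0.5 * np.sum(theta[1:] ** 2)     # theta_0 is left unregularized
    return C * data_term + reg_term

With a very large C, any example whose hinge cost is nonzero dominates the objective, which is exactly the observation in the NOTE above.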
SVM Decision Boundary:
Linearly Separable Case
SVM picks the separating boundary with the largest margin, so it is also known as a Large Margin Classifier.
SVM Decision Boundary:
In the presence of outliers
- C not too large (\lambda large): the boundary keeps a large margin and tolerates the outlier
- C very large (\lambda small): the boundary shifts to accommodate the outlier
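A quick way to see this trade-off (a sketch using scikit-learn, whose SVC wraps LIBSVM; its C parameter plays the same role as the C above, and the data set here is made up):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [0] * 20)
X[0] = [-2.5, -2.5]                      # plant one outlier inside the other class

for C in (0.1, 1000.0):                  # small C ~ large lambda, large C ~ small lambda
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.coef_, clf.intercept_)  # larger C pulls the boundary toward the outlier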
SVM with kernel
- Non-linear Decision Boundary
- Other choices of kernel
Non-Linear Decision Boundary
- Predict y = 1 (i.e. h_{\theta}(x)=1) if
\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{1}x_{2}+\theta_{4}x_{1}^{2}+\ldots\geq 0
Non-Linear Decision Boundary
Using the model from the previous page:
\theta_{0}+\theta_{1}f_{1}+\theta_{2}f_{2}+\theta_{3}f_{3}+\theta_{4}f_{4}+\ldots
where f_{1}=x_{1},\; f_{2}=x_{2},\; f_{3}=x_{1}x_{2},\; f_{4}=x_{1}^{2}
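A minimal sketch of this hand-crafted feature map (f_1..f_4 follow the slide; the particular theta and data points are hypothetical):

import numpy as np

def feature_map(x1, x2):
    # Map (x1, x2) to [1, f1, f2, f3, f4] = [1, x1, x2, x1*x2, x1^2].
    return np.array([1.0, x1, x2, x1 * x2, x1 ** 2])

theta = np.array([-1.0, 0.0, 0.0, 0.0, 1.0])       # hypothetical: y = 1 when x1^2 >= 1

def predict(x1, x2):
    return int(theta @ feature_map(x1, x2) >= 0)    # predict y = 1 if theta^T f >= 0

print(predict(0.5, 3.0), predict(2.0, 0.0))         # -> 0 1 : a non-linear boundary in x-space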
Non-Linear Decision Boundary
SVM without Kernel:
\min_{\theta}\Big\{C\sum_{i=1}^{m}\Big[y^{(i)}\,Cost_{1}(\theta^{T}x^{(i)})+(1-y^{(i)})\,Cost_{0}(\theta^{T}x^{(i)})\Big]+\frac{1}{2}\sum_{j=1}^{n}\theta_{j}^{2}\Big\}
SVM with Kernels:
\min_{\theta}\Big\{C\sum_{i=1}^{m}\Big[y^{(i)}\,Cost_{1}(\theta^{T}f^{(i)})+(1-y^{(i)})\,Cost_{0}(\theta^{T}f^{(i)})\Big]+\frac{1}{2}\sum_{j=1}^{m}\theta_{j}^{2}\Big\}
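The slides do not spell out how f^{(i)} is built. In the usual Gaussian-kernel construction (assumed here), every training example serves as a landmark l^{(i)} and each feature measures similarity to one landmark:

import numpy as np

def gaussian_features(x, landmarks, sigma=1.0):
    # f_i(x) = exp(-||x - l^(i)||^2 / (2 sigma^2)), plus a leading 1 for f_0.
    sims = np.exp(-np.sum((landmarks - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    return np.concatenate([[1.0], sims])

# With landmarks taken from the m training points, f(x) has m + 1 entries,
# which is why the regularization sum in the kernelized objective runs over m terms.
landmarks = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
print(gaussian_features(np.array([1.0, 1.0]), landmarks))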
Some choices of Kernels
- Gaussian / RBF Kernel (Radial Basis Function) - the most commonly used
- Polynomial Kernel
- Sigmoid Kernel
- Chi-squared Kernel
- String Kernel
- Histogram intersection Kernel
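In practice these kernels are usually selected by name rather than implemented by hand; a small scikit-learn sketch (scikit-learn wraps LIBSVM, which the last slide suggests for practice; the data set here is arbitrary):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for kernel in ("rbf", "poly", "sigmoid", "linear"):   # built-in kernel choices
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, round(clf.score(X, y), 3))          # RBF usually fits this data best

The chi-squared, string and histogram-intersection kernels are not built in; they would be supplied as a callable or a precomputed kernel matrix.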
Logistic Regression vs. SVM
- Let n be the number of features
- Let m be the number of training examples
- If n is large (> 10,000) relative to m: use logistic regression / SVM without kernel
- If n is small (1 - 1,000) and m is intermediate: use SVM with Gaussian Kernel
- If n is small and m is large (50,000+): create / add more features, then use logistic regression / SVM without kernel
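The guideline above written as a toy helper (purely illustrative; the thresholds are the slide's rough numbers, not a substitute for validation):

def suggest_model(n_features, m_examples):
    # Rule-of-thumb model choice following the slide's heuristics.
    if n_features > 10_000:
        return "logistic regression or SVM without kernel"
    if n_features <= 1_000 and m_examples < 50_000:
        return "SVM with Gaussian (RBF) kernel"
    return "create/add more features, then logistic regression or SVM without kernel"

print(suggest_model(20_000, 5_000))
print(suggest_model(100, 10_000))
print(suggest_model(100, 200_000))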
use Orange & LIBSVM to practice ^.<