## Health Data Science Meetup

### November 7, 2016

Support Vector Machines

Implementations in Python

## Support Vector Machines

### Binary Classification

Logistic Regression

Sigmoid Function / Logistic Function

Decision Boundary

Cost Function
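
The four items above can be sketched in a few lines of plain Python. This is a minimal illustration, not the speaker's code: it assumes the usual unregularized cross-entropy cost, and the helper names and toy data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """h_theta(x) = sigmoid(theta^T x); x[0] is assumed to be the bias term 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def predict(theta, x):
    """Decision boundary: predict 1 when theta^T x >= 0, i.e. h >= 0.5."""
    return 1 if predict_proba(theta, x) >= 0.5 else 0

def cost(theta, X, y):
    """Average cross-entropy cost over the training set (no regularization)."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict_proba(theta, xi)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / len(X)

# Hypothetical toy data: the leading 1.0 in each row is the bias feature.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
y = [1, 1, 0, 0]
theta = [0.0, 1.0]

print("predictions:", [predict(theta, xi) for xi in X])
print("cost:", cost(theta, X, y))
```

With `theta = [0.0, 1.0]` the decision boundary sits at the point where the second feature equals zero, so the sign of that feature decides the predicted class.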

### Large Margin Classifiers

If we set C to a very large value, the optimization focuses on the first (left-hand) term of the cost function, the data-fit term, and drives it toward zero.
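
The role of C can be sketched as follows. The code assumes the common form of the SVM objective, C × (sum of hinge-style costs) + ½ Σ θ_j², which is not spelled out in the text. The function names and toy data are hypothetical.

```python
def cost1(z):
    """Hinge-style cost for a positive example: zero once z >= 1."""
    return max(0.0, 1.0 - z)

def cost0(z):
    """Hinge-style cost for a negative example: zero once z <= -1."""
    return max(0.0, 1.0 + z)

def svm_cost(theta, X, y, C):
    """Return (data_term, reg_term, total) with total = C * data_term + reg_term."""
    data_term = 0.0
    for xi, yi in zip(X, y):
        z = sum(t * v for t, v in zip(theta, xi))
        data_term += yi * cost1(z) + (1 - yi) * cost0(z)
    reg_term = 0.5 * sum(t * t for t in theta[1:])  # bias theta[0] is not penalized
    return data_term, reg_term, C * data_term + reg_term

# Hypothetical toy data: the leading 1.0 in each row is the bias feature.
X = [[1.0, 2.0], [1.0, 0.5], [1.0, -2.0]]
y = [1, 1, 0]

# Small theta: cheap regularization term, but one example violates the margin.
small = svm_cost([0.0, 0.5], X, y, C=1000.0)
# Larger theta: every example satisfies the margin, so the data term is zero.
large = svm_cost([0.0, 2.0], X, y, C=1000.0)

print("small theta:", small)
print("large theta:", large)
```

With `C=1000.0` the parameters that zero out the data term have the lower total cost. Rerunning with `C=1.0` reverses the ranking, because the regularization term then dominates.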

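Choose parameters $\theta$ such that:

$\theta^Tx^{(i)} \geq 1$ for positive examples
$\theta^Tx^{(i)} \leq -1$ for negative examples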

### Large Margin Classifier

[Figure: large margin decision boundary; axes $x_1$, $x_2$]

### Soft Margin Classifier

[Figure: decision boundaries for "Very large C" and "C not too large"; axes $x_1$, $x_2$]

### Non-linear Decision Boundary

[Figure: non-linear decision boundary; axes $x_1$, $x_2$]

$\theta_0+\theta_1f_1+\theta_2f_2+\dots$

We can use different functions for the features $f_1, f_2, \dots$
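
As one illustration of choosing the feature functions, the sketch below uses polynomial terms of $x_1$ and $x_2$ and predicts 1 when $\theta_0+\theta_1f_1+\theta_2f_2+\dots \geq 0$. The feature set, the threshold convention, and the parameter values are assumptions made for the example.

```python
def polynomial_features(x1, x2):
    """One hypothetical feature choice: f = (x1, x2, x1*x2, x1^2, x2^2)."""
    return [x1, x2, x1 * x2, x1 ** 2, x2 ** 2]

def predict(theta, features):
    """Predict 1 when theta_0 + theta_1*f_1 + theta_2*f_2 + ... >= 0."""
    score = theta[0] + sum(t * f for t, f in zip(theta[1:], features))
    return 1 if score >= 0 else 0

# Illustrative parameters: the boundary is the unit circle x1^2 + x2^2 = 1.
theta = [-1.0, 0.0, 0.0, 0.0, 1.0, 1.0]

inside = predict(theta, polynomial_features(0.2, 0.3))
outside = predict(theta, polynomial_features(1.5, -0.5))
print(inside, outside)
```

The boundary is linear in the features $f_i$ but circular in the original $x_1$, $x_2$ plane, which is what makes the decision boundary non-linear.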

### Kernel

[Figure: landmarks $l^{(1)}$, $l^{(2)}$, $l^{(3)}$; axes $x_1$, $x_2$]

• Given $x$, compute new features depending on proximity to the landmarks $l^{(1)}, l^{(2)}, l^{(3)}$

Gaussian Kernels

[Figure: a point $x$ for which $f_1\approx1, f_2\approx0, f_3\approx0$; plot of $f_1$ over the $x_1$, $x_2$ plane]
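
A minimal sketch of the Gaussian kernel, assuming the common form $f_i = \exp\left(-\lVert x - l^{(i)}\rVert^2 / (2\sigma^2)\right)$. The landmark coordinates and the query point are hypothetical.

```python
import math

def gaussian_kernel(x, landmark, sigma2=1.0):
    """Similarity exp(-||x - l||^2 / (2 * sigma^2)): 1 at the landmark, near 0 far away."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma2))

# Hypothetical landmark positions in the (x1, x2) plane.
landmarks = [(1.0, 1.0), (5.0, 5.0), (8.0, 1.0)]
x = (1.1, 0.9)  # a point very close to the first landmark

f = [gaussian_kernel(x, l) for l in landmarks]
print(f)
```

Because `x` lies next to the first landmark and far from the other two, the first feature is close to 1 and the remaining features are close to 0.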

### Kernel

Where to get $l^{(1)},l^{(2)},l^{(3)}$?

[Figure: two panels with axes $x_1$, $x_2$, labeled $l^{(1)}$ and $x^{(1)}$]
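
One common answer, assumed here, is to place one landmark at each training example, so that $l^{(i)} = x^{(i)}$. The sketch below maps every example to a vector of Gaussian similarities under that assumption. The training points are hypothetical.

```python
import math

def gaussian_kernel(x, landmark, sigma2=1.0):
    """Similarity exp(-||x - l||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma2))

def to_kernel_features(x, landmarks, sigma2=1.0):
    """Map x to [f_1, ..., f_m], one similarity per landmark."""
    return [gaussian_kernel(x, l, sigma2) for l in landmarks]

# Hypothetical training set with m = 4 examples in the (x1, x2) plane.
X_train = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]

# Assumption: one landmark per training example, l^(i) = x^(i).
landmarks = list(X_train)

# Each example becomes an m-dimensional feature vector.
F_train = [to_kernel_features(x, landmarks) for x in X_train]

for row in F_train:
    print([round(v, 3) for v in row])
```

Under this choice, the number of features equals the number of training examples, and each example has similarity 1 with its own landmark.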

### SVM Parameters

$C=1/\lambda$

• Large C: Lower bias, high variance
• Small C: Higher bias, low variance

$\sigma^2$ for Gaussian Kernel

• Large $\sigma^2$: Higher bias, low variance
  Features $f_i$ vary more smoothly.
• Small $\sigma^2$: Lower bias, high variance
  Features $f_i$ vary less smoothly.
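
The smoothness claim can be checked numerically. This sketch assumes the Gaussian form $\exp(-d^2/(2\sigma^2))$ for a point at distance $d$ from its landmark. The function name and the sampled distances and $\sigma^2$ values are arbitrary choices for illustration.

```python
import math

def gaussian_kernel_1d(distance, sigma2):
    """Feature value for a point at the given distance from its landmark."""
    return math.exp(-(distance ** 2) / (2.0 * sigma2))

distances = [0.0, 1.0, 2.0, 3.0]

# Larger sigma^2 makes the feature fall off more slowly with distance.
for sigma2 in (0.5, 1.0, 3.0):
    values = [round(gaussian_kernel_1d(d, sigma2), 3) for d in distances]
    print(f"sigma^2 = {sigma2}: {values}")
```

For every nonzero distance, the feature value is higher with the larger $\sigma^2$, which is the sense in which the features vary more smoothly.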

### Kernel

• Linear Kernel SVM

• Polynomial Kernel SVM

• Gaussian (Radial basis function or rbf) Kernel SVM
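
The three kernels listed above can be written as plain Python functions. This is a sketch under common textbook definitions; the parameter names `degree`, `coef0`, and `sigma2` and the sample vectors are illustrative assumptions.

```python
import math

def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return dot(x, z)

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """K(x, z) = (x . z + coef0) ** degree"""
    return (dot(x, z) + coef0) ** degree

def gaussian_kernel(x, z, sigma2=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)), also called the RBF kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))

# Hypothetical sample vectors.
x = (1.0, 2.0)
z = (3.0, 0.5)

k_linear = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z)
k_rbf = gaussian_kernel(x, z)
print(k_linear, k_poly, k_rbf)
```

In scikit-learn, these choices correspond to `SVC(kernel="linear")`, `SVC(kernel="poly")`, and `SVC(kernel="rbf")`. Note that scikit-learn parameterizes the RBF kernel with `gamma`, which plays the role of $1/(2\sigma^2)$ in the form used here.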

