Author: Jose Miguel Arrieta Ramos
We have 2 colors of balls on the table that we want to separate.
We get a stick and put it on the table. This works pretty well, right?
Some villain comes and places more balls on the table. The stick kind of still works, but one of the balls is on the wrong side, and there is probably a better place to put the stick now.
SVMs try to put the stick in the best possible place by having as big a gap on either side of the stick as possible.
Now, when the villain returns, the stick is still in a pretty good spot.
Say the villain has seen how good you are with a stick, so he gives you a new challenge.
There’s no stick in the world that will let you split those balls well, so what do you do? You flip the table, of course, throwing the balls into the air. Then, with your pro ninja skills, you grab a sheet of paper and slip it between the balls.
Now, looking at the balls from where the villain is standing, the balls will appear to be split by some curvy line.
* balls = data
* stick = classifier
* biggest gap trick = optimization
* flipping the table = kernelling
* piece of paper = hyperplane
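To make the "table flip" concrete, here is a small sketch (mine, not from the original post) using scikit-learn: a linear SVM cannot separate points arranged in two concentric rings, but an SVM with an RBF kernel, which implicitly lifts the data into a higher-dimensional space, separates them easily. The dataset and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: no straight stick can separate them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM (a stick) struggles on this data
linear_svm = SVC(kernel="linear").fit(X, y)
print("linear kernel accuracy:", linear_svm.score(X, y))

# An RBF kernel "flips the table": it implicitly maps the points into a
# higher-dimensional space where a flat sheet of paper can separate them
rbf_svm = SVC(kernel="rbf").fit(X, y)
print("RBF kernel accuracy:", rbf_svm.score(X, y))
```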
In geometry, a hyperplane is a subspace of one dimension less than its ambient space. If a space is 3-dimensional then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyperplanes are the 1-dimensional lines.
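As a quick worked example (an illustration I am adding, not from the original post): in 3-dimensional space, the solutions of a single linear equation such as

x + 2y - z = 0

form a 2-dimensional plane, while in 2-dimensional space the solutions of x + y = 1 form a 1-dimensional line. In both cases the solution set is a hyperplane: one dimension less than the space it sits in.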
" Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier."
If the training data are linearly separable, we can select two parallel hyperplanes that separate the two classes of data so that the distance between them is as large as possible.
These hyperplanes can be described by the equations

w · x - b = 1    and    w · x - b = -1

Geometrically, the distance between these two hyperplanes is 2 / ||w||, so to maximize the distance between the planes we want to minimize ||w||.
As we also have to prevent data points from falling into the margin, we add the following constraint for each i:

y_i (w · x_i - b) ≥ 1
To sum up, we obtain the optimization problem: minimize ||w|| subject to y_i (w · x_i - b) ≥ 1 for i = 1, ..., n.
Norm: ||w|| denotes the Euclidean norm of the weight vector w.
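As an illustrative check (my own sketch, not part of the original post), we can approximate the hard-margin problem in scikit-learn with a linear kernel and a very large C, read off w and b, and verify that the margin width is 2 / ||w||. The toy data below is an assumption chosen to be linearly separable.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in 2-D
X = np.array([[-2, -1], [-1, -1], [-1, -2], [1, 1], [1, 2], [2, 1]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin problem:
# minimize ||w|| subject to y_i (w . x_i - b) >= 1
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # learned weight vector w
b = -clf.intercept_[0]    # scikit-learn stores the bias with the opposite sign
margin = 2 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width 2/||w|| =", margin)

# The support vectors should satisfy y_i (w . x_i - b) = 1 (up to tolerance)
for sv, label in zip(clf.support_vectors_, y[clf.support_]):
    print(label * (np.dot(w, sv) - b))
```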
Hinge loss: to extend the SVM to cases in which the data are not linearly separable, we introduce the hinge loss function

max(0, 1 - y_i (w · x_i - b))

This loss is zero when the constraint above is satisfied, i.e. when x_i lies on the correct side of the margin, and it is at least 1 for every misclassified point, so the accumulated hinge loss is an upper bound on the number of mistakes made by the classifier.
We then wish to minimize

(1/n) Σ_i max(0, 1 - y_i (w · x_i - b)) + λ ||w||²

where the parameter λ controls the trade-off between a large margin and having each x_i on the correct side of the margin.
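To make the hinge loss concrete, here is a tiny numpy sketch (the weight vector, bias, and points are illustrative values I chose, not from the original post) that evaluates max(0, 1 - y_i (w · x_i - b)) for three points and shows that a misclassified point contributes a loss of at least 1.

```python
import numpy as np

# Toy weight vector and bias, chosen just for illustration
w = np.array([0.5, 0.5])
b = 0.0

# Three points of class +1: comfortably correct, inside the margin, misclassified
X = np.array([[2.0, 2.0], [0.5, 0.5], [-1.0, -1.0]])
y = np.array([1, 1, 1])

scores = X @ w - b                      # w . x_i - b
hinge = np.maximum(0, 1 - y * scores)   # max(0, 1 - y_i (w . x_i - b))

print(scores)  # [ 2.   0.5 -1. ]
print(hinge)   # [ 0.   0.5  2. ]  -- the misclassified point contributes >= 1
```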
import numpy as np
from sklearn.svm import SVC

# Four training points in two dimensions, two per class
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

# Fit a support vector classifier (RBF kernel by default)
clf = SVC()
clf.fit(X, y)

# Predict the class of a new point close to the first cluster
print(clf.predict([[-0.8, -1]]))  # prints [1]
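Continuing the snippet above, we can also peek at which training points actually determine the decision boundary (a usage note I am adding, not from the original post):

```python
# The training points that ended up defining the decision boundary
print(clf.support_vectors_)

# With kernel="linear", clf.coef_ and clf.intercept_ expose w and the bias directly;
# with the default RBF kernel the boundary lives in the kernel-induced feature space,
# so only the dual coefficients are available
print(clf.dual_coef_)
```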
Pipeline:
* Young Female-us
* MiddleAge Males-us
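The pipeline code itself is not shown here, but a typical scikit-learn setup for a two-class problem like this would chain a scaler and an SVC. The sketch below is an assumption for illustration: the random feature vectors stand in for whatever features would be extracted for the "Young Female-us" and "MiddleAge Males-us" groups, and the parameter values are arbitrary.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical feature vectors for the two groups; in practice these would be
# features extracted from the actual samples
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(20, 5)),    # stand-in "Young Female-us" samples
               rng.normal(2, 1, size=(20, 5))])   # stand-in "MiddleAge Males-us" samples
y = np.array(["Young Female-us"] * 20 + ["MiddleAge Males-us"] * 20)

# Standardize the features, then classify with an SVM
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(kernel="rbf", C=1.0)),
])
pipe.fit(X, y)
print(pipe.predict(X[:3]))
```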