Multi-Class Classification
- needed when there are more than two possible output classes
- one option is pairwise classification with majority voting, which does this:
if we have N_Y classes, N_Y(N_Y - 1)/2 binary classifiers are trained, one for each pair of classes, and the class that collects the most pairwise votes wins
disadvantage: lots of classifiers needed
- we can also use one-versus-the-rest: one classifier per class separates that class from all the others, and the class whose classifier gives the highest output wins (see the sketch below)
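A minimal sketch of the one-versus-the-rest decision rule, assuming a list of hypothetical per-class scoring functions (`scorers`) that each return a real-valued "this class vs. the rest" score:

```python
import numpy as np

def one_vs_rest_predict(x, scorers):
    """scorers[i](x) returns a real-valued score for 'class i vs. the rest';
    the class whose classifier responds most strongly wins."""
    scores = np.array([score(x) for score in scorers])
    return int(np.argmax(scores))
```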
More classification techniques
Nearest Neighbor (NN)
- input is assigned to class of nearest neighbor
- distance is the Euclidean distance between the feature vectors (a generalization of the Pythagorean theorem): d(u, u') = sqrt(sum_i (u_i - u'_i)^2)
https://en.wikipedia.org/wiki/Euclidean_distance
- this leads to discontinuities and only piecewise-linear boundaries, because the boundary between two classes is built from perpendicular bisectors of pairs of data points
k-Nearest Neighbors
- input is assigned to the class most common among its k nearest neighbors, where k is a small integer
- Advantage: less sensitive to outliers than plain nearest neighbor
- Disadvantage: doesn't take into account the differences in distance among the k neighbors (each of the k votes counts equally)
- ex: there are white and black points, and an unlabeled grey point is closest to one black point, but the next closest points are all white. With k > 1, k-NN assigns it to white, which is likely the more accurate answer (see the sketch below)
(from Brain-Computer Interfacing by Rajesh P. N. Rao)
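A minimal k-NN sketch in Python (with k = 1 it reduces to the plain nearest-neighbor rule); the array names and the choice k = 3 are just illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Classify x by majority vote among its k nearest training points
    (Euclidean distance); k = 1 gives the plain nearest-neighbor rule."""
    dists = np.linalg.norm(X_train - x, axis=1)      # distance to every training point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)     # count class labels among them
    return votes.most_common(1)[0][0]
```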
Learning Vector Quantization (LVQ)
- small set of labeled vectors (codebook vectors) will act as references
- to label an input x, find the codebook vector m closest to x (by Euclidean distance) and assign x the label of m
- codebook vectors start out random; during training, the closest codebook vector is moved toward the sample if it classifies the sample correctly, and away from it otherwise (see the update sketch below)
- every feature dimension is weighted equally in the distance
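A sketch of a single LVQ1-style training step matching the description above; the learning rate `lr` and the array layout are assumptions for illustration:

```python
import numpy as np

def lvq1_update(x, label, codebook, codebook_labels, lr=0.05):
    """One LVQ1 training step: move the closest codebook vector toward the
    sample if its label matches, away from it otherwise."""
    dists = np.linalg.norm(codebook - x, axis=1)
    i = np.argmin(dists)                              # closest codebook vector
    if codebook_labels[i] == label:
        codebook[i] += lr * (x - codebook[i])         # pull toward a correctly classified sample
    else:
        codebook[i] -= lr * (x - codebook[i])         # push away otherwise
    return codebook
```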
Distinction Sensitive LVQ (DSLVQ)
- for cases where the features (dimensions) should be weighted differently according to their discriminative ability
- uses a weighted distance function, e.g. a weighted Euclidean distance of the form d_w(x, m) = sqrt(sum_i w_i (x_i - m_i)^2), where w_i reflects how discriminative feature i is
Naive Bayes Classifier
- probabilistic classifier based on Bayes' rule with strong independence assumptions ("independent feature model")
- estimates how likely a specific input is to belong to each class from a (possibly large) number of features, assuming these features are independent of each other given the class
- pick the class with maximum posterior probability, computed by Bayes' rule under the independence assumption: P(C | f_1, ..., f_n) is proportional to P(C) * P(f_1 | C) * ... * P(f_n | C) (see the sketch below)
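A minimal naive Bayes sketch under the independence assumption above; modeling each feature with a per-class Gaussian is one common choice here, not necessarily the one used in the book:

```python
import numpy as np

def gaussian_nb_fit(X, y):
    """Estimate per-class priors, feature means and variances (features assumed independent)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),            # prior P(C)
                     Xc.mean(axis=0),             # per-feature mean
                     Xc.var(axis=0) + 1e-9)       # per-feature variance (small floor)
    return params

def gaussian_nb_predict(x, params):
    """Pick the class with maximum log posterior: log P(C) + sum_i log P(f_i | C)."""
    best, best_lp = None, -np.inf
    for c, (prior, mu, var) in params.items():
        lp = np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```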
Evaluation of Classification Performance
"But how good is the method?"
Confusion Matrix
- N_Y x N_Y matrix, where N_Y is the number of classes; rows are the true class labels, columns are the classifier outputs
- in the binary case the four entries are: true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN)
- note that as the decision threshold changes, the numbers of TPs and FPs also change
Receiver Operating Characteristic (ROC) Curve
- shows how varying a parameter (typically the decision threshold) affects the proportions of TPs and FPs, plotting the TP rate against the FP rate
http://web.cs.ucla.edu/~mtgarip/images/ROC_curve.png
the dotted diagonal line represents random chance; the closer the curve is to the upper-left corner, the better the classifier performs relative to chance
Classification Accuracy (ACC)
- ratio of correctly classified samples to the total number of samples
ACC = (TP + TN) / (TP + TN + FP + FN) (see the sketch after this list)
- error rate, err = 1 - ACC
- chance level, ACC_0 = 1/N_Y (assuming all N_Y classes are equally likely)
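A small sketch computing the four confusion-matrix entries and ACC for a binary problem, assuming labels are coded as 1 (positive) and 0 (negative):

```python
import numpy as np

def binary_scores(y_true, y_pred):
    """Count the four confusion-matrix entries for a binary problem and compute ACC."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    acc = (tp + tn) / (tp + tn + fp + fn)
    return tp, tn, fp, fn, acc
```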
Kappa Coefficient
K = (ACC - ACC_0) / (1 - ACC_0)
- independent of the number of samples per class and of the number of classes
- K = 0 represents chance level, K = 1 is perfect classification, K < 0 is worse than chance (worked example below)
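Worked example with illustrative numbers: for N_Y = 4 classes, ACC_0 = 1/4 = 0.25; a classifier reaching ACC = 0.80 gives K = (0.80 - 0.25) / (1 - 0.25) ≈ 0.73.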
Information Transfer Rate (ITR)
- measures both the speed and the accuracy of a BCI
- an important assumption: every class must be equally likely to be selected on each trial; the ITR per trial (in bits) is then
B = log2(N_Y) + P log2(P) + (1 - P) log2((1 - P) / (N_Y - 1)),
where P is the classification accuracy; multiplying B by the number of trials per minute gives the ITR in bits/min (see the sketch below)
(equation from Brain-Computer Interfacing by Rajesh P. N. Rao)
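A sketch of the ITR formula above as a Python function; clamping below-chance accuracies to zero is a common convention, not part of the formula itself:

```python
import math

def itr_bits_per_trial(n_classes, p):
    """ITR in bits per trial: assumes all n_classes targets are equally
    likely and accuracy p is the same for every target."""
    if p >= 1.0:
        return math.log2(n_classes)
    if p <= 1.0 / n_classes:
        return 0.0                                   # at or below chance: commonly clamped to zero
    return (math.log2(n_classes)
            + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n_classes - 1)))

# e.g. itr_bits_per_trial(4, 0.9) * trials_per_minute gives bits per minute
```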
Cross Validation
- used to estimate the error rate err
- the classifier is tested on data it was not trained on: the available data is split into two sets, one for training and one for testing (the test data)
K-fold cross-validation:
- the data is split into K subsets of equal size; K - 1 subsets are used to train the classifier and the remaining subset is used for testing
- this is repeated K times (each subset serves as the test set once), giving K error rates whose average is the cross-validated error estimate (see the sketch below)
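A minimal K-fold cross-validation sketch; `train_and_test` is a hypothetical helper that trains a classifier on one split and returns its error rate on the held-out fold:

```python
import numpy as np

def kfold_error(X, y, train_and_test, K=5, seed=0):
    """Estimate the error rate by K-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, K)                     # K roughly equal-sized subsets
    errors = []
    for i in range(K):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != i])
        errors.append(train_and_test(X[train_idx], y[train_idx], X[test_idx], y[test_idx]))
    return np.mean(errors)                             # average of the K error rates
```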
Regression
Linear Regression
- Assume: the underlying function generating the data is linear, v = w^T u
w is a "weight vector" or linear filter
u is the input, a vector with K dimensions
- Linear least squares regression: finds the w that minimizes the sum of squared errors ||d - Uw||^2; the solution is w = (U^T U)^{-1} U^T d
d is the vector of training outputs
U is the input matrix whose rows are the inputs u from the training set
- Advantage: simple to calculate
- Disadvantage: overly simplistic; a purely linear model is often inadequate for most non-invasive BCIs (see the sketch below)
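A sketch of the least-squares fit; using `np.linalg.lstsq` instead of forming (U^T U)^{-1} explicitly is a numerical-stability choice, not something prescribed by the notes:

```python
import numpy as np

def fit_linear(U, d):
    """Least-squares weight vector w minimizing ||d - U w||^2,
    where rows of U are training inputs u and d holds the training outputs."""
    w, *_ = np.linalg.lstsq(U, d, rcond=None)   # stable alternative to (U^T U)^-1 U^T d
    return w

def predict_linear(w, u):
    return u @ w                                # v = w^T u
```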
Neural Networks and Backpropagation
- non-linear function approximation
- perceptron: each neuron applies a threshold output function to a weighted sum of its inputs
- sigmoid (logistic) output function: g(x) = 1 / (1 + e^(-x))
this results in a nicely differentiable function!
Multilayer neurons
- output of one layer feeds another layer
- most commonly a three layer network
i. input, ii. hidden, iii. output - such a network can approximate arbitrary nonlinear functions
V is weights from input to hidden
W is the weights from hidden to output
But wait! We only know the error at the output layer, so we need to backpropagate it to get the updates for the hidden-layer weights (see the sketch below)
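A sketch of one backpropagation step for the three-layer sigmoid network described above; the squared-error loss and fixed learning rate are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))              # logistic output function g(x)

def train_step(u, d, V, W, lr=0.1):
    """One backpropagation step. V: input->hidden weights, W: hidden->output weights."""
    # forward pass
    h = sigmoid(V @ u)                            # hidden-layer activations
    v = sigmoid(W @ h)                            # network output
    # output-layer error (known directly from the target d)
    delta_out = (v - d) * v * (1 - v)
    # backpropagate the error to the hidden layer
    delta_hid = (W.T @ delta_out) * h * (1 - h)
    # gradient-descent weight updates
    W -= lr * np.outer(delta_out, h)
    V -= lr * np.outer(delta_hid, u)
    return V, W
```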
Radial Basis Function (RBF) Networks
- recall linear regression: v = w^T u
- to increase the power of the model, replace the raw inputs with nonlinear basis functions centered at points c_i, e.g. Gaussians: v = sum_i w_i exp(-||u - c_i||^2 / (2 sigma^2)) (see the sketch below)
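A sketch of Gaussian RBF feature construction; the fixed width `sigma` and the idea of fitting the weights with the same least-squares machinery as above are illustrative assumptions:

```python
import numpy as np

def rbf_features(U, centers, sigma=1.0):
    """Map each input row of U to Gaussian basis activations
    phi_i(u) = exp(-||u - c_i||^2 / (2 sigma^2))."""
    d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

# the weights can then be fit exactly as in linear regression, e.g.
# Phi = rbf_features(U, centers); w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
```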
Gaussian Processes
- problem: a regression model should be more certain in regions with many training examples and less certain where data is sparse, but most methods give no measure of that uncertainty
- Gaussian process regression:
- provides a measure of uncertainty about its outputs
- nonparametric: the model can grow to match the complexity of the data
- can predict the next output given the next input
- calculating uncertainty can stop accidents, e.g. a BCI-driven device can refuse to act when a prediction is too uncertain (see the sketch below)
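A minimal Gaussian process regression sketch with a squared-exponential kernel, returning a posterior mean and a per-point variance (the uncertainty measure mentioned above); the kernel choice, length scale, and noise level are assumptions:

```python
import numpy as np

def gp_regression(X_train, y_train, X_test, length=1.0, noise=1e-2):
    """GP regression: posterior mean and variance at the test inputs."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-0.5 * d2 / length ** 2)

    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = kernel(X_test, X_train)                  # covariances between test and training points
    Kss = kernel(X_test, X_test)
    K_inv = np.linalg.inv(K)
    mean = Ks @ K_inv @ y_train                   # posterior mean prediction
    var = np.diag(Kss - Ks @ K_inv @ Ks.T)        # posterior variance: small near training data
    return mean, var
```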