Yigit DEMIRAG
Reyhan ERGUN
Gokhan GOK
Neural Network Performances on
MNIST Handwritten Digits Database
BILKENT - 2015
Input: D, the set of training objects, and test object z
Process:
Compute d(x, z), the distance between z and every training object x in D,
Select D_z, the set of the k closest training objects to z.
Output: the majority-vote label among the objects in D_z
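The procedure above is standard k-nearest neighbours; a minimal Python sketch (a hypothetical helper, not the project's actual code):

```python
# Minimal kNN classifier. D is a list of (vector, label) training pairs,
# z is the test vector, k is the neighbourhood size.
from collections import Counter
import math

def knn_classify(D, z, k):
    # Compute d(x, z), the Euclidean distance between z and every training object x.
    dists = [(math.dist(x, z), y) for x, y in D]
    # Select D_z, the set of the k closest training objects to z.
    dists.sort(key=lambda t: t[0])
    nearest = [y for _, y in dists[:k]]
    # Output the majority-vote label among the k neighbours.
    return Counter(nearest).most_common(1)[0][0]
```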
Accuracy: 96.13%
Pros:
Cons:
i.e., from the result of quadprog().
2. Used as one-vs-rest classifiers to keep the number of SVMs small.
3. 4000 training samples, distributed equally across the classes.
4. When more than one binary SVM predicted a label, the result with the
highest probability was used.
5. When no binary SVM predicted a label, the highest-probability
"not reported" result was used.
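Decision rules 4 and 5 can be sketched as follows (the decision scores here are illustrative; the project obtained them from MATLAB's quadprog()):

```python
# One-vs-rest label selection. `scores` maps each digit label to the
# decision value of that label's binary SVM.
def ovr_predict(scores, threshold=0.0):
    positives = {l: s for l, s in scores.items() if s > threshold}
    if positives:
        # Rule 4: one or more SVMs fired -> the highest score wins.
        return max(positives, key=positives.get)
    # Rule 5: no SVM fired -> fall back to the highest "not reported" score.
    return max(scores, key=scores.get)
```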
| Classifier | N (train) | T (test) | # SV | L | Acc (%) |
|---|---|---|---|---|---|
| Label 0 vs Rest | 4000 | 500 | 2760 | Inf | - |
| Label 1 vs Rest | 4000 | 500 | 431 | Inf | - |
| Label 2 vs Rest | 4000 | 500 | 2996 | Inf | - |
| Label 3 vs Rest | 4000 | 500 | 4000 | Inf | - |
| Label 4 vs Rest | 4000 | 500 | 3638 | Inf | - |
| Label 5 vs Rest | 4000 | 500 | 3998 | Inf | - |
| Label 6 vs Rest | 4000 | 500 | 3445 | Inf | - |
| Label 7 vs Rest | 4000 | 500 | 3330 | Inf | - |
| Label 8 vs Rest | 4000 | 500 | 4000 | Inf | - |
| Label 9 vs Rest | 4000 | 500 | 4000 | Inf | - |
| Overall | 4000 | 500 | - | Inf | 46.6 |
| Training set | N (train) | T (test) | # SV | L | Acc (%) |
|---|---|---|---|---|---|
| Train data from 1st 4000 | 4000 | 500 | - | Inf | 22.28 |
| Train data from 2nd 4000 | 4000 | 500 | - | Inf | 22.24 |
| Train data from 3rd 4000 | 4000 | 500 | - | Inf | 46 |
| Train data from 4th 4000 | 4000 | 500 | - | Inf | 34.2 |
| Train data from 5th 4000 | 4000 | 500 | - | Inf | 41.8 |
| Train data from 6th 4000 | 4000 | 500 | - | Inf | 44.5 |
Pros
1. Achieves the maximum separation of the data, i.e., the "optimal separating hyperplane".
2. Can deal with very high-dimensional data.
3. Lower computational complexity than kNN*
4. Use of the "kernel trick"**
Cons
1. *But, as in our case, computational complexity increases when the number of SVs is high (as the training data grows, the number of SVs grows).
2. **A good kernel function must be selected (one that adds new features so that the data set becomes linearly separable).
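The "kernel trick" evaluates an inner product in a high-dimensional feature space without ever mapping the data there. The RBF kernel below is a standard choice, not necessarily the one used in the project:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)
```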
F.c. Layer - F.c. Layer
| Hyper Parameters | Value |
|---|---|
| Regularization | 1 |
| Learning Rate | 1e-5 |
| # of Hidden Units | 300 |
| Momentum | 0.98 |
| Activation Func. | ReLU |
| Cost | Cross-entropy + reg. |
Result: 89.83%
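The forward pass of the two fully connected layer network (784-300-10, ReLU hidden units as in the table) can be sketched as follows. The weights are random placeholders, not the trained model; only the shapes, activation, and cost follow the hyperparameter table:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.01, (784, 300)), np.zeros(300)  # 300 hidden units
W2, b2 = rng.normal(0, 0.01, (300, 10)), np.zeros(10)    # 10 digit classes

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)       # ReLU activation
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

def cross_entropy(p, label):
    return -np.log(p[label])               # the cross-entropy cost
```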
Conv - Pool - F.c. Layer
| Hyper Parameters | Value |
|---|---|
| Filter Width | 9 |
| Pooling Size | 2x2 |
| Learning Rate | 0.1 |
| # of Features | 20 |
| Momentum | 0.95 |
| Activation Func. | Softmax |
| Cost | Cross ent. + Reg. |
| Kernel Width | 7 | 9 | 11 |
|---|---|---|---|
| Accuracy | 98.2% | 98.08% | 98.2% |
| # of Kernels | 14 | 16 | 18 | 20 | 22 | 24 | 26 | 28 |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 98.4% | 98.2% | 98.4% | 98.1% | 98.3% | 98.3% | 98% | 98.3% |
| Momentum | 0.2 | 0.4 | 0.6 | 0.8 | 0.9 | 0.95 |
|---|---|---|---|---|---|---|
| Accuracy | 96.5% | 96.6% | 97.4% | 97.9% | 98% | 98.5% |
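The momentum values swept above enter training through the classical momentum update (learning rate and gradient here are illustrative values):

```python
# Classical momentum SGD: the velocity accumulates past gradients,
# and the parameter moves along the velocity.
def momentum_step(w, v, grad, lr=0.1, momentum=0.95):
    v = momentum * v - lr * grad
    w = w + v
    return w, v
```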
Best Result: 98.08%
Conv. - Pool - Conv. - Pool - F.c. Layer - F.c. Layer
| Activation Func. | Accuracy |
|---|---|
| Tanh | 99.14% |
| ReLU | 99.18% |
| Sigmoid | 98.77% (Takes too long!) |
| Momentum | Accuracy |
|---|---|
| 0.95 | 99.12% |
| 0.90 | 99.08% |
| 0.98 | 98.84% |
| Dropout | Accuracy |
|---|---|
| 0.2 | 98.86% |
| 0.4 | 98.92% |
| 0.6 | 97.74% |
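The dropout rates above can be applied as inverted dropout (assumption: the "Dropout" column is the probability of dropping a unit):

```python
import numpy as np

def dropout(h, p_drop, rng):
    # Zero out each activation independently with probability p_drop.
    mask = (rng.random(h.shape) >= p_drop).astype(h.dtype)
    # Scale the survivors so the expected activation is unchanged at test time.
    return h * mask / (1.0 - p_drop)
```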
| Activation Func. | Dropout | Momentum | Accuracy |
|---|---|---|---|
| ReLU | - | - | 99.15% |
| Tanh | - | - | 98.97% |
| ReLU | 0.4 | - | 98.84% |
| ReLU | 0.2 | - | 98.86% |
| ReLU | - | 0.95 | 99.21% |
| ReLU | - | 0.98 | 99.09% |
Conv. - Pool - Conv. - Pool - F.c. Layer - F.c. Layer - F.c. Layer
Batch Normalization (arXiv:1502.03167 [cs.LG])

| Batch Normalization | Accuracy |
|---|---|
| 2 F.c. Layer | 98.86% |
| 3 F.c. Layer | 98.85% |
| 4 F.c. Layer (+.95 momentum) | 98.79% |
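Batch normalization (arXiv:1502.03167), as inserted between the F.c. layers above, normalizes each feature over the mini-batch and then applies a learnable scale gamma and shift beta:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each feature
    return gamma * x_hat + beta            # learnable scale and shift
```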
| Activation Func. | Dropout | Momentum | Accuracy |
|---|---|---|---|
| ReLU | - | 0.95 | 99.31% |
Conv. - Pool - Conv. - Pool - F.c. Layer - F.c. Layer - F.c. Layer - F.c. Layer
ReLU
0.95 Momentum
No Dropout
DeepRetina