ITCM Project
Classification of handwritten digits
Team5
Outline
- Problem
- Method
- Results
- Discussion
- Conclusion
- References
Goal
Classify Handwritten Digits
Flowchart

Method
Implementation using SVD
(dataset: MNIST)
SVD - Singular Value Decomposition

Training Phase
- Given sets of images of known digits (0 to 9)
- Consider each image as a 28 x 28 matrix
- Stack all the columns of each image
- Image matrix -> image vector (28x28 -> 784)
- All image vectors of a digit become the columns of a matrix A
- Apply SVD to the matrix of each digit:
A = U\Sigma V^T, A\in R^{784\times n}, U\in R^{784\times784}, \Sigma \in R^{784\times n}, V\in R^{n\times n}
(n is the number of training images of that digit)
- u_i is the i-th singular image of the digit for i=1, ..., k, where k<=784 is chosen by the user and u_i is the i-th column of U
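The training steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the team's actual code: the helper name `train_digit_bases` is hypothetical, and random arrays stand in for the MNIST images so that the block runs on its own.

```python
import numpy as np

def train_digit_bases(images, labels, k):
    """Return {digit: U_k}, where U_k holds the first k singular images.

    images: array of shape (n, 28, 28); labels: array of shape (n,).
    Hypothetical helper, named for illustration only.
    """
    bases = {}
    for digit in np.unique(labels):
        # Convert to float so that later arithmetic is not done in uint8.
        digit_images = images[labels == digit].astype(np.float64)
        # Stack the columns of each 28x28 image into a 784-vector
        # (order="F" is column-major); each image becomes one column of A.
        A = np.stack([img.flatten(order="F") for img in digit_images], axis=1)
        # Thin SVD: U has shape (784, min(784, n)). Its first k columns are
        # the same u_1, ..., u_k as in the full SVD when k <= min(784, n).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        bases[int(digit)] = U[:, :k]
    return bases

# Stand-in data (assumption): 200 random "images", 20 per digit.
rng = np.random.default_rng(0)
images = rng.random((200, 28, 28))
labels = np.arange(200) % 10
bases = train_digit_bases(images, labels, k=10)
```

Each entry of `bases` is a 784 x k matrix with orthonormal columns, which is what the classifying phase relies on.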
Training Phase

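[Figure (step 3): the columns of an image stacked into a vector]
[Figure (step 5): the image vectors of a digit collected into the matrix A]
[Figure: singular images of 1 (u_0, u_1, u_2)]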
Classifying Phase
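- For an image of an unknown digit
- Do step 3 of the training phase (stack the columns of the image)
- Get an unknown digit vector z
- For each digit, calculate the least-squares residual of z in that digit's basis:
\min_{\alpha}||z-U_k\alpha||_2, where \quad U_k=(u_1, u_2, ..., u_k)
- The columns of U_k are orthonormal, so the minimizer is \alpha = U_k^T z and the residual norm is
||(I-U_kU_k^T)z||_2
- Classify z as the digit with the smallest residual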
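The residual test can be sketched as below. This is an illustrative sketch, not the team's code: the names `residual` and `classify` are hypothetical, and the bases are random orthonormal matrices built with QR for three toy classes instead of SVD bases trained on MNIST.

```python
import numpy as np

def residual(U_k, z):
    """Norm of (I - U_k U_k^T) z, without forming the 784x784 matrix."""
    return np.linalg.norm(z - U_k @ (U_k.T @ z))

def classify(bases, z):
    """Return the key of the basis that leaves the smallest residual."""
    return min(bases, key=lambda d: residual(bases[d], z))

# Toy bases (assumption): random orthonormal 784x5 matrices for 3 classes.
rng = np.random.default_rng(1)
bases = {d: np.linalg.qr(rng.standard_normal((784, 5)))[0] for d in range(3)}

# A vector lying exactly in the span of class 2's basis.
z = bases[2] @ rng.standard_normal(5)
predicted = classify(bases, z)
```

Computing `U_k @ (U_k.T @ z)` costs two thin matrix-vector products, which is cheaper than building the projector I - U_k U_k^T explicitly.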
Results
Accuracy: 84.77% (test 10000, k=20)
Different Results
Test set size | k=10 | k=15 | k=20 |
---|---|---|---|
1000 | 72.1% | 77.3% | 81.3% |
10000 | 76.4% | 82.29% | 84.77% |
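A sweep over k like the one in the table could be set up roughly as follows. This is a sketch on synthetic data, so it does not reproduce the numbers above: the three classes, the noise level, and the name `accuracy` are assumptions made so that the block runs without MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim, rank = 3, 784, 5

# Synthetic stand-in for MNIST: each class lives near its own random subspace.
subspaces = [rng.standard_normal((dim, rank)) for _ in range(n_classes)]

def sample(c, n):
    """Return n noisy column vectors from class c, shape (dim, n)."""
    clean = subspaces[c] @ rng.standard_normal((rank, n))
    return clean + 0.01 * rng.standard_normal((dim, n))

train = {c: sample(c, 50) for c in range(n_classes)}
test_Z = np.hstack([sample(c, 20) for c in range(n_classes)])
test_y = np.repeat(np.arange(n_classes), 20)

# Compute each SVD once; U[:, :k] then gives the basis for any k.
U_full = {c: np.linalg.svd(A, full_matrices=False)[0] for c, A in train.items()}

def accuracy(k):
    """Fraction of test vectors assigned to their true class for a given k."""
    res = np.stack([
        np.linalg.norm(
            test_Z - U_full[c][:, :k] @ (U_full[c][:, :k].T @ test_Z), axis=0
        )
        for c in range(n_classes)
    ])
    return float(np.mean(res.argmin(axis=0) == test_y))

accuracies = {k: accuracy(k) for k in (1, 3, 5)}
```

Reusing one SVD per class for all values of k avoids recomputing the decomposition inside the sweep.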
Discussion
What's wrong?
Confusion Matrix
[Figure: confusion matrix, test: 10000, k=20]
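A confusion matrix of this kind can be tallied from true and predicted labels as sketched below. The function name `confusion_matrix` and the seven toy labels are assumptions for illustration; they are not the team's results.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes=10):
    """C[i, j] counts test images of digit i that were classified as digit j."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an (i, j) pair repeats.
    np.add.at(C, (true_labels, predicted_labels), 1)
    return C

# Toy labels (assumption), not taken from the experiments.
true_labels = np.array([0, 1, 1, 7, 7, 7, 9])
predicted_labels = np.array([0, 1, 7, 7, 7, 1, 4])

C = confusion_matrix(true_labels, predicted_labels)
# Correct classifications sit on the diagonal.
accuracy = np.trace(C) / C.sum()
```

Off-diagonal entries such as `C[1, 7]` and `C[7, 1]` show which pairs of digits are mistaken for each other.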
Something interesting
Training size | k=10 | k=15 | k=20 |
---|---|---|---|
5000 | 19.2% | 17.4% | 16.9% |
10000 | 5.7% | 9.2% | 9.9% |
500 | 72.1% | 77.1% | 81.3% |
Something interesting
[Figures (training size: 500)]
Something more interesting

Something more interesting
Accuracy: 99.3%

Conclusion
- k=40 is a good choice
- A larger training set gave lower accuracy
- The CNN gives the best accuracy
References
- L. Eldén. Matrix Methods in Data Mining and Pattern Recognition. SIAM, 2007.
ITCM Final Project
By a136489