Yifan Gu, Erin O'Neil, Kye Shi, & Mia Zender
(Math 285J)
- We adapt the methodology presented in Garin & Tauzin's conference paper [1], which uses TDA to classify handwritten digits via the following pipeline:
[1] A. Garin and G. Tauzin, "A Topological 'Reading' Lesson: Classification of MNIST using TDA," 2019.
- We restrict our attention to the digits "4" and "9" within the MNIST dataset due to their high classification error rate
Note that there is one connected component in image 2 (diagonal pixels are considered to be a part of the same connected component).
Each pixel with nonzero intensity is represented by a vertex, and cubes (edges and squares) are built between adjacent vertices.
The image above indexes each cube by its intensity, depicting a filtration of cubical complexes.
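As a concrete illustration, the persistent homology of such a cubical filtration can be computed directly from the image; a minimal sketch, assuming giotto-tda's `CubicalPersistence` (class name and arguments reflect our understanding of that library and may differ across versions):

```python
import numpy as np
from gtda.homology import CubicalPersistence

# Toy 4x4 grayscale "image": pixel intensities act as filtration values,
# so cubes with lower intensity enter the filtration first.
image = np.array([[0.1, 0.2, 0.0, 0.0],
                  [0.3, 0.0, 0.4, 0.0],
                  [0.5, 0.6, 0.7, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])

# CubicalPersistence expects a batch of images: (n_samples, height, width).
cp = CubicalPersistence(homology_dimensions=(0, 1))
diagrams = cp.fit_transform(image[None, :, :])

# Each row of a diagram is (birth, death, homology dimension).
print(diagrams[0])
```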
- What filtration methods are useful for classifying the 4 vs. 9 digits?
- Radial filtration (with choice of center):
Given a center $c$, assign to pixel $p$ the value
$$\mathcal{R}_c(p) = \begin{cases} \lVert p - c \rVert_2 & \text{if } \mathcal{B}(p) = 1, \\ R_\infty & \text{if } \mathcal{B}(p) = 0, \end{cases}$$
where $\mathcal{B}(p)$ is the binarized pixel value and $R_\infty$ is the maximal distance occurring in the image, so that background pixels enter the filtration last.
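A minimal NumPy sketch of the radial filtration under our reading of this definition (background pixels receive the maximal distance so they enter the filtration last):

```python
import numpy as np

def radial_filtration(B, center):
    """Assign ||p - c||_2 to pixels with B(p) = 1; pixels with B(p) = 0
    receive the maximal distance value R_inf."""
    rows, cols = np.indices(B.shape)
    dist = np.sqrt((rows - center[0]) ** 2 + (cols - center[1]) ** 2)
    R_inf = dist.max()
    return np.where(B == 1, dist, R_inf)
```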
- Height filtration (with choice of direction):
Given a unit direction vector $v$, assign to pixel $p$ the value
$$\mathcal{H}_v(p) = \begin{cases} \langle p, v \rangle & \text{if } \mathcal{B}(p) = 1, \\ H_\infty & \text{if } \mathcal{B}(p) = 0, \end{cases}$$
where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product and $H_\infty$ is the maximal height value, so background pixels again enter last.
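An analogous NumPy sketch of the height filtration (again assuming background pixels take the maximal value $H_\infty$):

```python
import numpy as np

def height_filtration(B, v):
    """Assign <p, v> (v normalized to a unit vector) to pixels with
    B(p) = 1; pixels with B(p) = 0 receive the maximal value H_inf."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    rows, cols = np.indices(B.shape)
    heights = rows * v[0] + cols * v[1]
    H_inf = heights.max()
    return np.where(B == 1, heights, H_inf)
```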
- Erosion
Erosion defines a new grayscale image $\mathcal{E}$: its value at a vertex $p$ is the distance from $p$ to the closest vertex $v$ with binary value 0, i.e. $\mathcal{E}(p) = \min\{\lVert p - v \rVert_2 : \mathcal{B}(v) = 0\}$.
Note: if $\mathcal{B}(p) = 0$, then $\mathcal{E}(p) = 0$. (A small sketch covering both erosion and dilation follows the dilation definition below.)
- Dilation (i.e. erosion applied to the inverse image)
Dilation defines a new grayscale image $\mathcal{D}$: its value at a vertex $p$ is the distance from $p$ to the closest vertex $v$ with binary value 1, i.e. $\mathcal{D}(p) = \min\{\lVert p - v \rVert_2 : \mathcal{B}(v) = 1\}$.
Note: if $\mathcal{B}(p) = 1$, then $\mathcal{D}(p) = 0$.
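Both distance-based filtrations can be sketched with SciPy's Euclidean distance transform, which returns, for each nonzero pixel, the distance to the nearest zero-valued pixel (a sketch, assuming a 0/1 binarized image `B`):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def erosion_filtration(B):
    # Distance from each pixel to the nearest 0-valued pixel;
    # pixels with B(p) = 0 get value 0, as noted above.
    return distance_transform_edt(B)

def dilation_filtration(B):
    # Erosion of the inverse image: distance to the nearest 1-valued pixel;
    # pixels with B(p) = 1 get value 0.
    return distance_transform_edt(1 - B)
```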
- Radial filtration from the center of mass (animated example filtration shown with its corresponding persistence diagram)
- This is an example of a binarized image
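Binarization itself is a simple threshold; a one-line sketch assuming intensities in [0, 255] and the 0.4 threshold used later:

```python
import numpy as np

img = np.random.randint(0, 256, size=(28, 28))   # stand-in for an MNIST digit
B = (img / 255.0 > 0.4).astype(np.uint8)         # 1 where scaled intensity > 0.4
```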
- Which approaches of generating features lead to the best predictive power?
- How tolerant is our methodology to noise?
- Are there practical uses of this TDA-ML pipeline (i.e. is it faster/better than other algorithms)?
| Betti Curves | Heat Kernel | Persistence Landscapes |
|---|---|---|
| The Betti curve records, for each filtration value x, the number of bars in the barcode that contain x. | Gaussians with a chosen standard deviation are placed over each birth-death pair in the persistence diagram, and a negative Gaussian with the same standard deviation is placed at the pair's mirror image across the diagonal; the diagram is thereby mapped to a real-valued function [2]. The paper specifies the standard deviation in "filtration units". | More on a later slide. |
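A hedged sketch of how these vectorizations can be computed from persistence diagrams, assuming giotto-tda's diagram transformers (`BettiCurve`, `HeatKernel`, `PersistenceLandscape`); the diagram values and the standard deviation below are illustrative:

```python
import numpy as np
from gtda.diagrams import BettiCurve, HeatKernel, PersistenceLandscape

# Toy batch of persistence diagrams; rows are (birth, death, homology dim),
# matching the output format of giotto-tda's CubicalPersistence.
diagrams = np.array([[[0.0, 0.4, 0], [0.1, 0.6, 0], [0.2, 0.5, 1]]])

betti = BettiCurve(n_bins=100).fit_transform(diagrams)
heat = HeatKernel(sigma=0.15, n_bins=100).fit_transform(diagrams)
landscapes = PersistenceLandscape(n_layers=2, n_bins=100).fit_transform(diagrams)
```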
| Wasserstein Amplitude | Bottleneck Amplitude |
|---|---|
| $A_W = \frac{\sqrt{2}}{2} \left( \sum_i (d_i - b_i)^p \right)^{1/p}$ | $A_B = \frac{\sqrt{2}}{2} \sup_i \, (d_i - b_i)$ |

Note: both amplitudes are computed for a persistence diagram $D$ with birth-death pairs $(b_i, d_i)$.
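A minimal NumPy sketch of both amplitudes for a single diagram given as an array of $(b_i, d_i)$ pairs, following the formulas above (the $\frac{\sqrt{2}}{2}$ factor is the normalization as we understand it):

```python
import numpy as np

def wasserstein_amplitude(pairs, p=2):
    """L^p norm of the persistences (d_i - b_i), scaled by sqrt(2)/2."""
    pers = pairs[:, 1] - pairs[:, 0]
    return (np.sqrt(2) / 2) * np.sum(pers ** p) ** (1.0 / p)

def bottleneck_amplitude(pairs):
    """Largest persistence, scaled by sqrt(2)/2."""
    pers = pairs[:, 1] - pairs[:, 0]
    return (np.sqrt(2) / 2) * pers.max()

pairs = np.array([[0.0, 0.4], [0.1, 0.6], [0.2, 0.5]])  # toy birth-death pairs
print(wasserstein_amplitude(pairs), bottleneck_amplitude(pairs))
```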
[2] J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt, "A stable multi-scale kernel for topological machine learning," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 4741–4748.
Persistent Entropy:
$$PE(D) = -\sum_i p_i \log(p_i), \quad \text{where } p_i = \frac{d_i - b_i}{L_D} \text{ and } L_D = \sum_i (d_i - b_i).$$
Note: this is defined for a persistence diagram $D$ with birth-death pairs $(b_i, d_i)$.
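A direct NumPy translation of this definition (a sketch; it assumes every persistence $d_i - b_i$ is strictly positive):

```python
import numpy as np

def persistent_entropy(pairs):
    """Shannon entropy of the normalized persistences p_i = (d_i - b_i) / L_D."""
    pers = pairs[:, 1] - pairs[:, 0]
    p = pers / pers.sum()
    return -np.sum(p * np.log(p))

pairs = np.array([[0.0, 0.4], [0.1, 0.6], [0.2, 0.5]])  # toy birth-death pairs
print(persistent_entropy(pairs))
```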
A feature vector built from these quantities is the input to the random forest, where A denotes one of the amplitudes from the previous slide and PE denotes the persistent entropy.
Given $n$ birth-death pairs $(b_i, d_i)$ in a persistence diagram, define for each pair the tent function
$$\Lambda_i(t) = \max\{0, \min(t - b_i, \, d_i - t)\}.$$
Then define the first persistence landscape as the pointwise maximum
$$\lambda_1(t) = \max_i \Lambda_i(t),$$
and the second persistence landscape $\lambda_2(t)$ as the pointwise second-largest value among the $\Lambda_i(t)$.
Garin & Tauzin evaluated $\lambda_1$ and $\lambda_2$ at 100 sample points and calculated amplitudes as norms of the sampled landscapes.
Example: tent-function values and the resulting first and second landscapes at three sample points (the last two columns are the largest and second-largest tent values at each $t$):

| $t$ | $\Lambda_1(t)$ | $\Lambda_2(t)$ | $\Lambda_3(t)$ | $\Lambda_4(t)$ | $\lambda_1(t)$ | $\lambda_2(t)$ |
|---|---|---|---|---|---|---|
| 0.25 | 0.15 | 0.05 | 0 | 0 | 0.15 | 0.05 |
| 0.5 | 0.4 | 0 | 0.1 | 0 | 0.4 | 0.1 |
| 0.75 | 0.15 | 0 | 0 | 0.05 | 0.15 | 0.05 |
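A small NumPy sketch of the tent functions and the first two landscapes sampled on a grid of $t$ values; the birth-death pairs below are one choice consistent with the table above (the actual pairs behind the table are not given in the slides):

```python
import numpy as np

def landscapes(pairs, ts, k=2):
    """Return the first k persistence landscapes sampled at the points ts."""
    # Tent function for each birth-death pair: max(0, min(t - b, d - t)).
    tents = np.array([np.maximum(0.0, np.minimum(ts - b, d - ts))
                      for b, d in pairs])
    # Sort tent values at each t in decreasing order; row k-1 is lambda_k.
    return -np.sort(-tents, axis=0)[:k]

pairs = [(0.1, 0.9), (0.2, 0.3), (0.4, 0.6), (0.7, 0.8)]  # illustrative pairs
ts = np.array([0.25, 0.5, 0.75])
lam1, lam2 = landscapes(pairs, ts)
print(lam1, lam2)  # [0.15 0.4 0.15] [0.05 0.1 0.05]
```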
We create an ensemble TDA-ML algorithm inspired by Garin & Tauzin that combines the predictions of many filtration methods to yield a single prediction:
We keep the following consistent with Garin & Tauzin: image binarization using a threshold value of 0.4, the choice of complex (cubical), and the choice of machine learning algorithm (random forest). We also use a subset of their filtration methods. We elected not to rescale the images using Garin and Tauzin's procedure due to errors in the code.
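A hedged sketch of one branch of this pipeline (binarize → radial filtration → cubical persistence → amplitude and entropy features → random forest), assuming giotto-tda's transformers together with scikit-learn; the center, amplitude metric, and forest size shown are illustrative, not the exact settings behind the results below:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline, make_union
from gtda.images import Binarizer, RadialFiltration
from gtda.homology import CubicalPersistence
from gtda.diagrams import Amplitude, PersistenceEntropy

# Features extracted from the persistence diagrams of one filtration;
# the full ensemble combines many such branches (different filtrations,
# centers, and directions) before the classifier.
features = make_union(
    Amplitude(metric="wasserstein"),   # one amplitude per homology dimension
    PersistenceEntropy(),              # one entropy per homology dimension
)

pipeline = make_pipeline(
    Binarizer(threshold=0.4),                    # same threshold as Garin & Tauzin
    RadialFiltration(center=np.array([13, 6])),  # illustrative choice of center
    CubicalPersistence(homology_dimensions=(0, 1)),
    features,
    RandomForestClassifier(n_estimators=500, random_state=0),
)

# X: images of shape (n_samples, 28, 28) with intensities scaled to [0, 1];
# y: binary labels (4 vs. 9).
# pipeline.fit(X_train, y_train); pipeline.score(X_test, y_test)
```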
| Number of features | Prediction accuracy |
|---|---|
| 784 | 0.9903 |
| 576 | 0.9858 |
| 384 | 0.9867 |
| 192 | 0.9836 |
| Number of features | Homology group | Prediction accuracy |
|---|---|---|
| 96 | H0 | 0.8824 |
| 96 | H1 | 0.9819 |
| 192 | H0 | 0.9129 |
| 192 | H1 | 0.9828 |
| 288 | H0 | 0.9133 |
| 288 | H1 | 0.9836 |
H0 features perform significantly worse than H1 features, so we lose little by dropping them
| Filtration | Prediction accuracy |
|---|---|
| Height | 0.9584 |
| Radial | 0.9447 |
| Dilation | 0.8969 |
| Erosion | 0.6329 |
Dilation performs much better than erosion, so we should drop erosion and focus on enhancing height and radial filtrations
| Metric | Prediction accuracy |
|---|---|
| Persistence entropy | 0.9584 |
| Wasserstein distance | 0.9841 |
| Bottleneck distance | 0.9398 |
| Persistence landscape | 0.9788 |
We should try to enhance Wasserstein distance and persistence landscape
Noise was generated by sampling a Gaussian distribution with mean 0 and standard deviation equal to the noise level. We consider three levels of noise.
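A minimal sketch of this noise model as we understand it (additive Gaussian pixel noise, clipped back to a 0-255 intensity range, which is our assumption):

```python
import numpy as np

def add_noise(images, noise_level, seed=0):
    """Add zero-mean Gaussian noise with standard deviation = noise_level."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, noise_level, size=images.shape)
    return np.clip(noisy, 0, 255)   # keep pixel values in the valid range
```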
| Noise level | Random forest accuracy | TDA approach accuracy |
|---|---|---|
| 30 | 0.989 | 0.975 |
| 50 | 0.983 | 0.879 |
| 100 | 0.966 | 0.838 |
The TDA approach performed worse than the plain random forest classifier and was also more computationally expensive. Therefore, there does not seem to be any benefit to the TDA approach when handling noisy data.
Hypothesis: with noise, the persistence diagrams become cluttered, which makes it hard to extract and learn useful features from them
- Our TDA approach was not robust to noise
- The traditional random forest algorithm seems to be a better choice for classifying the 4-versus-9 dataset, based on our observations so far
Given more time, we would further explore...
- Can features be extracted by sampling the persistence landscapes themselves (as opposed to taking norms)?
- Are there any applications where the TDA approach outperforms the random forest (e.g. under rotations)?