Automated construction of deep hierarchies using arbitrary clustering algorithms
Jeroen Tempels
30/03/2017
Introduction to deep hierarchies
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3156779/pasted-from-clipboard.png)
Theoretical implications of deep hierarchies
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3632705/deep.png)
Deep hierarchies pipeline with k-means
Input → Sampling → K-means → Feature extraction → Output → Layer N + 1
Deep hierarchies pipeline: sampling
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3635676/cats_and_dogs.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3635678/patches.png)
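A minimal sketch of the sampling step, assuming each image is a 2-D grid of grayscale values and patches are square crops taken at uniformly random positions. The name `sample_patches`, the patch size and the toy image are illustrative assumptions, not taken from the slides.

```python
import random

def sample_patches(image, patch_size, n_patches, rng):
    """Crop n_patches random square patches and flatten each to a vector."""
    height, width = len(image), len(image[0])
    patches = []
    for _ in range(n_patches):
        # Pick the top-left corner so the patch stays inside the image.
        top = rng.randrange(height - patch_size + 1)
        left = rng.randrange(width - patch_size + 1)
        patch = [image[r][c]
                 for r in range(top, top + patch_size)
                 for c in range(left, left + patch_size)]
        patches.append(patch)
    return patches

# Hypothetical 8x8 grayscale image with pixel value = row * 8 + column.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = sample_patches(image, patch_size=3, n_patches=5, rng=random.Random(0))
```

Each patch is a flat vector of 9 values here; these vectors are what the clustering step receives.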
Deep hierarchies pipeline: clustering
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3635678/patches.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3635681/clusters.png)
K-means
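The clustering step can be sketched with plain Lloyd's k-means. This is a standard-library illustration under assumed toy data, not the implementation used in the thesis; in practice a library implementation would likely be used. The names `kmeans` and `squared_distance` are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Toy "patches": two well-separated groups of 2-D points.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centroids, labels = kmeans(points, k=2)
```

The returned centroids are the part of the result that the feature extraction step relies on.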
Deep hierarchies pipeline: features
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3635700/features.png)
Centroid
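The slides do not spell out the feature encoding. One common choice, assumed here purely for illustration, is to describe each patch by its Euclidean distance to every centroid. The name `extract_features` and the example centroids are hypothetical.

```python
import math

def extract_features(patch, centroids):
    """Map a patch to one feature per centroid: its Euclidean distance."""
    return [math.dist(patch, c) for c in centroids]

centroids = [[0.0, 0.0], [3.0, 4.0]]   # hypothetical learned centroids
features = extract_features([0.0, 0.0], centroids)
```

With k centroids every patch becomes a k-dimensional vector, which can serve as the input of the next layer.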
Goals of thesis
Pipeline
- number of layers
- clusters per layer
K-means
Agglomerative
CURE
...
Replacing k-means
No centroids?
Create centroids!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3633005/clustering.png)
Straightforward
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3635729/cluster2.png)
???
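For algorithms that return only cluster labels (agglomerative clustering, for example), a centroid can be created afterwards as the mean of each cluster's members. A minimal sketch, assuming Euclidean means; `centroids_from_labels` is an illustrative name. For non-convex clusters the mean can fall outside the cluster, so this covers only the straightforward case.

```python
from collections import defaultdict

def centroids_from_labels(points, labels):
    """Return {label: mean vector} for any clustering given as labels."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {label: [sum(dim) / len(members) for dim in zip(*members)]
            for label, members in groups.items()}

points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 14.0]]
labels = [0, 0, 1, 1]   # e.g. the output of an agglomerative clustering
centroids = centroids_from_labels(points, labels)
```

Because the function only needs labels, the rest of the pipeline can stay unchanged when k-means is swapped for another algorithm.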
Cluster visualization
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3156779/pasted-from-clipboard.png)
Cluster visualization
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3646520/skmeans.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3646521/agg.png)
K-means
Agglomerative
Performance of k-means on reduced dataset
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3646963/acc_50000.png)
(1024, 0.5834)
Clusters per layer estimation
Cluster validity index = cluster quality measure
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3633005/clustering.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3633005/clustering.png)
Inter-cluster distance
Intra-cluster distance
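As one concrete validity index, the Calinski-Harabasz score divides the inter-cluster dispersion by the intra-cluster dispersion, each normalised by its degrees of freedom. The sketch below is a standard-library illustration on assumed toy data; it assumes at least two clusters and a non-zero intra-cluster dispersion.

```python
def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion (higher is better)."""
    def mean(rows):
        return [sum(dim) / len(rows) for dim in zip(*rows)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n, clusters = len(points), sorted(set(labels))
    k = len(clusters)
    overall = mean(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = mean(members)
        between += len(members) * sq_dist(centroid, overall)  # inter-cluster
        within += sum(sq_dist(p, centroid) for p in members)  # intra-cluster
    return (between / (k - 1)) / (within / (n - k))

points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good_score = calinski_harabasz(points, [0, 0, 1, 1])  # compact, separated
bad_score = calinski_harabasz(points, [0, 1, 0, 1])   # clusters overlap
```

On this toy data the well-separated labelling scores 50.0 and the overlapping one 0.08, so comparing the index across candidate values of k gives a way to estimate the number of clusters per layer.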
Clusters per layer estimation: results
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3650542/acc_50000.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3650544/calinski_50000.png)
Clusters per layer estimation: results
![](https://s3.amazonaws.com/media-p.slid.es/uploads/600703/images/3646982/hdbscan.png)
HDBSCAN:
- Density based
- DBSCAN over all epsilon (distance threshold) values
- Automatically chooses k
Conclusions
- Comparable results between clustering algorithms
- Calinski-Harabasz looks promising
- HDBSCAN looks promising
Future work
- Alternatives to the Euclidean mean
- CURE
- Metric for the number of layers
- Full data set analysis
- Other data set verification
Questions?