Automated construction of deep hierarchies using arbitrary clustering algorithms
Jeroen Tempels
30/03/2017
Introduction to deep hierarchies

Theoretical implications of deep hierarchies

Deep hierarchies pipeline with k-means
Sampling
K-means
Feature extraction
Input
Output
Layer N + 1
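
The three pipeline steps can be sketched for a single layer as below. This is a minimal illustration, not the thesis implementation. The slides only name the steps, so the feature encoding (distance from each input to every centroid) and the names `run_layer`, `kmeans`, and `extract_features` are assumptions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def extract_features(vector, centroids):
    """Describe a vector by its distance to every centroid (assumed encoding)."""
    return [math.sqrt(sq_dist(vector, c)) for c in centroids]


def run_layer(inputs, k, sample_size, seed=0):
    """Sample the inputs, cluster the sample, then re-encode every input."""
    rng = random.Random(seed)
    sample = rng.sample(inputs, min(sample_size, len(inputs)))
    centroids = kmeans(sample, k, seed=seed)
    outputs = [extract_features(v, centroids) for v in inputs]
    return centroids, outputs
```

The output of `run_layer` has one feature per cluster, so it can be passed to `run_layer` again to build layer N + 1.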
Deep hierarchies pipeline: sampling


Deep hierarchies pipeline: clustering


K-means
Deep hierarchies pipeline: features

Centroid
Goals of thesis
Pipeline
- number of layers
- clusters per layer
 
K-means
Agglomerative
CURE
...
Replacing k-means
No centroids?
Create centroids!

Straightforward

???
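
For an algorithm that returns only cluster labels, one simple way to create centroids is to take the mean of each cluster's members. The sketch below assumes that this is the "straightforward" case the slide refers to. The function name `centroids_from_labels` is hypothetical.

```python
from collections import defaultdict


def centroids_from_labels(points, labels):
    """Return {label: mean vector} for a flat cluster assignment."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    return {
        label: [sum(col) / len(group) for col in zip(*group)]
        for label, group in members.items()
    }
```

With centroids available, the feature extraction step of the pipeline does not need to know which clustering algorithm produced the labels.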
Cluster visualization

Cluster visualization


K-means
Agglomerative
Performance k-means on reduced dataset

(1024, 0.5834)
Clusters per layer estimation
Cluster validity index = cluster quality measure


Inter-cluster distance
Intra-cluster distance
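
As one concrete example of such an index, the Calinski-Harabasz score divides the between-cluster dispersion by the within-cluster dispersion, each normalized by its degrees of freedom. The sketch below is a plain-Python illustration under that standard definition. It is not the code used for the reported results, and the helper names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def calinski_harabasz(points, labels):
    """Higher is better: well-separated, compact clusters score high."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    n, k = len(points), len(groups)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = mean(points)
    between = 0.0  # inter-cluster dispersion
    within = 0.0   # intra-cluster dispersion
    for group in groups.values():
        centroid = mean(group)
        between += len(group) * sq_dist(centroid, overall)
        within += sum(sq_dist(p, centroid) for p in group)
    if within == 0.0:
        return math.inf  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))
```

To estimate the clusters per layer, the score can be computed for several candidate values of k, keeping the one with the highest score.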
Clusters per layer estimation: results


Clusters per layer estimation: results

HDBSCAN:
- Density-based
- DBSCAN for all alpha
- Automatically chooses k
 
Conclusions
- Comparable results between clustering algorithms
- Calinski-Harabasz looks promising
- HDBSCAN looks promising
 
Future work
- Alternatives to the Euclidean mean
- CURE
- Metric for the number of layers
- Full data set analysis
- Verification on other data sets
 
Questions?