Automated construction of deep hierarchies using arbitrary clustering algorithms
Jeroen Tempels
30/03/2017
Introduction to deep hierarchies

Theoretical implications of deep hierarchies

Deep hierarchies pipeline with k-means
Sampling
K-means
Feature extraction
Input
Output
Layer N + 1
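A minimal sketch of one layer of the pipeline above (sampling, k-means, feature extraction), in plain Python. The slides do not say how features are computed, so the distance-to-centroid features here are an assumption, and every name (`train_layer`, `extract_features`, `sample_size`) is hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def extract_features(x, centroids):
    """ASSUMPTION: one feature per centroid, the distance from x to it."""
    return [sq_dist(x, c) ** 0.5 for c in centroids]

def train_layer(inputs, k, sample_size, seed=0):
    """Sample the inputs, cluster the sample, then featurize every input."""
    rng = random.Random(seed)
    sample = rng.sample(inputs, min(sample_size, len(inputs)))
    centroids = kmeans(sample, k, seed=seed)
    outputs = [extract_features(x, centroids) for x in inputs]
    return centroids, outputs
```

Stacking is then a loop: the `outputs` of layer N are passed as the `inputs` of layer N + 1.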
Deep hierarchies pipeline: sampling


Deep hierarchies pipeline: clustering


K-means
Deep hierarchies pipeline: features

Centroid
Goals of thesis
Pipeline
- number of layers
- clusters per layer
K-means
Agglomerative
CURE
...
Replacing k-means
No centroids?
Create centroids!

Straightforward

???
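One possible reading of "Create centroids!": when an algorithm such as agglomerative clustering returns only a label per point, a centroid can be derived afterwards as the Euclidean mean of each cluster's members. The sketch below assumes that reading; `centroids_from_labels` is a hypothetical helper name.

```python
from collections import defaultdict

def centroids_from_labels(points, labels):
    """Return {label: centroid}, where each centroid is the component-wise
    mean of the points carrying that label (an assumed construction)."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    return {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in members.items()
    }
```

The resulting centroids can then stand in for the k-means centroids in the feature extraction step, so the rest of the pipeline stays unchanged.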
Cluster visualization

Cluster visualization


K-means
Agglomerative
Performance k-means on reduced dataset

(1024, 0.5834)
Clusters per layer estimation
Cluster validity index = cluster quality measure


Inter-cluster distance
Intra-cluster distance
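As an illustration of a validity index built from these two quantities, here is a sketch of the Calinski-Harabasz index, the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom. It is only one such index, and the function name is hypothetical.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score: (B / (k - 1)) / (W / (n - k)).

    B is the size-weighted squared distance of each cluster centroid to the
    overall mean (inter-cluster), W the squared distance of each point to
    its own centroid (intra-cluster). Higher means better separation.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def mean(pts):
        return [sum(col) / len(pts) for col in zip(*pts)]

    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")

    overall = mean(points)
    between = within = 0.0
    for pts in clusters.values():
        centroid = mean(pts)
        between += len(pts) * sq_dist(centroid, overall)
        within += sum(sq_dist(p, centroid) for p in pts)

    # Raises ZeroDivisionError if every point coincides with its centroid.
    return (between / (k - 1)) / (within / (n - k))
```

To estimate the clusters per layer, the score can be computed for several candidate values of k and the highest-scoring one kept.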
Clusters per layer estimation: results


Clusters per layer estimation: results

HDBSCAN:
- Density based
- DBSCAN for all alpha
- Automatically chooses k
Conclusions
- Comparable results between clustering algorithms
- Calinski-Harabasz looks promising
- HDBSCAN looks promising
Future work
- Alternatives to the Euclidean mean
- CURE
- Metric for the number of layers
- Full data set analysis
- Verification on other data sets
Questions?