Automated construction of deep hierarchies using arbitrary clustering algorithms
Jeroen Tempels
30/03/2017
Introduction to deep hierarchies
Theoretical implications of deep hierarchies
Deep hierarchies pipeline with k-means
Sampling
K-means
Feature extraction
Input
Output
Layer N + 1
Deep hierarchies pipeline: sampling
Deep hierarchies pipeline: clustering
K-means
Deep hierarchies pipeline: features
Centroid
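A minimal sketch of the three pipeline steps (sampling, clustering, feature extraction) stacked into layers. The slides do not specify how features are derived from the centroids, so this sketch assumes each feature is the Euclidean distance to one centroid. It also works on flat vectors rather than image patches. All function names and parameters (`build_hierarchy`, `n_samples`, `iters`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def extract_features(x, centroids):
    """Assumption: one feature per centroid, the distance to that centroid."""
    return [sq_dist(x, c) ** 0.5 for c in centroids]


def build_hierarchy(data, clusters_per_layer, n_samples, seed=0):
    """Train one layer per entry in clusters_per_layer.

    The output of layer N becomes the input of layer N + 1.
    """
    rng = random.Random(seed)
    layers = []
    for k in clusters_per_layer:
        sample = rng.sample(data, min(n_samples, len(data)))   # sampling
        centroids = kmeans(sample, k, seed=seed)               # clustering
        layers.append(centroids)
        data = [extract_features(x, centroids) for x in data]  # features
    return layers, data
```

Here the length of `clusters_per_layer` sets the number of layers and its entries set the clusters per layer, the two pipeline parameters named in the thesis goals.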
Goals of thesis
Pipeline
- number of layers
- clusters per layer
K-means
Agglomerative
CURE
...
Replacing k-means
No centroids?
Create centroids!
Straightforward
???
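One plausible reading of the straightforward case, sketched under assumptions: when an algorithm such as agglomerative clustering returns only a label per point, a centroid can be created afterwards as the Euclidean mean of each cluster's members. The function name `centroids_from_labels` is hypothetical, and the slides do not confirm that this is the exact construction used.

```python
from collections import defaultdict


def centroids_from_labels(points, labels):
    """Return {label: centroid}, where each centroid is the
    component-wise (Euclidean) mean of the points with that label."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    return {
        label: [sum(col) / len(group) for col in zip(*group)]
        for label, group in members.items()
    }


# Labels as they might come from any clustering algorithm.
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
labels = [0, 0, 1]
print(centroids_from_labels(points, labels))
# → {0: [1.0, 0.0], 1: [10.0, 10.0]}
```

With centroids created this way, the feature extraction step does not need to know which clustering algorithm produced the labels.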
Cluster visualization
K-means
Agglomerative
Performance of k-means on the reduced dataset
(1024, 0.5834)
Clusters per layer estimation
Cluster validity index = cluster quality measure
Inter-cluster distance
Intra-cluster distance
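As an illustration of a validity index built from inter-cluster and intra-cluster distance, here is a sketch of the Calinski-Harabasz index, which the conclusions name. It uses the standard textbook definition: between-cluster dispersion over within-cluster dispersion, each normalized by its degrees of freedom. The function name and the toy data are assumptions for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def calinski_harabasz(points, labels):
    """Higher is better: well-separated, compact clusters score high.

    Raises ZeroDivisionError if every point sits exactly on its centroid.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = mean(points)
    between = 0.0  # inter-cluster dispersion
    within = 0.0   # intra-cluster dispersion
    for group in clusters.values():
        centroid = mean(group)
        between += len(group) * sq_dist(centroid, overall)
        within += sum(sq_dist(p, centroid) for p in group)
    return (between / (k - 1)) / (within / (n - k))


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # → 50.0
```

To estimate the clusters per layer, one would cluster the same sample for several candidate values of k and keep the k with the highest score.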
Clusters per layer estimation: results
HDBSCAN:
- Density based
- DBSCAN over all epsilon values
- Automatically chooses k
Conclusions
- Comparable results between clustering algorithms
- Calinski-Harabasz looks promising
- HDBSCAN looks promising
Future work
- Alternatives to the Euclidean mean
- CURE
- Metric for the number of layers
- Full dataset analysis
- Verification on other datasets
Questions?