Automated construction of deep hierarchies using arbitrary clustering algorithms

Jeroen Tempels

30/03/2017

 

Introduction to deep hierarchies

Theoretical implications of deep hierarchies

Deep hierarchies pipeline with k-means

Sampling

K-means

Feature extraction

Input

Output

Layer N + 1
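
As a rough illustration of the pipeline above (sampling, k-means, feature extraction, output passed on as input to layer N + 1), here is a minimal NumPy sketch. The function names `kmeans` and `build_layer`, the distance-to-centroid feature encoding, and the toy data are all assumptions made for illustration; the slides do not specify these details.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def build_layer(X, k, sample_size, seed=0):
    """One layer: sample -> cluster -> extract features (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    centroids, _ = kmeans(X[idx], k, seed=seed)
    # Assumed encoding: a point's feature vector is its distance to each centroid.
    features = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, features

# Toy input: 200 points with 8 dimensions.
X = np.random.default_rng(0).normal(size=(200, 8))
centroids1, layer1_out = build_layer(X, k=5, sample_size=100)
# The output of layer N becomes the input of layer N + 1.
centroids2, layer2_out = build_layer(layer1_out, k=3, sample_size=100)
```

Each layer turns an input of shape (n, d) into an output of shape (n, k), so the number of clusters per layer sets the dimensionality seen by the next layer.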

Deep hierarchies pipeline: sampling

Deep hierarchies pipeline: clustering

K-means

Deep hierarchies pipeline: features

Centroid

Goals of thesis

Pipeline

  • Number of layers
  • Clusters per layer

K-means

Agglomerative

Cure

...

Replacing k-means

No centroids?

Create centroids!

Straightforward

???
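
One way to create centroids for an algorithm that only returns cluster labels is to take the mean of each cluster's members. The sketch below assumes exactly that, and also assumes that negative labels mark noise points (a convention used by density-based algorithms). The helper name `centroids_from_labels` is hypothetical, and this is not necessarily the approach taken in the thesis.

```python
import numpy as np

def centroids_from_labels(X, labels):
    """Return (cluster_ids, centroids): the member mean of every cluster."""
    labels = np.asarray(labels)
    # Assumption: negative labels mark noise points and get no centroid.
    cluster_ids = np.unique(labels[labels >= 0])
    centroids = np.stack([X[labels == c].mean(axis=0) for c in cluster_ids])
    return cluster_ids, centroids

# Toy data: two tight pairs and one outlier labelled as noise.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0], [50.0, 50.0]])
labels = [0, 0, 1, 1, -1]
ids, centroids = centroids_from_labels(X, labels)
```

A mean-based centroid can fall outside a non-convex cluster, which is one reason a different representative point may be preferable for some algorithms.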

Cluster visualization

K-means

Agglomerative

Performance k-means on reduced dataset

(1024, 0.5834)

Clusters per layer estimation

Cluster validity index = cluster quality measure

 

Inter-cluster distance

Intra-cluster distance
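
As one concrete example of a validity index built from these two quantities, the Calinski-Harabasz index divides the between-cluster dispersion by the within-cluster dispersion, each normalised by its degrees of freedom. The NumPy sketch below follows the standard definition; the function name and the toy data are illustrative assumptions. It requires at least two clusters, fewer clusters than points, and a non-zero within-cluster dispersion.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: higher means better-separated clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)
    overall_mean = X.mean(axis=0)
    between = 0.0  # inter-cluster dispersion
    within = 0.0   # intra-cluster dispersion
    for c in cluster_ids:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))

# Toy data: two groups that are far apart along the first axis.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = calinski_harabasz(X, [0, 0, 1, 1])  # labels follow the real groups
bad = calinski_harabasz(X, [0, 1, 0, 1])   # labels cut across the groups
```

Scoring candidate clusterings with such an index and keeping the best-scoring one is a common way to estimate the number of clusters.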

Clusters per layer estimation: results

HDBSCAN:

  • Density-based
  • Runs DBSCAN for all values of alpha
  • Automatically chooses the number of clusters k

Conclusions

  • Comparable results between clustering algorithms
  • Calinski-Harabasz looks promising
  • HDBSCAN looks promising

Future work

  • Alternatives to the Euclidean mean
  • CURE
  • Metric for the number of layers
  • Analysis on the full data set
  • Verification on other data sets

Questions?
