Ahcène Boubekki
Michael Kampffmeyer
Ulf Brefeld
Robert Jenssen
UiT The Arctic University of Norway
Leuphana University
Step by step: Assumptions
Assumptions of k-means
Hard clustering
Null covariances
Equally likely clusters
Relaxations
Soft assignments
Isotropic covariances
Dirichlet prior on the mixing weights
Isotropic GMM
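A minimal numpy sketch of the relaxed model, for reference: soft assignments under a GMM with isotropic covariances sigma^2 I and mixing weights. The value of sigma and the uniform default for the weights are illustrative choices, not the paper's settings.

```python
import numpy as np

def soft_assignments(X, centroids, sigma=1.0, weights=None):
    """Responsibilities of K isotropic Gaussians N(mu_k, sigma^2 I)."""
    K = centroids.shape[0]
    if weights is None:
        weights = np.full(K, 1.0 / K)                  # equally likely clusters
    # squared Euclidean distances, shape (n, K)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    log_r = np.log(weights)[None, :] - d2 / (2.0 * sigma ** 2)
    log_r -= log_r.max(axis=1, keepdims=True)          # numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

# sigma -> 0 recovers the hard k-means assignment (argmax of the responsibilities)
```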
Step by step: Gradient Descent
Isotropic GMM with Dirichlet prior
Gradient Descent fails :(
Algorithmic trick
After an M-step:
Simplification due to the prior
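A hedged sketch of the corresponding M-step: with a symmetric Dirichlet(alpha) prior on the mixture weights, the MAP update has a standard closed form; the exact parametrisation and the simplification it enables in the paper may differ.

```python
import numpy as np

def m_step(X, resp, alpha=1.0):
    """X: (n, d) data, resp: (n, K) responsibilities from the E-step."""
    n, d = X.shape
    K = resp.shape[1]
    Nk = resp.sum(axis=0)                               # effective cluster sizes
    centroids = (resp.T @ X) / Nk[:, None]              # cluster means
    # MAP mixture weights under a symmetric Dirichlet(alpha) prior
    weights = (Nk + alpha - 1.0) / (n + K * (alpha - 1.0))
    # shared isotropic variance sigma^2
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    sigma2 = (resp * d2).sum() / (n * d)
    return centroids, weights, sigma2
```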
Step by step: Where is the AE?
Current Objective function:
Computational trick
AE ⇒ reconstruction
Clustering Module
Loss Function
Reconstruction
Sparsity + Reg.
Sparsity + Merging
Dir. Prior
Network
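One way to read the clustering module as a network, sketched in PyTorch: a hidden layer of soft assignments and the centroids as decoder weights. The loss terms below mirror the slide (reconstruction, sparsity, Dirichlet prior) but their exact forms and weights are placeholders, not the paper's equations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClusteringModule(nn.Module):
    def __init__(self, dim, n_clusters):
        super().__init__()
        self.assign = nn.Linear(dim, n_clusters)            # logits for soft assignments
        self.centroids = nn.Parameter(torch.randn(n_clusters, dim))

    def forward(self, x):
        s = F.softmax(self.assign(x), dim=1)                # (batch, K) soft assignments
        x_hat = s @ self.centroids                          # reconstruct from the centroids
        return x_hat, s

def cm_loss(x, x_hat, s, alpha=1.0, lam=1.0):
    rec = F.mse_loss(x_hat, x)                              # reconstruction
    sparsity = lam * (s * (1.0 - s)).sum(dim=1).mean()      # pushes s toward one-hot
    prior = -(alpha - 1.0) * torch.log(s.mean(dim=0) + 1e-8).sum()  # Dirichlet-like term
    return rec + sparsity + prior
```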
Clustering Module: Evaluation
Baselines: k-means, GMM, iGMM
[Table: clustering accuracy (best run, average, standard deviation)]
Clustering Module: Summary
Limitations
Isotropy assumption
Linear partitions only
Tied covariances
Spherical covariances
We can cluster à la k-means using a NN
What is the solution?
Kernels!
Feature maps
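Toy illustration of the feature-map idea (illustrative only; AE-CM learns the map with a neural network instead): two concentric rings are not linearly separable in input space, but a simple hand-crafted map phi makes them so.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 400)
radii = np.r_[np.full(200, 1.0), np.full(200, 3.0)] + 0.05 * rng.standard_normal(400)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]   # two rings

def phi(x):
    # append the squared norm as an extra feature
    return np.c_[x, (x ** 2).sum(axis=1)]

Z = phi(X)  # the rings now differ along the last coordinate,
            # so a k-means-style (linear) partition can separate them
```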
AE-CM: Introduction
CM
Invertible feature maps to avoid collapsing
Maps are learned using a neural network
AE-CM
AE-CM
Loss Function
Lagrange
does not meet expectations
CM
Orthonormal
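A hedged PyTorch sketch of the AE-CM idea: learn the feature map with an autoencoder and apply the clustering module in its latent space, keeping the decoder to prevent the embedding from collapsing. Layer sizes and the beta/lambda weighting are illustrative placeholders; ClusteringModule refers to the sketch above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AECM(nn.Module):
    def __init__(self, dim, latent, n_clusters):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, dim))
        self.cm = ClusteringModule(latent, n_clusters)      # clustering in latent space

    def forward(self, x):
        z = self.encoder(x)
        z_hat, s = self.cm(z)                               # soft assignments + centroid reconstruction
        x_hat = self.decoder(z)                             # decoder kept to avoid collapsing
        return x_hat, z, z_hat, s

def aecm_loss(x, x_hat, z, z_hat, s, beta=1.0, lam=1.0):
    ae_rec = F.mse_loss(x_hat, x)                           # autoencoder reconstruction
    cm_rec = F.mse_loss(z_hat, z)                           # CM reconstruction in latent space
    sparsity = lam * (s * (1.0 - s)).sum(dim=1).mean()
    return ae_rec + beta * (cm_rec + sparsity)
```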
AE-CM: Baselines
AE+KM: autoencoder + k-means
DCN (2017):
  Initialization:
  Alternate:
  End-to-end autoencoder + k-means
DEC (2016):
  Initialization:
  Alternate:
IDEC (2017): same as DEC but keep the decoder
DKM (2020):
  Centroids in an ad-hoc matrix
  Loss = DAE reconstruction + c-means-like term
  Annealing of the softmax temperature
Check paper for GAN and VAE baselines
Fully connected layers
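For the DKM comparison, a hedged sketch of softmax-temperature annealing for soft k-means-style assignments; the geometric schedule below is an illustrative choice, not DKM's exact scheme.

```python
import torch
import torch.nn.functional as F

def soft_kmeans_assignments(z, centroids, temperature):
    """z: (n, d) embeddings, centroids: (K, d)."""
    d2 = torch.cdist(z, centroids) ** 2                     # squared distances, shape (n, K)
    return F.softmax(-d2 / temperature, dim=1)

temperature = 1.0
for epoch in range(50):
    # ... one training epoch using soft_kmeans_assignments(z, centroids, temperature)
    temperature *= 0.9                                      # assignments harden over time
```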
AE-CM: Evaluation with random initialization
AE-CM: Evaluation initialized with AE+KM
AE-CM: Toy example
AE-CM: Generative Model
Conclusion
Can we approx. k-means with a NN? YES
Can we jointly learn an embedding? YES
What is next?
Improve stability by acting on assignments.
Softmax annealing, Gumbel-Softmax, VAE
Try more complex architectures.
More applications.
Normalize the loss
Clustering Module: Implementation
Initialization: random or k-means++
Finalization: averaging epoch
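A hedged numpy sketch of the k-means++ seeding listed above for the module's centroids; standard algorithm, and the RNG seed is an illustrative choice.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """Pick k initial centroids from X (n, d) with k-means++ seeding."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]                        # first centroid chosen uniformly
    for _ in range(k - 1):
        C = np.asarray(centroids)                           # (m, d) centroids chosen so far
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)
        centroids.append(X[rng.choice(n, p=d2 / d2.sum())])  # far points more likely
    return np.asarray(centroids)
```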
Clustering Module: Hyperparameters
CM: Prior
CM: E3
AE-CM: Beta vs Lambda
AE-CM+pre
AE-CM+rand
Linearly separable