From Clustering to Deep Clustering

Ahcène Boubekki

UCPH, Denmark

     Affinity-based Clustering

We cannot make everyone happy!

Problem:

Group superheroes

Objective:

Everyone is happy; in practice, minimize unhappiness.

     Affinity-based Clustering

[Figure: affinity graph over the superheroes, with pairwise edge weights ranging from 0.1 to 0.9]

Memory-expensive.

How many groups?

Where should we cut the graph?

     Affinity-based Clustering

Strategy:

Group together those that are clearly similar
and treat the rest as noise.

DBSCAN

Ester, Martin, et al. "A density-based algorithm for discovering clusters in large spatial databases with noise." KDD, vol. 96, no. 34, 1996.
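The strategy above can be sketched with scikit-learn's DBSCAN. The dataset, `eps`, and `min_samples` values below are illustrative assumptions, not from the slides:

```python
# A minimal sketch of the DBSCAN strategy with scikit-learn:
# group points that are clearly similar (density-reachable),
# treat the rest as noise. eps/min_samples are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus a few scattered noise points.
X = np.vstack([
    rng.normal(0.0, 0.1, size=(30, 2)),
    rng.normal(3.0, 0.1, size=(30, 2)),
    rng.uniform(-2, 5, size=(5, 2)),
])

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
# Points that are not density-reachable get the label -1 (noise).
print("clusters:", set(labels) - {-1})
print("noise points:", int(np.sum(labels == -1)))
```

Note that DBSCAN answers "how many groups?" on its own, but trades that for the density parameters `eps` and `min_samples`.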

     Affinity-based Clustering

Strategy:

Group together those that are clearly similar:
merge two clusters if a member of one is similar enough to a member of the other.

Repeat until "similar enough" is no longer satisfied,

or until 3 clusters are formed.

Agglomerative Clustering

Single-linkage

A different merging strategy corresponds to a different linkage,

but it still queries the (N×N) affinity matrix.
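The merging strategy above can be sketched with scikit-learn's single-linkage agglomerative clustering, stopping once 3 clusters are formed as on the slide. The toy dataset is an illustrative assumption:

```python
# A minimal sketch of single-linkage agglomerative clustering:
# repeatedly merge the two clusters whose closest members are most
# similar, stopping at 3 clusters.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Three well-separated blobs of 20 points each (illustrative data).
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in (0.0, 2.0, 4.0)])

model = AgglomerativeClustering(n_clusters=3, linkage="single")
labels = model.fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```

Swapping `linkage` for `"complete"` or `"average"` changes only the merging criterion; all variants still rely on pairwise affinities.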

     Affinity-based Clustering

Remarks:

Which similarity measure?

    - Euclidean distance is easy,
    - but we do not always cluster vectors.

Cost of the affinity matrix?

    - Compute it over mini-batches,
    - but this might repeat computations.

Objects don't move!

    - Only the decision borders move.
    - Let's make the objects move!
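The mini-batch remark above can be sketched as follows: build the Euclidean affinity matrix block by block instead of materialising all N×N entries at once. The batch size and data are illustrative assumptions:

```python
# A sketch of computing pairwise Euclidean distances over mini-batches,
# yielding one block of rows of the N x N affinity matrix at a time.
import numpy as np

def batched_affinity(X, batch_size=64):
    """Yield (row_slice, distances) blocks of the Euclidean distance matrix."""
    N = X.shape[0]
    for start in range(0, N, batch_size):
        rows = X[start:start + batch_size]
        # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
        d2 = ((rows ** 2).sum(1)[:, None]
              - 2 * rows @ X.T
              + (X ** 2).sum(1)[None, :])
        yield slice(start, start + batch_size), np.sqrt(np.maximum(d2, 0.0))

X = np.random.default_rng(0).normal(size=(200, 5))
blocks = [d for _, d in batched_affinity(X)]
full = np.vstack(blocks)
print(full.shape)  # full 200 x 200 distance matrix, built block by block
```

Note the repeated-computation drawback from the slide: every batch recomputes `(X ** 2).sum(1)` and multiplies against the full dataset.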

     Affinity-based Clustering

Euclidean distance is easy

Compute over mini-batches

Let's make the objects move!

[Figure: superheroes embedded in a 2D feature space with axes "color" and "shape"]

What are we actually doing?

 

We learn a similarity measure.

We learn a kernel!

$k(x, y) = \langle f(x), f(y) \rangle$, where the inner product is Euclidean and the feature map $f$ is unknown.
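The kernel view can be sketched as follows. Here `f` is a tiny randomly initialised network standing in for the unknown, to-be-learned feature map; the weights and sizes are illustrative assumptions:

```python
# A sketch of a kernel induced by a feature map: k(x, y) = <f(x), f(y)>.
# f is an untrained two-layer map standing in for the unknown feature map.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 16)), rng.normal(size=(16, 8))

def f(x):
    """Feature map R^2 -> R^8 (random weights, i.e., not yet trained)."""
    return np.tanh(x @ W1) @ W2

def k(x, y):
    """Kernel induced by f: Euclidean inner product in the embedding."""
    return f(x) @ f(y)

x, y = rng.normal(size=2), rng.normal(size=2)
print(k(x, y))  # similarity of x and y under the current f
```

Training `f` is exactly the "how do we guide the learning?" question the next slide answers.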

 

How do we guide the learning?

     Affinity-based Deep Clustering

Let $\mathcal{X}\!=\!\{x_1,\ldots, x_N\}\!\subset\!\mathbb{R}^d$ be a dataset that we want to cluster using a feature map $f\!:\!\mathbb{R}^d \!\longrightarrow\! \mathbb{R}^p$.

We want that in the embedding space:

    - similar objects are close to each other,

    - dissimilar ones are far from each other.

For each datapoint $x_i$, we have a set of positive examples $x_i^{+}$ and a set of negative ones $x_i^{-}$.

Triplet Loss:

\mathcal{J} = \sum_{i=1}^N \sum_{j \in x_i^+} ||x_i - x_j||^2 - \sum_{l \in x_i^-} ||x_i - x_l||^2

How do we get these sets?

The Euclidean norm is not good in practice, hence:

InfoNCE:

\mathcal{J} = \sum_{i=1}^N -\log\Big( \dfrac{ \sum_{j \in x_i^+} \exp\big( \cos( x_i, x_j) / \tau \big) }{ \sum_{l \in x_i^-} \exp\big( \cos( x_i, x_l) / \tau \big) } \Big)
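The InfoNCE-style loss on this slide can be sketched in numpy, following the slide's form with positives in the numerator and negatives in the denominator. The embeddings, index sets, and temperature `tau` are illustrative assumptions:

```python
# A minimal numpy sketch of the InfoNCE-style loss above: cosine
# similarities scaled by a temperature tau, positives over negatives.
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def info_nce(X, positives, negatives, tau=0.1):
    """X: (N, p) embeddings; positives/negatives: lists of index lists."""
    loss = 0.0
    for i in range(len(X)):
        num = sum(np.exp(cos(X[i], X[j]) / tau) for j in positives[i])
        den = sum(np.exp(cos(X[i], X[l]) / tau) for l in negatives[i])
        loss += -np.log(num / den)
    return loss

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
# Toy neighbourhoods: 0<->1 are positives, 2 and 3 are their negatives.
loss = info_nce(X, positives=[[1], [0], [3], [2]],
                   negatives=[[2, 3], [2, 3], [0, 1], [0, 1]])
print(loss)
```

In contrastive learning the positive sets come from augmentations of $x_i$ and the negatives from the other instances, as the next slide states.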

     Contrastive Learning

Augmentations are pulled closer

Other instances are pushed away

     Centroid-based Clustering

Strategy:

Choose three representatives.

Group by similarity.

Update the representatives.

Continue until convergence.

k-Medoids

If the representatives are not necessarily instances: k-means.

Can we learn k-means with a neural network?
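The strategy above can be sketched as plain k-means in numpy: assign points to the nearest representative, update each representative to its cluster mean, repeat until convergence. The data, `k = 3`, and seed are illustrative assumptions:

```python
# A minimal numpy sketch of k-means: alternate assignment and
# centroid-update steps until the centroids stop moving.
import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its closest centroid.
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        # Update each centroid to the mean of its assigned points;
        # keep a centroid in place if its cluster happens to empty.
        new = np.array([X[labels == c].mean(0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in (0.0, 3.0, 6.0)])
labels, centroids = kmeans(X)
print(np.bincount(labels))
```

Restricting the update step to pick the medoid (the most central instance) instead of the mean recovers k-medoids.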
