From supervised learning... to unsupervised learning

In supervised learning the dataset contains both inputs and labels:

\mathcal{D} = \{(x_i, y_i)\}_{1\leq{i} \leq{N}}

In unsupervised learning only the inputs are available:

\mathcal{D} = \{x_i\}_{1\leq{i} \leq{N}}

With unsupervised learning you can find...

  • efficient representations (embeddings, interpretation)
  • estimations of your data distribution (to generate new samples)
  • groups of similar samples (free labels, fill in the blanks)
  • outliers (anomaly detection, de-noising)

How to?

  • Dimension reduction
  • Clustering

Dimension reduction

"the curse of dimensionality"

To avoid redundancy and unnecessary computational load


To visualize the data


To improve data representation
(supervised task pre-processing: semi-supervised learning)

Two families of methods:

  • Feature selection
  • Feature extraction

Feature selection

Consider a dataset \mathcal{D} = \{x_i\}_{1\leq{i} \leq{N}}, stored as a data matrix D with N samples (rows) and M features (columns):

         feat1  feat2  feat3  feat4  feat5
    x1     1      2      2      6      3
    x2     2      4      4     12      7
    x3     3      6      8     24      9
    ...
    xn     4      8     16     48     11

Here feat2 = 2 \times feat1 and feat4 = 3 \times feat3: two columns are redundant and carry no extra information. Feature selection keeps only an informative subset of the original features:

         feat1  feat3  feat5
    x1     1      2      3
    x2     2      4      7
    x3     3      8      9
    ...
    xn     4     16     11

Feature selection example with the Breast Cancer dataset

The samples are malignant breast fine needle aspirates. The data matrix D has N = 569 samples and M = 10 features (only 3 shown here).
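A minimal sketch of correlation-based feature selection on this dataset (an assumption: scikit-learn ships a 30-feature version of it; the 0.95 threshold is an arbitrary illustrative choice):

    import numpy as np
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)   # X has shape (569, 30)
    corr = np.corrcoef(X, rowvar=False)          # (30, 30) feature correlation matrix

    # Greedily keep a feature only if it is not highly correlated
    # with any feature already kept
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < 0.95 for k in kept):
            kept.append(j)

    X_selected = X[:, kept]
    print(f"kept {len(kept)} of {X.shape[1]} features")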

Feature extraction

e.g. Principal Component Analysis (PCA):

  • linearly combine features to find mutually orthogonal components
  • the (principal) components are ranked from the most "significant" to the least "significant"
  • projecting the data on the first components maximizes its spread (variance)
  • dimension reduction: keep the d first components
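A minimal PCA sketch with scikit-learn (keeping d = 2 components and standardizing first are illustrative choices):

    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_breast_cancer(return_X_y=True)
    X = StandardScaler().fit_transform(X)     # PCA is sensitive to feature scales

    pca = PCA(n_components=2)                 # keep the d = 2 first components
    X_projected = pca.fit_transform(X)        # shape (569, 2)
    print(pca.explained_variance_ratio_)      # variance captured by each component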

demo with PCA (feature extraction)

PCA (feature extraction), with M = 7 features here

Diagonalize the M \times M (covariance) matrix D of the data:

D = V \,\mathrm{diag}(\lambda)\, V^{-1}, \qquad V^{-1} = V^T

The columns of V are the eigenvectors v_1, \dots, v_7 (the principal components) and \lambda_1, \dots, \lambda_7 are the corresponding eigenvalues, ranked by decreasing variance.

A sample x is projected onto the components by

x_{\mathrm{projected}} = V^{-1} x

With d = M = 7 this is just a change of basis. For dimension reduction, keep only the first d < M rows of V^{-1} (e.g. d = 4, or d = 3): x_{\mathrm{projected}} then lives in a d-dimensional space.
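The same idea from scratch with NumPy, a sketch under the slide's setting (M = 7, d = 3; the data here are random placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(569, 7))          # N = 569 samples, M = 7 features
    Xc = X - X.mean(axis=0)                # center the data

    C = np.cov(Xc, rowvar=False)           # (7, 7) covariance matrix
    lam, V = np.linalg.eigh(C)             # eigenvalues / eigenvectors (ascending)
    order = np.argsort(lam)[::-1]          # rank by decreasing variance
    lam, V = lam[order], V[:, order]

    d = 3
    X_projected = Xc @ V[:, :d]            # first d rows of V^T applied to each sample
    print(X_projected.shape)               # (569, 3)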

[figure: the data projected on the 1st dimension of PCA, versus the selection of the "mean texture" feature (normalized)]

Dimension reduction (linear): PCA, ICA (Independent Component Analysis), ...

Non-linear dimension reduction:

  • ISOMAP
  • Locally Linear Embedding
  • Hessian Eigenmapping
  • Local Tangent Space Alignment
  • t-distributed Stochastic Neighbor Embedding (t-SNE)
  • UMAP
  • (deep) auto-encoders...

Dimension reduction (non-linear): t-distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE computes an N \times N similarity matrix between samples in the input space (e.g. 0.9 for two close samples, 0.2 for two distant ones) and an N \times N similarity matrix in the lower-dimensional space (e.g. 0.8 and 0.3). The embedding is then optimized so that the low-dimensional similarities match the input-space similarities.

[figure adapted from L J Frasinski]
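A minimal t-SNE sketch with scikit-learn (perplexity = 30 is the library default, roughly the number of effective neighbors):

    from sklearn.datasets import load_breast_cancer
    from sklearn.manifold import TSNE

    X, y = load_breast_cancer(return_X_y=True)
    X_embedded = TSNE(n_components=2, perplexity=30,
                      init="pca", random_state=0).fit_transform(X)
    print(X_embedded.shape)                # (569, 2), ready for a scatter plot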

Example of a t-SNE application: accelerating the annotation of a Transcranial Doppler ultrasound micro-embolic dataset (Vindas et al. 2021, IEEE IUS 2021, submitted).

Clustering

Find groups of similar examples (clusters). But what is a cluster? Several definitions coexist:

  • distance-based definition (samples closer than some distance d belong together)
  • density-based definition (a high-density region \rho_a surrounded by a lower-density region \rho_b, with \rho_a > \rho_b)
  • distribution-based definition
  • path-based definition (graphs)

K-means (distance-based method)

  1. initialize k samples as centers (*several initialization strategies exist)
  2. for each sample, associate the label of its closest center
  3. update the centers (mean position of each group)
  4. repeat steps 2. and 3. until the clusters no longer change
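A from-scratch sketch of the four steps above (k = 3 and the toy data are illustrative; no empty-cluster handling):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2))          # toy data
    k = 3

    centers = X[rng.choice(len(X), size=k, replace=False)]      # step 1
    while True:
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                           # step 2
        new_centers = np.array([X[labels == j].mean(axis=0)     # step 3
                                for j in range(k)])
        if np.allclose(new_centers, centers):                   # step 4: stable -> stop
            break
        centers = new_centers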

K-means (distance-based method)

+ fast (O(n) in the number of samples)

- need to know / find k (the number of clusters)
- can only detect "circular" (convex) clusters

Alternative: k-medians (more computation, because a sort is needed to find each median).

Hierarchical clustering (distance-based method)

  • agglomerative (bottom-up) or divisive (top-down)
  • uses an appropriate metric d (between samples a and b)
  • and a linkage criterion (dissimilarity between sets A and B)

Example: single-linkage clustering, where the dissimilarity between sets is

\min \{d(a,b):a\in A,b\in B\}
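A minimal single-linkage sketch with SciPy (the toy data and the cut at 2 clusters are illustrative choices):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, size=(50, 2)),    # two toy groups
                   rng.normal(3, 0.3, size=(50, 2))])

    Z = linkage(X, method="single")                     # agglomerative merges, bottom-up
    labels = fcluster(Z, t=2, criterion="maxclust")     # cut the dendrogram at 2 clusters
    print(np.unique(labels))                            # [1 2]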

hierarchical clustering (distance-based method)

+ no need to know the number of clusters beforehand
+ works with any distance metric
+ sub-group discovery

- lower efficiency: O(n^3)

Gaussian Mixture Model with Expectation-Maximization (distribution-based method)

Intuition: k-means with probabilistic assignment (instead of closest-center assignment).

Initialize the k = 2 distributions (*several strategies exist).

Expectation (E) step: find the probability, for each point, of having been generated by each mixture component, and form the expected complete-data log-likelihood

Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\theta^{(t)}} [\log p(X, Z \mid \theta)]

Maximization (M) step: fit the mixture to the samples,

\theta^{(t+1)} = \operatorname{argmax}_{\theta} Q(\theta \mid \theta^{(t)})

Repeat E and M steps until the distributions no longer move. Then assign the labels (=> clusters), or keep the multiple (soft) labels...
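A minimal sketch with scikit-learn's GaussianMixture, which runs EM internally (k = 2 and the toy blobs are illustrative):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, size=(200, 2)),     # two toy blobs
                   rng.normal(5, 1, size=(200, 2))])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # EM under the hood
    hard_labels = gmm.predict(X)          # assign the labels -> clusters
    soft_labels = gmm.predict_proba(X)    # or keep the multiple (soft) labels
    X_new, _ = gmm.sample(10)             # generate new samples (probabilistic model)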

Gaussian Mixture Model with Expectation-Maximization (distribution-based method)

+ not restricted to circular clusters... ellipsoids are possible!
+ supports mixed-membership labeling
+ new samples can be generated (probabilistic model)

- the number of Gaussians (the expected number of clusters) must be fixed, as in k-means

DBSCAN (density-based method)

  • all points within a cluster are mutually density-connected
  • if a point is "density-reachable" from some point of the cluster, it is also part of the cluster

Two parameters:
  • \epsilon: neighborhood radius
  • minPts: minimum number of neighbors to be a core point

DBSCAN illustrated with minPts = 2:

  • a point with at least minPts neighbors within the neighborhood radius \epsilon is a core point
  • points that are not core points but fall within \epsilon of a core point are still reachable, and join the cluster
  • the rest is "noise"
  • the result differs with a smaller or a greater \epsilon...

DBSCAN (density-based method)

+ does not assume any predefined shape for the data clusters

- data must be defined by a set of coordinates (not capable of handling arbitrary feature spaces)
- computationally costly...
- not robust to clusters of varying density

=> OPTICS (density-based method) addresses the varying-density limitation
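A minimal DBSCAN sketch with scikit-learn (eps and min_samples play the roles of \epsilon and minPts; the values are illustrative):

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.2, size=(100, 2)),   # two dense groups
                   rng.normal(3, 0.2, size=(100, 2)),
                   rng.uniform(-2, 5, size=(10, 2))])   # a few sparse outliers

    labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
    print(set(labels))                    # cluster ids; -1 marks the "noise" points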

Clustering performance metrics?

Without ground-truth labels:
  • Silhouette coefficient
  • Calinski-Harabasz index
  • Davies-Bouldin index

With ground-truth labels:
  • Rand index
  • Mutual Information based scores
  • Homogeneity, completeness and V-measure
  • Fowlkes-Mallows scores
  • Contingency matrix
  • Pair confusion matrix

Silhouette coefficient (between -1 and 1), computed for each sample:

\frac{b-a}{\max(a,b)}

with a the mean distance between the sample and all other points in the same cluster, and b the mean distance between the sample and all other points in the next nearest cluster.

The higher the value, the more similar the sample is to its own cluster (and not to neighboring clusters). If most samples have a low or negative value, the clustering configuration is not appropriate.
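A minimal sketch of the silhouette score for a k-means clustering (k = 2 on the Breast Cancer data is an illustrative choice):

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import silhouette_score

    X, _ = load_breast_cancer(return_X_y=True)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(silhouette_score(X, labels))    # mean silhouette over all samples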

Calinski-Harabasz index:

s(k) = \frac{\mathrm{Tr}(B_k)}{\mathrm{Tr}(W_k)} \times \frac{N-k}{k-1}

with B_k the inter-cluster dispersion matrix and W_k the intra-cluster dispersion matrix.

The higher the Calinski-Harabasz index s(k), the more dense and well separated the k clusters are.
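The same clustering scored with the Calinski-Harabasz index (same illustrative setup as the silhouette sketch):

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import calinski_harabasz_score

    X, _ = load_breast_cancer(return_X_y=True)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(calinski_harabasz_score(X, labels))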

Summary: unsupervised learning

  • Dimension reduction
  • Clustering

Unsupervised Learning (part II of Introduction to machine learning, Deep Learning for Medical Imaging School 2021)

By emmanuelrouxfr