unsupervised learning
with unsupervised learning you can find...
... efficient representations (embedding, interpretation)
... estimations of your data distribution (to generate new samples)
... groups of similar samples (free labels, filling in blanks)
... outliers (anomaly detection, de-noising)
Dimension reduction
Clustering
How to?
Dimension reduction
"the curse of dimensionality"
To avoid redundancy and unnecessary computational load
To visualize the data
To improve data representation
(supervised task pre-processing: semi-supervised learning)
Dimension reduction
Feature selection
Feature extraction
Feature selection
|     | feat1 | feat2 | feat3 | feat4 | feat5 |
|-----|-------|-------|-------|-------|-------|
| x1  | 1     | 2     | 2     | 6     | 3     |
| x2  | 2     | 4     | 4     | 12    | 7     |
| x3  | 3     | 6     | 8     | 24    | 9     |
| ... |       |       |       |       |       |
| xn  | 4     | 8     | 16    | 48    | 11    |
Feature selection
feat2 = 2 × feat1 and feat4 = 3 × feat3 are redundant: keep feat1, feat3, feat5

|     | feat1 | feat3 | feat5 |
|-----|-------|-------|-------|
| x1  | 1     | 2     | 3     |
| x2  | 2     | 4     | 7     |
| x3  | 3     | 8     | 9     |
| ... |       |       |       |
| xn  | 4     | 16    | 11    |
Feature selection
Feature selection example with the Breast Cancer dataset
malignant breast fine needle aspirates
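A minimal sketch of one way to run this kind of selection on the scikit-learn Breast Cancer dataset; the exact method used in the original demo is not shown here, so dropping one feature of each highly correlated pair (threshold 0.95 is an illustrative choice) stands in for it:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the Breast Cancer Wisconsin dataset (fine needle aspirate features)
data = load_breast_cancer()
X, names = data.data, data.feature_names

# Absolute correlation between every pair of features
corr = np.abs(np.corrcoef(X, rowvar=False))

# Greedily drop one feature from each pair with correlation above 0.95
to_drop = set()
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if j not in to_drop and corr[i, j] > 0.95:
            to_drop.add(j)

kept = [n for k, n in enumerate(names) if k not in to_drop]
print(f"kept {len(kept)}/{len(names)} features:", kept)
```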
Feature extraction
e.g. Principal Component Analysis (PCA)
linearly combine features to find mutually orthogonal components
the (principal) components are ranked from
the most "significant" to the least "significant"
projecting the data onto the first components maximizes its spread (variance)
dimension reduction: keep the first d components
Feature extraction
demo with PCA
(figure: 1st dimension of PCA vs. selection of the "mean texture" feature, normalized)
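A minimal sketch reproducing this kind of demo with scikit-learn; the 2-component choice and the use of the Breast Cancer dataset here are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Standardize first: PCA is sensitive to feature scales
X_std = StandardScaler().fit_transform(X)

# Keep the first d = 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

# Share of the total variance captured by each kept component
print(pca.explained_variance_ratio_)
```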
Dimension reduction (linear)
PCA
ICA
Dimension reduction (non-linear)
ISOMAP
Locally Linear Embedding
Hessian Eigenmapping
Local Tangent Space Alignment
t-distributed Stochastic Neighbor Embedding (t-SNE)
UMAP
(deep) auto-encoders...
Dimension reduction (non-linear)
t-distributed Stochastic Neighbor Embedding (t-SNE)
compute a similarity matrix between samples in the input space,
then find a low-dimensional embedding whose similarity matrix matches it
(figure: similarity matrix in input space vs. similarity matrix in lower space; credit: L J Frasinski)
example of t-SNE application:
accelerating the annotation of a
Transcranial Doppler ultrasound micro-embolic dataset
(Vindas et al. 2021, IEEE IUS 2021, submitted)
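A minimal t-SNE sketch with scikit-learn, again on the Breast Cancer dataset for continuity; the perplexity value is an illustrative default, not the one used in the cited work:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Non-linear embedding into 2D; perplexity balances local vs. global structure
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_emb = tsne.fit_transform(X_std)
print(X_emb.shape)  # (n_samples, 2)
```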
clustering
find groups of similar examples (clusters)
what is a cluster?
clustering
K-means (distance-based method)
initialize k centroids (e.g. k randomly chosen samples)
assign each sample to its closest centroid
move each centroid to the mean of its assigned samples
repeat the last two steps until the assignments no longer change
+ fast (O(n) per iteration)
- need to know / find k (the number of clusters)
- can only detect circular (convex) clusters
alt. k-medians (more computation, because of the need to sort...)
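A minimal k-means sketch with scikit-learn; the blob data and k = 3 are illustrative assumptions (the well-behaved case of roughly circular clusters):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: 3 roughly circular clusters, the case where k-means works well
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k must be chosen in advance; n_init restarts mitigate bad initializations
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels, centroids = km.labels_, km.cluster_centers_
print(centroids)
```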
hierarchical clustering (distance-based method)
agglomerative (bottom-up) or divisive (top-down)
uses an appropriate metric d (distance between samples a and b)
and a linkage criterion (dissimilarity between sets)
example: single-linkage clustering, $D(A, B) = \min_{a \in A,\, b \in B} d(a, b)$
hierarchical clustering (distance-based method)
+ no need to know the number of clusters beforehand
+ does not depend on the chosen distance metric (source?)
+ discovery of sub-groups (at every level of the hierarchy)
- lower efficiency, O(n^3)
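A minimal agglomerative sketch with SciPy, using the single linkage named above; the toy data and the cut into 3 flat clusters are illustrative assumptions:

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

# Agglomerative (bottom-up) merging with single linkage:
# the dissimilarity between two sets is their minimum pairwise distance
Z = linkage(X, method="single", metric="euclidean")

# Cut the resulting tree into 3 flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```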
Gaussian Mixture Model with Expectation-Maximization
(distribution-based method)
like k-means, but with a probability of assignment (instead of closest-point assignment)
initialize the k = 2 distributions (*several strategies)
Expectation (E) step:
find the probability for each point to be generated by each mixture component
Maximization (M) step:
fit each component of the mixture to the samples, weighted by these probabilities
ready for a new E step? check the colors in the squares...
no more movement? assign the labels => clusters
or keep the multiple (soft) labels...
clustering
Gaussian Mixture Model with Expectation-Maximization
(distribution-based method)
+ not restricted to circular clusters... ellipses are possible!
+ supports mixed-membership labeling
+ can generate new samples (it is a probabilistic model)
- need to fix the number of Gaussians (the expected number of clusters), as in k-means
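A minimal GMM-with-EM sketch with scikit-learn illustrating the three "+" points above; the toy data and k = 2 are illustrative assumptions:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

# EM fitting of k = 2 Gaussians; full covariances allow elliptical clusters
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)

hard_labels = gmm.predict(X)        # hard assignment => clusters
soft_labels = gmm.predict_proba(X)  # mixed-membership probabilities
X_new, _ = gmm.sample(20)           # generate new samples from the fitted model
```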
clustering
DBSCAN (density-based method)
ε (epsilon): neighborhood radius
minPts: minimum number of neighbors to be a core point
all points within a cluster are mutually density-connected
if a point is "density-reachable" from some point of the cluster, it is also part of the cluster
demo with minPts = 2
core point: at least minPts neighbors within the neighborhood radius ε
these are not core points
not core points, but reachable!
the rest is "noise"
different results with a smaller epsilon...
different results with a greater epsilon...
clustering
DBSCAN (density-based method)
+ does not assume any predefined shape for the data clusters
- data must be defined by a set of coordinates (not capable of handling arbitrary feature spaces)
- computationally costly... (...)
- not robust to clusters of varying density
=> OPTICS (density-based method) addresses the varying-density issue
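A minimal DBSCAN sketch with scikit-learn on non-convex clusters; the half-moon data and the eps value are illustrative assumptions:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: non-convex clusters that defeat k-means
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the neighborhood radius; min_samples plays the role of minPts
db = DBSCAN(eps=0.2, min_samples=2).fit(X)
print(set(db.labels_))  # the label -1 marks points classified as "noise"
```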
clustering
Performance metrics?
Calinski-Harabasz index
Davies-Bouldin index
Rand index
Mutual Information based scores
Homogeneity, completeness and V-measure
Fowlkes-Mallows scores
Contingency Matrix
Pair Confusion Matrix
clustering
Silhouette coefficient (between -1 and 1), computed for each sample:
$s = \frac{b - a}{\max(a, b)}$
with $a$ the mean distance between a sample and all other points in the same cluster
and $b$ the mean distance between a sample and all other points in the next nearest cluster
the higher its value, the more similar the sample is to its own cluster (and not to neighboring clusters)
if most samples have a low or negative value, then the clustering configuration is not appropriate
clustering
Performance metrics?
Calinski-Harabasz index
$s(k) = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \times \frac{N - k}{k - 1}$
with $B_k$ the inter-cluster dispersion matrix
and $W_k$ the intra-cluster dispersion matrix
the higher the Calinski-Harabasz index $s(k)$, the denser and better separated the clusters are
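A minimal sketch computing both metrics with scikit-learn to compare candidate numbers of clusters; the toy data and the range of k are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Score each candidate k with two internal (label-free) metrics;
# both should peak at the true number of clusters (here 3)
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels), calinski_harabasz_score(X, labels))
```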
unsupervised learning
Dimension reduction
Clustering