How to compare images?

Project images into a d-dimensional Euclidean space where distances correspond directly to a measure of similarity

End to End Metric Learning

Project images into a d-dimensional space where Euclidean distance is meaningful

How to compare images? Metric Learning

Basic idea: learn a metric that assigns small (resp. large) distance to pairs of examples that are semantically similar (resp. dissimilar).

Metric Learning

d-dimensional Embedding Space
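
To make the basic idea concrete, here is a minimal sketch of comparing images by the Euclidean distance between their embeddings. The 3-dimensional vectors are made-up values for illustration, not outputs of any real network, and the helper name `euclidean_distance` is an assumption.

```python
import numpy as np

def euclidean_distance(x, y):
    """Euclidean distance between two embedding vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))

# Hypothetical embeddings (illustrative values only).
leopard_a = np.array([0.9, 0.1, 0.0])
leopard_b = np.array([0.8, 0.2, 0.1])
car = np.array([-0.7, 0.6, 0.3])

d_similar = euclidean_distance(leopard_a, leopard_b)   # semantically similar pair
d_dissimilar = euclidean_distance(leopard_a, car)      # semantically dissimilar pair
print(d_similar < d_dissimilar)
```

A well-trained embedding should make the first distance small and the second one large.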

Small distance

Large distance

Metric Learning Applications

Person identification

Image Search

Few-shot learning

Training: Triplet Loss

\boxed{\ell(A,P,N)=\max\left(d(A,P)-d(A,N)+\alpha,0\right)}

Courtesy: cs-230 course, Stanford University

A = Anchor

P = Positive

N = Negative
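
The triplet loss above can be sketched in a few lines. This is a minimal illustration that assumes d is the Euclidean distance and that the inputs are already embeddings; the margin value `alpha=0.2` and the example vectors are assumptions chosen for the demonstration.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A, P) - d(A, N) + alpha, 0) with Euclidean d (an assumption)."""
    d_ap = np.linalg.norm(f_a - f_p)  # anchor-positive distance
    d_an = np.linalg.norm(f_a - f_n)  # anchor-negative distance
    return float(max(d_ap - d_an + alpha, 0.0))

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
easy_negative = np.array([2.0, 0.0])   # far from the anchor
hard_negative = np.array([0.2, 0.0])   # almost as close as the positive

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
print(easy, hard)
```

The easy triplet already satisfies the margin, so its loss is zero and it contributes no gradient; only the hard triplet yields a positive loss.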

Leopard

Lion

Easy!

Leopard

Jaguar

Hard!

Hard Negative Sampling

Sample "hard" (informative) triplets, where loss > 0

Sampling must select different triplets as training progresses

\boxed{\ell(A,P,N)=\max\left(d(f(A),f(P))-d(f(A),f(N))+\alpha,0\right)}

Hard Negative Sampling

• Offline hard negative mining: slow or infeasible to compute across the whole dataset. It can also lead to poor training, because mislabeled images and outliers would dominate the hard positives and negatives.

• Online hard negative mining: for each anchor A, select the hardest positive P (argmax of d(A,P)) and the hardest negative N (argmin of d(A,N)) from the mini-batch rather than from the entire dataset.
  • Requires a very large batch size.
  • Limited by GPU memory.
  • Selecting the hardest negatives can in practice lead to bad local minima early in training; specifically, it can result in a collapsed model (i.e. f(x) = 0) [1].
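
As a rough sketch of online mining inside one mini-batch, the code below picks the farthest same-label sample and the closest different-label sample for each anchor. It assumes Euclidean distance and a margin of 0.2, and the function names are hypothetical. It implements the plain "hardest" selection described above, which is the variant the slide warns can collapse early in training.

```python
import numpy as np

def pairwise_distances(emb):
    """All-pairs Euclidean distances for a (batch, d) array."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def batch_hard_triplet_loss(emb, labels, alpha=0.2):
    """Mean triplet loss over anchors, using the hardest P and N in the batch."""
    labels = np.asarray(labels)
    dist = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # same label, not self
    neg_mask = ~same                                    # different label
    d_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)  # hardest positive
    d_neg = np.where(neg_mask, dist, np.inf).min(axis=1)   # hardest negative
    # Skip anchors that have no positive or no negative in this batch.
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    losses = np.maximum(d_pos[valid] - d_neg[valid] + alpha, 0.0)
    return float(losses.mean()) if losses.size else 0.0

# Toy batch: four 2-d embeddings with two classes (illustrative values).
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [3.0, 0.0]])
labels = [0, 0, 1, 1]
loss = batch_hard_triplet_loss(emb, labels)
print(loss)
```

The distance matrix grows quadratically with the batch size, which is one reason this approach is bounded by GPU memory.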

[1] FaceNet: A Unified Embedding for Face Recognition and Clustering, Schroff et al., 2015

Naive Approach: Learn a single distance metric for all training data

Overfits and fails to generalize well.

Problem: GT labels are too coarse

Naive Approach: Learn a single distance metric for all training data

Fails to capture attributes that are not covered by the ground-truth (GT) labels provided during training.

Problem: Testing on unseen categories

Attributes that are most discriminative on the training set are not necessarily useful on novel test images

Training classes

Test classes

Attributes that are most discriminative on the training set are not necessarily useful on novel test images, and vice versa

Chromatic aberration

Clean image

To alleviate the aforementioned issues: learn several different distance metrics on non-overlapping subsets of the data.

Divide & Conquer: Cluster the Data

• Decrease the variance of the data for sub-problems
• Harder negatives
• Closer (easier) positives
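
One way to cluster the data is to run k-means on the current embeddings. The slides do not name the clustering algorithm here, so k-means, the function name `kmeans`, and the toy points below are all assumptions made for this sketch.

```python
import numpy as np

def kmeans(emb, k, n_iter=20, seed=0):
    """Plain k-means on embeddings; returns (cluster index per sample, centers)."""
    emb = np.asarray(emb, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct samples.
    centers = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = emb[assign == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment against the last set of centers.
    dist = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=-1)
    return dist.argmin(axis=1), centers

# Two well-separated groups of hypothetical 2-d embeddings.
emb = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
assign, centers = kmeans(emb, k=2)
print(assign)
```

Within each cluster the samples are already close in the embedding space, so negatives drawn from the same cluster tend to be harder than negatives drawn from the whole dataset.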

Learning Embedding Subspaces

Learnable mask which induces subspace i

Embedding subspace i

Subspace orthogonality loss

Method: Summary

1. Division into subproblems
   1. Compute embeddings for all training images
   2. Split the data into K disjoint subsets
   3. Split the embedding space into K subspaces

2. Training
   1. Assign a separate learner (loss) to each subspace.
   2. Train K different distance metrics using K learners.

3. Increase the number of subproblems ×2

4. ...

5. Conquer the embedding space by combining the subspaces
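
The split and conquer steps can be illustrated with a small sketch. It assumes the K subspaces are contiguous, non-overlapping slices of the embedding dimensions, which is one simple choice and not necessarily the one used in the released code; `subspace_slices` and `subspace_distance` are hypothetical helper names.

```python
import numpy as np

def subspace_slices(d, k):
    """Split d embedding dimensions into k contiguous, non-overlapping slices."""
    bounds = np.linspace(0, d, k + 1).astype(int)
    return [slice(int(bounds[i]), int(bounds[i + 1])) for i in range(k)]

def subspace_distance(x, y, sl):
    """Euclidean distance restricted to one subspace (used by one learner)."""
    return float(np.linalg.norm(x[sl] - y[sl]))

d, k = 8, 2
slices = subspace_slices(d, k)

x = np.arange(8, dtype=float)  # hypothetical embedding of one image
y = np.zeros(8)                # hypothetical embedding of another image

# Learner i only sees distances inside subspace i.
per_learner = [subspace_distance(x, y, sl) for sl in slices]

# "Conquer": the full embedding is the concatenation of all subspaces.
full = float(np.linalg.norm(x - y))
print(per_learner, full)
```

Because the slices do not overlap, the squared full-space distance equals the sum of the squared subspace distances, so the K learned metrics combine into a single metric over the whole embedding.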

Conclusion

https://github.com/CompVis/metric-learning-divide-and-conquer

1. Jointly split the data into K disjoint subsets and the embedding space into K subspaces of reduced dimensionality. Train K different distance metrics using K learners.
2. Increase the task complexity over time.
3. Requires no change to the network architecture and is independent of the metric learning loss function.
4. Achieves state-of-the-art results on 5 benchmark datasets.

By Artsiom S

Copy of Identification of Humpback Whales using Deep Metric Learning

I will discuss our novel divide and conquer approach (CVPR 2019) for deep metric learning, which significantly improves the state-of-the-art performance of metric learning on computer vision tasks