Divide & Conquer
for
Deep Metric Learning

Artsiom Sanakoyeu

Heidelberg University

How to compare images?

How similar/dissimilar are these images?

How to compare images?

Project images into a d-dimensional Euclidean space where distances directly correspond to a measure of similarity

End to End Metric Learning

Project each image into a d-dimensional space where the Euclidean distance is semantically meaningful
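As a rough, illustrative sketch of this end-to-end idea (my own example, not the speaker's code): a CNN backbone maps each image to a d-dimensional vector, and similarity is read off as the Euclidean distance between the vectors. The ResNet-50 backbone and the 128-dimensional output are arbitrary choices here.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Illustrative embedder: a ResNet-50 backbone whose classification head is
# replaced by a 128-dimensional projection (the dimensionality is arbitrary).
# In practice the backbone would be pretrained and fine-tuned with a
# metric-learning loss.
backbone = models.resnet50()
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 128)
backbone.eval()

def embed(image):
    """Map a (3, 224, 224) image tensor to an L2-normalized 128-d embedding."""
    with torch.no_grad():
        z = backbone(image.unsqueeze(0))
    return F.normalize(z, dim=1)

# Comparing two images then reduces to a Euclidean distance:
# distance = torch.dist(embed(img_a), embed(img_b))
```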

How to compare images?
Metric Learning

Basic idea: learn a metric that assigns small (resp. large) distance to pairs of examples that are semantically similar (resp. dissimilar).

Metric Learning

d-dimensional Embedding Space

Small distance

Large distance

Metric Learning Applications

Person identification

Image Search

Few-shot learning

Training: Triplet Loss

\boxed{\ell(A,P,N)=\max\left(d(A,P)-d(A,N)+\alpha,0\right)}

Courtesy: cs-230 course, Stanford University

A = Anchor

P = Positive

N = Negative
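In code, the triplet loss is just a clipped difference of distances. A minimal PyTorch sketch (my illustration; `margin` plays the role of α and the inputs are batches of embedding vectors):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(A, P) - d(A, N) + margin, 0) with Euclidean distance."""
    d_ap = F.pairwise_distance(anchor, positive)  # d(A, P)
    d_an = F.pairwise_distance(anchor, negative)  # d(A, N)
    return F.relu(d_ap - d_an + margin).mean()
```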

Leopard

Lion

Easy!

Sampling matters?

Leopard

Jaguar

Hard!

Sampling matters?

Hard Negative Sampling

Sample "hard" (informative) triplets, where loss > 0

Sampling must select different triplets as training progresses

\boxed{\ell(A,P,N)=\max\left(d(f(A),f(P))-d(f(A),f(N))+\alpha,0\right)}

Hard Negative Sampling

 

  • Offline hard negative mining: slow or infeasible to compute across the whole dataset. It can also hurt training, since mislabeled images and outliers tend to dominate the hardest positives and negatives.

  • Online hard negative mining:
    Select P and N (argmax and argmin of the distance to the anchor) from the mini-batch, not from the entire dataset, for each anchor A (a minimal sketch is shown below);
    • requires a very large batch size
    • limited by GPU memory
  • Selecting only the hardest negatives can in practice lead to bad local minima early in training; specifically, it can result in a collapsed model (i.e. f(x) = 0) [1].

[1] FaceNet: A Unified Embedding for Face Recognition and Clustering, Schroff et al., 2015
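As an illustrative sketch of the online ("batch-hard") mining mentioned above (my own example, assuming integer class labels and a batch of embeddings; not necessarily the exact scheme used in the talk):

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """For each anchor in the mini-batch, take the hardest positive
    (largest distance, same label) and the hardest negative
    (smallest distance, different label)."""
    dist = torch.cdist(embeddings, embeddings, p=2)       # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=embeddings.device)

    pos_mask = same & ~eye                                # positives (excluding self)
    neg_mask = ~same                                      # negatives

    d_ap = (dist * pos_mask).max(dim=1).values            # hardest positive per anchor
    d_an = dist.masked_fill(~neg_mask, float('inf')).min(dim=1).values  # hardest negative

    # Anchors with no positive or no negative in the batch contribute
    # degenerate terms here; a real implementation would mask them out.
    return torch.relu(d_ap - d_an + margin).mean()
```

FaceNet [1] instead selects semi-hard negatives (farther from the anchor than the positive, but still within the margin) to avoid the collapse described above.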


Problem: Data distribution is complex and multimodal

 


Naive Approach: Learn a single distance metric for all training data

Overfits and fails to generalize well.


Problem: GT labels are too coarse


Naive Approach: Learn a single distance metric for all training data

Fails to capture attributes that are not covered by the provided GT labels during training.

Problem: Testing on unseen categories

 

Attributes that are most discriminative during training are not necessarily useful on novel test images

Training classes

Test classes

Problem: Misleading attributes / shortcuts

 

Attributes that are most discriminative during training are not necessarily useful on novel test images, and vice versa


Chromatic aberration

Clean image

Problem: Misleading attributes / shortcuts

 


To alleviate the aforementioned issues: learn several different distance metrics on non-overlapping subsets of the data.

Divide & Conquer: Split the task into smaller subproblems

 


Divide & Conquer: Cluster the Data

  • Decrease the variance of the data within each sub-problem (see the clustering sketch below)
    • Harder negatives
    • Closer (easier) positives
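A minimal sketch of this "divide" step, assuming the current embeddings of all training images are stacked into a matrix and clustered with scikit-learn's KMeans (names are illustrative):

```python
from sklearn.cluster import KMeans

def split_dataset(embeddings, n_clusters):
    """Cluster the current embeddings of all training images into
    n_clusters disjoint subsets; each subset becomes one sub-problem."""
    return KMeans(n_clusters=n_clusters, random_state=0).fit_predict(embeddings)
```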

Divide & Conquer

Jointly Split the Embedding Space & Data 

Learning Embedding Subspaces

(Figure: a learnable mask S_i induces embedding subspace i; the subspaces are trained with a subspace orthogonality loss.)
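Purely as an illustration of how such masks and an orthogonality penalty could be set up (an assumption on my part, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn as nn

class SubspaceMasks(nn.Module):
    """One learnable mask per learner; each mask (softly) selects the
    embedding dimensions that form subspace i."""
    def __init__(self, n_learners, embed_dim):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.randn(n_learners, embed_dim))

    def subspace(self, embedding, i):
        # Soft mask in (0, 1) applied elementwise to the shared embedding.
        return embedding * torch.sigmoid(self.mask_logits[i])

    def orthogonality_loss(self):
        # Penalize overlap between different masks so the subspaces cover
        # (approximately) disjoint sets of dimensions.
        m = torch.sigmoid(self.mask_logits)
        gram = m @ m.t()                                   # (K, K) pairwise overlaps
        off_diag = gram - torch.diag(torch.diagonal(gram))
        return off_diag.abs().sum()
```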


Method: Summary

  1. Division into subproblems
    1. Compute embeddings for all training images
    2. Split the data into K disjoint subsets
    3. Split the embedding space into K subspaces

  2. Training

    1. Assign a separate learner (loss) to each subspace.

    2. Train K different distance metrics using K learners.

  3. Double the number of subproblems (K → 2K)

  4. ...

  5. Conquer the embedding space by combining the learned subspaces
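Putting the steps together, a toy, self-contained schematic of this schedule (the embeddings are random placeholders here; in the real method they come from the network and are recomputed before each division, and all sizes are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_images, embed_dim, max_learners = 1000, 128, 8   # toy sizes

K = 1
while K <= max_learners:
    # 1.1 Compute embeddings for all training images (random stand-in here).
    embeddings = rng.normal(size=(n_images, embed_dim))
    # 1.2 Split the data into K disjoint subsets via K-means.
    subset_id = KMeans(n_clusters=K, random_state=0).fit_predict(embeddings)
    # 1.3 Split the embedding space into K disjoint groups of dimensions.
    subspaces = np.array_split(np.arange(embed_dim), K)

    # 2. One learner per (data subset, embedding subspace) pair.
    for i in range(K):
        data_i = embeddings[subset_id == i][:, subspaces[i]]
        # ... train learner i on data_i with a metric-learning loss ...

    # 3. Double the number of subproblems and repeat.
    K *= 2

# 5. Conquer: the concatenation of all subspaces is the final embedding.
```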

Experiments

Comparison with SOTA

Let's Look at Learners

Qualitative results

Conclusion

  1. Jointly split the data into K disjoint subsets and the embedding space into K subspaces of reduced dimensionality.
    Train K different distance metrics using K learners.
  2. Increase the task complexity over time.
  3. Does not require changing the network architecture and is independent of the metric learning loss function.
  4. Achieves state-of-the-art results on 5 benchmark datasets.
     
