Divide & Conquer
for
Deep Metric Learning

Artsiom Sanakoyeu

Heidelberg University

How to compare images?

How similar/dissimilar are these images?

How to compare images?

Project images in d-dimensional Euclidean space where distances directly correspond to a measure of similarity

End to End Metric Learning

Project images into a d-dimensional space where Euclidean distance is meaningful

How to compare images?
Metric Learning

Basic idea: learn a metric that assigns small (resp. large) distance to pairs of examples that are semantically similar (resp. dissimilar).

Metric Learning

d-dimensional Embedding Space

Small distance: semantically similar images

Large distance: semantically dissimilar images
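To make this concrete, here is a minimal sketch of embedding images with a CNN backbone and comparing them by Euclidean distance. This is only an illustration of the idea, not the setup used in this talk; the ResNet-50 backbone, pretrained weights, and L2 normalization are assumptions made for the example.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Minimal sketch: treat a CNN backbone as the embedding function f and
# compare images by Euclidean distance in the resulting embedding space.
# ResNet-50 and L2 normalization are illustrative choices.
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()      # drop the classifier head -> 2048-dim features
backbone.eval()

def embed(images):                     # images: (B, 3, H, W) float tensor
    with torch.no_grad():
        z = backbone(images)
    return F.normalize(z, dim=1)       # unit-length embeddings

def distance(img_a, img_b):
    za, zb = embed(img_a), embed(img_b)
    return torch.norm(za - zb, dim=1)  # small = similar, large = dissimilar
```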

Metric Learning Applications

Person identification

Image Search

Few-shot learning

Training: Triplet Loss

\boxed{\ell(A,P,N)=\max\left(d(A,P)-d(A,N)+\alpha,0\right)}

Courtesy: CS230 course, Stanford University

A = Anchor

P = Positive

N = Negative
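For reference, a minimal PyTorch sketch of the triplet loss above; the margin value 0.2 is illustrative.

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(A, P) - d(A, N) + alpha, 0), averaged over the batch.
    `anchor`, `positive`, `negative` are (B, d) embedding tensors."""
    d_ap = F.pairwise_distance(anchor, positive)   # d(A, P)
    d_an = F.pairwise_distance(anchor, negative)   # d(A, N)
    return F.relu(d_ap - d_an + margin).mean()
```

PyTorch also ships torch.nn.TripletMarginLoss, which implements the same formula.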

Leopard

Lion

Easy!

Sampling matters?

Leopard

Jaguar

Hard!

Sampling matters?

Hard Negative Sampling

Sample "hard" (informative) triplets, where loss > 0

Sampling must select different triplets as training progresses

\boxed{\ell(A,P,N)=\max\left(d(f(A),f(P))-d(f(A),f(N))+\alpha,0\right)}

Hard Negative Sampling

 

  • Offline hard negative mining: slow or infeasible to compute across the whole dataset. It can also lead to poor training, because mislabeled images and outliers dominate the hardest positives and negatives.

  • Online hard negative mining:
    Select P and N (argmax and argmin of the distance to A) from the mini-batch, not from the entire dataset, for each anchor A (see the sketch below);
    • requires a very large batch size
    • limited by GPU memory
  • Selecting the hardest negatives can in practice lead to bad local minima early in training; in particular, it can result in a collapsed model (i.e. f(x) = 0) [1].

[1] FaceNet: A Unified Embedding for Face Recognition and Clustering, Schroff et al., 2015
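A sketch of online mining within a mini-batch, under the assumption that every anchor has at least one positive in the batch; the function and variable names are illustrative, not the paper's exact scheme.

```python
import torch

def mine_hard_triplets(embeddings, labels):
    """For each anchor in the mini-batch, pick the hardest positive
    (farthest same-label sample) and the hardest negative (closest
    different-label sample). Assumes every anchor has a positive in the batch."""
    dist = torch.cdist(embeddings, embeddings)           # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-label mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=embeddings.device)

    pos_dist = dist.masked_fill(~same | eye, float('-inf'))  # keep only valid positives
    neg_dist = dist.masked_fill(same, float('inf'))          # keep only valid negatives

    hardest_pos = pos_dist.argmax(dim=1)   # argmax over positives
    hardest_neg = neg_dist.argmin(dim=1)   # argmin over negatives
    return hardest_pos, hardest_neg
```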

Problem: Data distribution is complex and multimodal

Naive Approach: Learn a single distance metric for all training data

Overfits and fails to generalize well.

Problem: GT labels are too coarse


Naive Approach: Learn a single distance metric for all training data

Fails to capture attributes that are not covered by the provided GT labels during training.

Problem: Testing on unseen categories

 

Attributes that are the most discriminative during training are not necessarily useful on novel test images.

Training classes

Test classes

Problem: Misleading attributes / shortcuts

 

Attributes that are the most discriminative during training are not necessarily useful on novel test images, and vice versa.

Chromatic aberration

Clean image


To alleviate the aforementioned issues: learn several different distance metrics on non-overlapping subsets of the data.

Divide & Conquer: Split the task in smaller subproblems

 

Divide & Conquer: Split the task in smaller subproblems

 

Divide & Conquer: Split the task in smaller subproblems

 

Divide & Conquer: Split the task in smaller subproblems

 

Divide & Conquer: Cluster the Data

  • Decrease the variance of the data within each sub-problem (see the clustering sketch below)
    • Harder negatives
    • Closer (easier) positives
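As an illustration of the division step, here is a sketch that clusters the current embeddings into K disjoint subsets with k-means; scikit-learn and the function name `divide_data` are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def divide_data(embeddings, K=4):
    """Cluster the training images into K disjoint subsets based on their
    current embeddings; each subset becomes one sub-problem."""
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(embeddings)  # (N,) cluster ids
    return [np.flatnonzero(labels == k) for k in range(K)]            # image indices per subset
```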

Divide & Conquer

Jointly Split the Embedding Space & Data 

Learning Embedding Sub-spaces

S_i: learnable mask which induces embedding subspace i

Subspace orthogonality loss
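A rough sketch of how learnable masks and an orthogonality penalty could look. The sigmoid masks and the exact form of the penalty are assumptions made for illustration, not necessarily the formulation used in the paper.

```python
import torch

d, K = 128, 4                                         # embedding dim and number of subspaces (illustrative)
mask_logits = torch.nn.Parameter(torch.randn(K, d))   # one learnable mask per subspace

def subspace_embedding(x, i):
    """Project a (B, d) embedding onto subspace i via element-wise soft masking."""
    masks = torch.sigmoid(mask_logits)                # (K, d), entries in (0, 1)
    return x * masks[i]

def orthogonality_loss():
    """Penalize overlap between different masks so the subspaces stay (nearly) disjoint."""
    masks = torch.sigmoid(mask_logits)
    gram = masks @ masks.t()                          # (K, K) pairwise overlaps
    off_diag = gram - torch.diag(torch.diagonal(gram))
    return off_diag.sum() / (K * (K - 1))
```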

Method: Summary

  1. Division into subproblems
    1. Compute embeddings for all training images
    2. Split the data into K disjoint subsets
    3. Split the embedding space into K subspaces

  2. Training

    1. Assign a separate learner (loss) to each subspace.

    2. Train K different distance metrics using K learners.

  3. Increase the number of subproblems by a factor of 2

  4. Repeat steps 1-3 with the increased number of subproblems

  5. Conquer the embedding space by combining the subspaces together
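Putting the summary together, here is a high-level sketch of the training schedule. The helpers `model.embed` and `train_learner_on_subset`, the dataset interface, and the stage counts are hypothetical and only illustrate the flow of the steps above.

```python
import numpy as np

def train_divide_and_conquer(model, dataset, K=4, stages=3, epochs_per_stage=10):
    """High-level sketch of the divide-and-conquer schedule; helpers are hypothetical."""
    for stage in range(stages):
        # 1. Division: recompute embeddings and split data + embedding space into K parts.
        embeddings = np.stack([model.embed(x) for x, _ in dataset])  # hypothetical model.embed
        subsets = divide_data(embeddings, K)                         # clustering sketch above

        # 2. Training: one learner (loss) per subspace, each on its own data subset.
        for epoch in range(epochs_per_stage):
            for k, idx in enumerate(subsets):
                train_learner_on_subset(model, dataset, idx, subspace=k)  # hypothetical helper

        # 3./4. Increase the number of subproblems and repeat.
        K *= 2

    # 5. Conquer: at test time, use the full embedding (all subspaces combined).
    return model
```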

Experiments

Comparison with SOTA

Let's Look at Learners

Qualitative results

Conclusion

  1. Jointly split the data into K disjoint subsets and the embedding space into K subspaces of reduced dimensionality.
    Train K different distance metrics using K learners.
  2. Increase the task complexity over time.
  3. Does not require changes to the network architecture and is independent of the metric learning loss function.
  4. Achieves state-of-the-art results on 5 benchmark datasets.
     

I will discuss our novel "divide and conquer" approach (CVPR 2019) for deep metric learning, which significantly improves the state-of-the-art performance of metric learning on computer vision tasks.
