Siamese Neural Networks for One-shot Image Recognition

ICML'15

Gregory Koch - Richard Zemel - Ruslan Salakhutdinov

University of Toronto

Café de la Recherche #6 @Sicara

26 July 2018 - Antoine Toubhans

Siamese Neural Networks

  1. One-shot learning
  2. The Omniglot dataset
  3. Other approaches
  4. Siamese neural nets
  5. Benchmark results
  6. When should I use Siamese NN?
  7. Keras Implementation

One-shot learning

The Omniglot dataset

  • a kind of internationalized MNIST
  • 50 alphabets (40 "background" + 10 "evaluation")
    • Well-established ones, like Latin, Korean, Sanskrit
    • Lesser-known local alphabets
    • Fictitious alphabets, e.g., Klingon
  • each alphabet has between 15 and 40 characters
  • for each alphabet, 20 drawers each produced every character once
  • collected via Amazon's Mechanical Turk

The Omniglot dataset - One-Shot learning

Learn from the 40 "background" alphabets

Naive approach: Nearest Neighbor

\mathbb{p}(x) = \text{argmin}_{c\in C} \vert\vert x - x_c \vert\vert

Accuracy of 26.5%!

(Random guessing: ~5%, i.e. 1 class in 20)
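
The nearest-neighbor rule above can be sketched in a few lines of plain Python. This is only an illustration: the `support` dictionary (one example per class), the 4-pixel "images" and the function names are assumptions made for the example, not part of the paper or the talk.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two flattened images of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_neighbor_classify(x, support):
    """Return the class c whose single example x_c is closest to x.

    `support` maps each class label to its one example (hypothetical layout).
    """
    return min(support, key=lambda c: l2_distance(x, support[c]))

# Toy 3-way one-shot task on 4-pixel "images" (made-up values).
support = {
    "a": [0.0, 0.0, 1.0, 1.0],
    "b": [1.0, 1.0, 0.0, 0.0],
    "c": [1.0, 0.0, 1.0, 0.0],
}
query = [0.9, 0.8, 0.1, 0.0]
prediction = nearest_neighbor_classify(query, support)
print(prediction)  # the query is closest to the example of class "b"

# Chance level for a C-way task is 1 / C; with 20 classes that is 5%.
chance_20_way = 1 / 20
```

Raw pixel distance is sensitive to small shifts and stroke variations, which is one plausible reason the baseline stays far below the learned approaches.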

Variational Bayesian framework (Li Fei-Fei et al., early 2000s)

  • First work on One-Shot learning

Hierarchical Bayesian Program Learning (Lake et al., 2013)

  • Best achieved accuracy: 95.2%, close to human performance (95.5%)
  • involves learning a generative model for strokes
    • specific to the Omniglot dataset

The deep approach

Boltzmann Machines (LeCun et al., 2005)

  • Accuracy of 62%
  • uses a contrastive energy function
  • completely unsupervised

Classical CNN classifier:

  • Strongly overfits the "background" alphabets
    • only 20 images per class
  • Nearest neighbor on the last layer
  • Hope that the learned features will be useful for classifying an unknown alphabet

Decompose the problem into two tasks:

  1. Verification task
    • train a model to discriminate between the class-identity of image pairs
    • "probability" of being in the same class: \mathbb{p}_{\text{verif}}(x_1, x_2)
  2. One-Shot Classification task
    • a single example of each class
    • use the verification model to predict the class of a character from an unknown alphabet
The Siamese Network approach

\mathbb{p}_{\text{classif}}(x) = \text{argmax}_{c\in C} \; \mathbb{p}_{\text{verif}}(x, x_c)

where \{x_c\}_{c\in C} is the set of single examples (one per class) and \mathbb{p}_{\text{verif}}(x_1, x_2) is the output of the verification model.
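
As a rough sketch of the classification rule, the snippet below takes any pairwise scoring function and returns the class with the highest score. `toy_verifier` is a hypothetical stand-in for a trained verification model (here just a decreasing function of the L1 distance); the `support` layout and the toy data are likewise assumptions for illustration.

```python
import math

def one_shot_classify(x, support, p_verif):
    """Return argmax over classes c of p_verif(x, x_c).

    `support` maps each class label to its single example x_c.
    `p_verif` is any callable scoring how likely two inputs share a class.
    """
    return max(support, key=lambda c: p_verif(x, support[c]))

def toy_verifier(x1, x2):
    """Hypothetical stand-in for a trained verifier: exp(-L1 distance), in (0, 1]."""
    l1 = sum(abs(a - b) for a, b in zip(x1, x2))
    return math.exp(-l1)

# Toy 3-way one-shot task on 4-pixel "images" (made-up values).
support = {
    "a": [0.0, 0.0, 1.0, 1.0],
    "b": [1.0, 1.0, 0.0, 0.0],
    "c": [1.0, 0.0, 1.0, 0.0],
}
query = [0.1, 0.0, 0.9, 1.0]
predicted_class = one_shot_classify(query, support, toy_verifier)
print(predicted_class)  # class "a" has the highest verification score
```

The classifier itself never sees the evaluation classes during training; only the pairwise verifier is learned, which is why a single example per class is enough at prediction time.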

Single Layer Siamese Net
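
A minimal forward pass for such a network might look like the sketch below: both inputs go through the same layer (shared weights), and the output is a sigmoid of a weighted L1 distance between the two feature vectors, as in the paper's final layer. The ReLU hidden layer, the layer sizes, the random (untrained) parameters and all names are assumptions made for illustration.

```python
import math
import random

def dense_relu(x, weights, biases):
    """One fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]

def siamese_forward(x1, x2, weights, biases, alpha):
    """Verification score in (0, 1) for a pair of inputs."""
    # Both inputs go through the SAME layer: this weight sharing is the "siamese" part.
    h1 = dense_relu(x1, weights, biases)
    h2 = dense_relu(x2, weights, biases)
    # Weighted L1 distance between the twin feature vectors, then a sigmoid.
    score = sum(a * abs(u - v) for a, u, v in zip(alpha, h1, h2))
    return 1.0 / (1.0 + math.exp(-score))

# Untrained, randomly initialised parameters (illustration only).
rng = random.Random(0)
n_in, n_hidden = 4, 3
weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
alpha = [rng.uniform(-1, 1) for _ in range(n_hidden)]

x1 = [0.0, 0.0, 1.0, 1.0]
x2 = [1.0, 1.0, 0.0, 0.0]
p_12 = siamese_forward(x1, x2, weights, biases, alpha)
p_21 = siamese_forward(x2, x1, weights, biases, alpha)
p_11 = siamese_forward(x1, x1, weights, biases, alpha)
```

Because the two branches share parameters and the distance is symmetric, swapping the inputs gives the same score. In this sketch identical inputs always score exactly 0.5, since there is no output bias and the distance is zero.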

Training the verifier

  • Split the "background" dataset
    • train on 30 alphabets and 12 drawers
  • Randomly sample same and different pairs of characters
    • dataset #1: 30K pairs
    • dataset #2: 90K pairs
    • dataset #3: 150K pairs
  • data augmentation: linear distortions (x8)
    • augmented dataset #1: 240K pairs
    • augmented dataset #2: 720K pairs
    • augmented dataset #3: 1.2M pairs
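
The pair sampling and the dataset sizes above can be illustrated with a short sketch. The 50/50 split between same-class and different-class pairs, the `images_by_class` layout and the string image identifiers are assumptions for the example; the x8 factor and the base sizes are the ones listed on the slide.

```python
import random

def sample_pairs(images_by_class, n_pairs, rng):
    """Sample n_pairs tuples (x1, x2, label); label 1 = same class, 0 = different.

    Alternates same/different pairs so the two labels are balanced (assumption).
    """
    classes = list(images_by_class)
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:
            # Same class: two distinct drawings of one character.
            c = rng.choice(classes)
            x1, x2 = rng.sample(images_by_class[c], 2)
            pairs.append((x1, x2, 1))
        else:
            # Different classes: one drawing from each of two distinct characters.
            c1, c2 = rng.sample(classes, 2)
            x1 = rng.choice(images_by_class[c1])
            x2 = rng.choice(images_by_class[c2])
            pairs.append((x1, x2, 0))
    return pairs

# Toy stand-in for the training split: 5 characters, 12 drawers each.
images_by_class = {
    f"char{c}": [f"char{c}_{d}" for d in range(12)] for c in range(5)
}
pairs = sample_pairs(images_by_class, 1000, random.Random(0))

# Dataset sizes after the x8 augmentation quoted on the slide.
base_sizes = [30_000, 90_000, 150_000]
augmented_sizes = [8 * n for n in base_sizes]
print(augmented_sizes)  # [240000, 720000, 1200000]
```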

Hyperparameter optimization

  • Bayesian optimization framework
  • The paper's authors used Whetlab
  • Suggested by Quentin: Hyperopt, a Python library for distributed asynchronous hyperparameter optimization

https://www.crunchbase.com/organization/whetlab

https://github.com/hyperopt/hyperopt

When should I use a Siamese Net?

  • One-Shot learning
  • Many classes, few elements in each class
  • A reasonable dataset for training the verification model
    • ~ [15-40 characters] x [40 alphabets] = ~1K classes
    • ~ 20 examples per class
    • The Omniglot task is pretty simple though; a real task would require more data

These are hypotheses and should be tested empirically!

Resources

The paper:

https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf

A blog post about Siamese Networks:

https://sorenbouma.github.io/blog/oneshot/

A Keras implementation:

https://github.com/Goldesel23/Siamese-Networks-for-One-Shot-Learning

