Local Convolutional Features with Unsupervised Training for Image Retrieval
By: Mattis Paulin · Julien Mairal · Matthijs Douze · Zaid Harchaoui · Florent Perronnin · Cordelia Schmid
Presented by: Saeid Balaneshinkordan
Contributions:
1. Patch descriptor: unsupervised patch-level descriptors
2. Generated dataset: “RomePatches”
3. Evaluation of convolutional architectures for patch retrieval
Related work:
- shallow patch descriptors
- image retrieval based on deep learning
- patch description based on deep learning
Instance-level Recognition
ref: https://www.robots.ox.ac.uk/~vgg/practicals/instance-recognition/index.html
Match (recognize) a specific object or scene
Instance-level Recognition
The object is recognized despite changes in viewpoint, scale, lighting, and partial occlusion
Three steps in instance-level retrieval systems:
1) interest point detection: select key points that are reproducible under scale and viewpoint changes
2) description: should be robust to viewing conditions
3) matching: define a suitable metric between two patch sets
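The three steps above can be sketched in miniature. This is a toy numpy illustration: a hypothetical gradient-magnitude detector and raw-pixel descriptor stand in for the Hessian-affine detector and learned descriptors discussed later.

```python
import numpy as np

def detect_interest_points(img, k=10):
    """1) Interest point detection: toy detector keeping the k pixels with
    the largest gradient magnitude (real systems use e.g. Hessian-affine)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # zero out a border so full patches can be extracted around each point
    mag[:8, :] = 0; mag[-8:, :] = 0; mag[:, :8] = 0; mag[:, -8:] = 0
    idx = np.argsort(mag.ravel())[-k:]
    return np.column_stack(np.unravel_index(idx, mag.shape))

def describe(img, points, radius=8):
    """2) Description: flatten and L2-normalize the patch around each point
    (real systems use SIFT, CNN, or CKN descriptors)."""
    descs = []
    for y, x in points:
        patch = img[y - radius:y + radius, x - radius:x + radius].astype(float).ravel()
        descs.append(patch / (np.linalg.norm(patch) + 1e-8))
    return np.array(descs)

def match(descs_a, descs_b):
    """3) Matching: nearest neighbour in Euclidean descriptor space."""
    dists = np.linalg.norm(descs_a[:, None, :] - descs_b[None, :, :], axis=2)
    return dists.argmin(axis=1)
```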
Instance-level Recognition
application: image retrieval
Local Image Patches
ref: http://www.cs.toronto.edu/~kyros/courses/320/Lectures.2013s/lecture.2013s.03.pdf
Image structures can be analyzed at:
Local Image Patches
Perceptually significant patch structures:
- corner
- single surface
- uniform texture
- edge
Image Retrieval Pipeline
extract interest points
encode in descriptor space
aggregate into a compact representation
Image Retrieval Pipeline:
Interest Point Detection
Interest Point Description
Patch Matching
Keypoint Detection
Method: Hessian-affine region detector
affine-invariant detector
preprocessing step to detect interest points
ref: http://www.mathworks.com/discovery/affine-transformation.html
Matching points in two different images (keypoint detection):
- extracting salient points
- rectifying affine regions
- normalizing rotation
Image Retrieval Pipeline:
Interest Point Detection
Interest Point Description
Patch Matching
Image Retrieval Pipeline:
Interest Point Description
robust to the perturbations that are not covered by the detector (lighting changes, small rotations, blur,...)
normalized patch: the affine region is mapped to a fixed-size square
feature representation φ(M) in a Euclidean space
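A minimal sketch of this normalization step, assuming the detected region is described by a 2×2 affine shape matrix `A` and a center point (both hypothetical inputs for illustration); nearest-neighbour sampling keeps it short, and 51×51 matches the patch size mentioned later:

```python
import numpy as np

def rectify_affine_region(img, A, center, out_size=51):
    """Warp an affine region of `img` to a fixed out_size x out_size patch.
    Each output pixel at normalized coords u in [-1, 1]^2 is pulled from
    img at center + A @ u (inverse mapping, nearest-neighbour sampling)."""
    u = np.linspace(-1.0, 1.0, out_size)
    uy, ux = np.meshgrid(u, u, indexing="ij")
    coords = np.stack([uy.ravel(), ux.ravel()])       # (2, N) normalized coords
    src = A @ coords + np.asarray(center, float)[:, None]  # map into image frame
    yi = np.clip(np.rint(src[0]).astype(int), 0, img.shape[0] - 1)
    xi = np.clip(np.rint(src[1]).astype(int), 0, img.shape[1] - 1)
    return img[yi, xi].reshape(out_size, out_size)
```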
Convolutional neural networks (CNNs)
image classification
image retrieval
modelling natural images
handle local stationary structures in a multi-scale fashion
Performance:
image-level descriptors:
Features output by a CNN’s intermediate layers
Is it possible to derive patch-level descriptors from architectures designed for image-level descriptors?
image-level descriptors: outputs of the penultimate layer
patch-level descriptors: outputs of earlier layers (typically the 4th)
Convolutional neural networks (CNNs)
earlier layers: tend to encode more task-independent information
filters learned by the first layer: tend to be similar regardless of the task, the objective function, or the level of supervision
→ is supervised learning required to make good local convolutional features for patch matching and image retrieval?
Convolutional neural networks (CNNs)
CNNs are normally trained: convolutional features learned with class supervision for a classification task.
To extend to image retrieval, three ways to encode fixed-size image patches (size 51×51 pixels):
- encoding local descriptors with a model that has been trained for an unrelated image classification task
- devising a surrogate classification problem that is as related as possible to image retrieval
- using unsupervised learning, such as a convolutional kernel network
Convolutional neural networks (CNNs)
two successive layers of a CNN
Convolutional Neural Networks
image
matrices corresponding to linear operations
pointwise non-linear functions
down-sampling operation (feature pooling)
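One such layer, combining a linear operation, a pointwise non-linearity, and down-sampling, can be sketched with naive loops (`filters` and `pool` are illustrative parameters, not the paper's architecture):

```python
import numpy as np

def cnn_layer(x, filters, pool=2):
    """One generic CNN layer: valid convolution (linear operation),
    ReLU (pointwise non-linearity), then max-pooling (down-sampling).
    x: (H, W) single-channel input; filters: (n_filters, k, k)."""
    n, k, _ = filters.shape
    H, W = x.shape
    out = np.empty((n, H - k + 1, W - k + 1))
    for f in range(n):                      # convolution, stride 1
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[f, i, j] = np.sum(x[i:i + k, j:j + k] * filters[f])
    out = np.maximum(out, 0.0)              # ReLU
    h2, w2 = out.shape[1] // pool, out.shape[2] // pool
    # non-overlapping max-pooling by factor `pool`
    return out[:, :h2 * pool, :w2 * pool].reshape(n, h2, pool, w2, pool).max(axis=(2, 4))
```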
AlexNet
7 layers
the first five are convolutional
the last ones are fully connected
size of processed images: 224 × 224
Convolutional neural networks (CNNs)
Convolutional Kernel Networks (CKNs): a deep kernel-based convolutional approach
- description of image patches without supervision
- based on a kernel (feature) map that is data-independent
- approximation procedure: stochastic gradient optimization
- to yield a CKN that outputs patch descriptors: use sub-sampling of patches and stochastic gradient optimization
CNN feature representation:
relies on filters that are learned
Kernel embedding approximation
exact kernel computations are prohibitively expensive
instead, use an explicit finite-dimensional embedding to approximate them:
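One classical way to build such an embedding is random Fourier features for the Gaussian kernel, shown here purely as an illustration of the idea; the paper's CKN instead learns its embedding by stochastic gradient optimization:

```python
import numpy as np

def gaussian_kernel_features(X, dim=128, sigma=1.0, seed=0):
    """Explicit finite-dimensional embedding psi such that
    psi(x) . psi(y) ~ exp(-||x - y||^2 / (2 sigma^2))
    (random Fourier features, Rahimi & Recht)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, dim))   # random projections
    b = rng.uniform(0.0, 2 * np.pi, size=dim)         # random phases
    return np.sqrt(2.0 / dim) * np.cos(X @ W + b)
```

With the embedding in hand, kernel evaluations reduce to cheap dot products, which is exactly what makes the approximation practical at patch scale.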
Multi-layer CKN kernel
two-layer convolutional kernel architecture
Image Retrieval Pipeline:
Interest Point Detection
Interest Point Description
Patch Matching
Image Retrieval Pipeline:
Patch Matching
matching all possible pairs of patches is too expensive
instead:
aggregating patch descriptors into a fixed-length image descriptor, using the VLAD representation
normalization is then applied to the VLAD descriptor
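A minimal sketch of basic VLAD aggregation (nearest-centroid residuals, then global L2 normalization; production systems often add further normalizations such as power normalization):

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate a set of local descriptors into one fixed-length vector.
    Each descriptor is assigned to its nearest centroid; residuals are
    summed per centroid and the concatenation is L2-normalized."""
    K, d = centroids.shape
    agg = np.zeros((K, d))
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)          # nearest-centroid assignment
    for i, k in enumerate(assign):
        agg[k] += descriptors[i] - centroids[k]
    v = agg.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```

Because the output length is K × d regardless of how many patches the image has, two images can be compared with a single dot product instead of matching all patch pairs.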
Patch Matching
matched patches show significant changes in lighting, and smaller changes in rotation and skew
Datasets
Patch retrieval
Mikolajczyk Dataset
RomePatches
Image retrieval
RomePatches-Image
Oxford
UKbench and Holidays
Datasets
Patch and image retrieval on the Rome dataset.
Top: examples of matching patches.
Bottom: images of the same bundle, which therefore share the same class for image retrieval.
convolutional architectures for patch retrieval
Thank You!
Convolutional Kernel Networks (CKNs)
Let M and M′ be two patches of size m × m,
Ω = {1, . . . , m}² the set of pixel locations,
p_z (resp. p′_z) the fixed-size sub-patch of M (resp. M′) centered at location z ∈ Ω.
Single-layer kernel:
K(M, M′) = Σ_{z,z′ ∈ Ω} ‖p_z‖ ‖p′_{z′}‖ exp(−‖z − z′‖² / (2β²)) exp(−‖p̃_z − p̃′_{z′}‖² / (2σ²))
where p̃_z = p_z/‖p_z‖ and p̃′_{z′} = p′_{z′}/‖p′_{z′}‖ are the ℓ2-normalized sub-patches, and β, σ are the widths of the two Gaussian terms.
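A naive numpy evaluation of this single-layer match kernel (summing, over all pairs of sub-patch locations, a spatial Gaussian times a Gaussian between ℓ2-normalized sub-patches; `sub`, `beta`, `sigma` are illustrative parameter names):

```python
import numpy as np

def single_layer_kernel(M, Mp, sub=3, beta=1.0, sigma=1.0):
    """Brute-force single-layer match kernel between m x m patches M, M'."""
    m = M.shape[0]
    r = sub // 2
    locs = [(y, x) for y in range(r, m - r) for x in range(r, m - r)]

    def subpatch(img, y, x):
        return img[y - r:y + r + 1, x - r:x + r + 1].astype(float).ravel()

    K = 0.0
    for (y, x) in locs:
        p = subpatch(M, y, x); n_p = np.linalg.norm(p)
        for (yp, xp) in locs:
            q = subpatch(Mp, yp, xp); n_q = np.linalg.norm(q)
            if n_p == 0 or n_q == 0:
                continue  # skip empty sub-patches (their norm weight is 0)
            spatial = np.exp(-((y - yp) ** 2 + (x - xp) ** 2) / (2 * beta ** 2))
            feat = np.exp(-np.sum((p / n_p - q / n_q) ** 2) / (2 * sigma ** 2))
            K += n_p * n_q * spatial * feat
    return K
```

The quadratic sum over location pairs is exactly the "overwhelming" exact computation the explicit embedding is designed to avoid.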
Contribution:
convolutional patch descriptors (named Patch-CKN), based on convolutional kernel networks, learned without supervision, with application to matching and instance-level retrieval
Comparison with state-of-the-art image retrieval results.
Results with * use a Hessian-Affine detector with gravity assumption
Implementation details
Patch Extraction
CNN Implementation
CKN Learning
augment the dataset with perturbed versions of training patches to learn the filters Wk
use “virtual patches”, obtained as transformations of randomly extracted ones to fall back to a classification problem
For a set of patches P and a set of transformations T, the dataset consists of all τ(p), (τ, p) ∈ T × P.
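This surrogate dataset construction can be sketched directly (the transformation set T below, flips and a rotation, is illustrative, not the paper's exact set):

```python
import numpy as np

def virtual_patches(patches, transformations):
    """Build the surrogate dataset {tau(p) : (tau, p) in T x P}.
    All transformed versions of patch i share surrogate label i."""
    data, labels = [], []
    for i, p in enumerate(patches):
        for tau in transformations:
            data.append(tau(p))
            labels.append(i)
    return np.array(data), np.array(labels)

# illustrative transformation set T: identity, flips, 90-degree rotation
T = [lambda p: p, np.fliplr, np.flipud, lambda p: np.rot90(p)]
```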
PhilippNet:
three convolutional layers and one fully connected layer; takes 64×64 patches as input and produces a 512-dimensional output
Transformed versions of the same patch share the same label, thus defining surrogate classes.
Convolutional neural networks (CNNs)
Patch retrieval
Parametric exploration of CKNs
number of filters
sub-patch size
subsampling factor
Influence of dimensionality reduction on patch retrieval performance