other cars
how far?
Don't run them over!!
which traffic light?
Model
"car"
Input
ImageNet top-5 error rate (%)
Performance: %mAP (mean average precision)
DPM variants
Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014
DPM variants
Deep Learning
Digit recognition (CNNs, 1989)
Face detection (Haar features + AdaBoost, 2001)
Object detection (HOGs + SVMs, 2005-2010)
Image classification (Deep CNNs, 2012)
Image captioning
Semantic segmentation
Edge detection
Object detection
More data
Better hardware
Support from industry
Better tools
3-5 minutes per image annotation + $$$
days/weeks to train
low-powered hardware
not enough data
noise
"gibbon"
99.3% confidence
"panda"
57.7% confidence
Correctly classified
Incorrectly classified
Original images
Negative images
Constellation table by Fulo
Low-level: Edges
High-level: person segmentation
textures
object parts
symmetries
"wheel"
"sand"
"rotational symmetry"
Context prediction
Doersch et al., 2015
Jigsaw puzzles
Noroozi and Favaro, 2016
[Figure: image tiles numbered 1-9]
[ECCV 2012, ICCV 2017]
Learning mid-level representations for shape and texture.
head
torso
arms
legs
hands
[arXiv 2016, ISBI 2016, MICCAI 2016]
A transformation for extracting new descriptors of shape, H. Blum, Models for the perception of speech and visual form, 1967
Shape matching and recognition
Shape simplification
Shape deformation with volume preservation
Image from BSDS300
Ground-truth segmentation
Ground-truth skeleton
Orientation
Scale
Symmetry probability
Symmetry "tokens"
Clustering
~0.5 sec per image (40-60x faster than MIL)
Dense representation
Low reconstruction error
Sparse representation
High reconstruction error
Increasing \( w \)
Weighted geometric set cover (WGSC) is NP-hard!
PTAS exist
Set we want to cover
Covering elements (range)
Set costs
Approximation algorithms, Vijay V. Vazirani
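The set-cover ingredients listed above (a set to cover, covering elements, and per-set costs) can be illustrated with the classic greedy approximation for weighted set cover. This is only a minimal sketch of the generic textbook heuristic, not necessarily the method used in the slides. The function name `greedy_weighted_set_cover` and the toy data are hypothetical.

```python
def greedy_weighted_set_cover(universe, subsets, costs):
    """Greedy heuristic for weighted set cover (illustrative sketch).

    universe: iterable of elements that must be covered
    subsets:  dict mapping a subset name to the set of elements it covers
    costs:    dict mapping a subset name to its (positive) cost
    Returns the list of chosen subset names and their total cost.
    """
    uncovered = set(universe)
    chosen, total_cost = [], 0.0
    while uncovered:
        best_name, best_ratio = None, float("inf")
        for name, elements in subsets.items():
            gain = len(uncovered & elements)  # newly covered elements
            if gain == 0:
                continue
            ratio = costs[name] / gain  # cost paid per newly covered element
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best_name)
        total_cost += costs[best_name]
        uncovered -= subsets[best_name]
    return chosen, total_cost


# Toy example (hypothetical data): two cheap sets beat one expensive set.
universe = {1, 2, 3, 4, 5}
subsets = {"A": {1, 2, 3}, "B": {3, 4, 5}, "C": {1, 2, 3, 4, 5}}
costs = {"A": 3.0, "B": 3.0, "C": 10.0}
chosen, total_cost = greedy_weighted_set_cover(universe, subsets, costs)
```

Raising every set cost makes each selected set more expensive relative to the elements it covers, which is one way to picture the trade-off between a dense and a sparse representation.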
Input
MIL
GT-seg
GT-skel
AMAT
color similarity
Input
AMAT
Groups (color-coded)
Thinning
Segmentation
Input
AMAT
Groups
Reconstruction
P(person)
P(horse)
:
P(dog)
dog
person
head
torso
arms
legs
hands
RGB: 152x152
L1: 142x142
L2: 71x71
L3: 63x63
L4: 55x55
L5: 25x25
L6: 21x21
Scale 1x
Scale 1.5x
Scale 2x
Faster R-CNN: Towards real-time object detection with region proposal networks, S. Ren et al., NIPS 2015
Alzheimer's: structure degeneration
Schizophrenia: volume abnormalities
[Shenton M.E. et al., Psychiatry Res. 2002]
Tumors: avoid radiation on sensitive regions
[Hoehn D. et al., Journal of Medical Cases, 2012]
Putamen
Ventricle
Caudate
Amygdala
Hippocampus
Visualization and inspection
No need for manual annotation (time-consuming, needs experts, limited reproducibility)
Non-invasive diagnosis and treatment
P(thalamus)
P(putamen)
:
P(caudate)
:
P(white matter)
2D slice
thalamus
white matter
f(CNN output)
d(intensities)
CNN
CNN+MRF
Our results
Groundtruth
Chair
Monitor
Basket
Sketch-based image retrieval
3D models from sketches
Smart scribbles for sketch segmentation, Noris et al., Computer Graphics Forum
Example-based sketch segmentation and labelling using CRFs, Schneider et al., TOG 2016
Proximity
Parallelism
Continuity
Closure
triangle
square
circle
Edge detector result
edge loss \(l_e\)
skeleton loss \(l_s\)
Edge detection network
Skeleton detection network
edge loss \(l_e\)
skeleton loss \(l_s\)
consistency loss \(l_c\)
Input patch \(P\)
Reconstruction \(\tilde P\)
Reconstruction loss \(L(P, \tilde{P})\)
encoder
decoder
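The encoder/decoder pipeline above maps an input patch \(P\) to a reconstruction \(\tilde P\) and scores it with a loss \(L(P, \tilde{P})\). A minimal sketch follows. It assumes a linear encoder with tied decoder weights and a mean squared error loss; the slides specify neither the architecture nor the form of \(L\), and the function names are hypothetical.

```python
import numpy as np


def encode(patch, weights):
    """Project the flattened patch onto the rows of `weights` (assumed linear encoder)."""
    return weights @ patch.ravel()


def decode(code, weights, shape):
    """Map the code back to patch space with the transposed weights (tied decoder)."""
    return (weights.T @ code).reshape(shape)


def reconstruction_loss(patch, reconstruction):
    """Mean squared error between P and its reconstruction (assumed form of L)."""
    return float(np.mean((patch - reconstruction) ** 2))


# Toy 4x4 patch; keeping only the first 8 of 16 coordinates loses information.
P = np.arange(16, dtype=float).reshape(4, 4)
W = np.eye(16)[:8]  # hypothetical encoder weights with orthonormal rows
P_tilde = decode(encode(P, W), W, P.shape)
loss = reconstruction_loss(P, P_tilde)
```

With a full identity matrix as the weights, the reconstruction is exact and the loss is zero. Narrowing the code, as in the toy example, makes the loss grow.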
Painterly rendering
Interactive segmentation
Constrained image editing
Mahsa Shakeri
Enzo Ferrante
Siddhartha Chandra
Eduard Trulls
P.A. Savalle
George Papandreou
Sven Dickinson
Nikos Paragios
Iasonas Kokkinos
Andrea Vedaldi
Symmetry
Medical imaging
Segmentation and parts