Dstl Safe Passage: Detecting and Classifying Vehicles in Aerial Imagery
Vladimir Iglovikov
Physics, PhD
Kaggle Master
Historical overview
December 2016 - March 2017
Kaggle: Dstl Satellite Imagery Feature Detection
Roman Solovyov, Artur Kuzin 2nd place ($30,000)
Vladimir Iglovikov, Sergey Mushinskiy 3rd place ($20,000)
- blog posts (rus, eng)
- meetup talks (rus, eng)
- paper (next week)
Organizers spent $465,000 and got state-of-the-art solutions that they cannot use.
Historical overview
March 2017
- Press release: Dstl’s Kaggle competition has been a great success
- DSTL pays BAE Systems to create their own Kaggle, https://www.datasciencechallenge.org, and starts two competitions (Computer Vision and Natural Language Processing)
- Problems are pretty good, but the rules of the competitions are discriminatory (everyone can participate, but only a limited set of people can claim prize money)
- We got a verbal and written promise from the organizers that the rules will be changed.
Problem Statement
- RGB satellite images
- 2000x2000
- 5cm / pixel
- 600 train
- 600 test
- 9 classes
Problem Statement: class distribution
Figure by Vladislav Kassym
Problem Statement
- train: 600 images
- test: 600 images
- 2000x2000
- 5 cm / pixel
One quarter of one image
Evaluation Metric
Jaccard = \frac{TP}{TP + FN + FP}
| Class | Radius |
|---|---|
| motorcycle | 12 pixels (60 cm) |
| car | 30 pixels (150 cm) |
| van | 40 pixels (200 cm) |
| bus | 45 pixels (225 cm) |
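A minimal sketch of how the metric could be computed for one class. It assumes the radius in the table is the maximum distance between a predicted center and a labeled center for the prediction to count as a true positive, and it uses a simple greedy one-to-one matching. The matching rule and the names `count_matches` and `RADIUS_PX` are assumptions of this sketch, not the organizers' scoring code.

```python
import numpy as np

# Radii in pixels, copied from the table above.
RADIUS_PX = {"motorcycle": 12, "car": 30, "van": 40, "bus": 45}


def jaccard(tp, fp, fn):
    """Jaccard = TP / (TP + FN + FP); returns 0.0 when there is nothing to score."""
    denom = tp + fn + fp
    return tp / denom if denom else 0.0


def count_matches(pred, truth, radius):
    """Greedily match predicted centers to labeled centers of a single class.

    pred, truth: sequences of (x, y) pixel coordinates.
    Each labeled center can be matched at most once (assumption of this sketch).
    Returns (tp, fp, fn).
    """
    pred = np.asarray(pred, dtype=float).reshape(-1, 2)
    truth = np.asarray(truth, dtype=float).reshape(-1, 2)
    unmatched = list(range(len(truth)))  # indices of labels not yet matched
    tp = 0
    for p in pred:
        if not unmatched:
            break
        dists = np.linalg.norm(truth[unmatched] - p, axis=1)
        best = int(np.argmin(dists))
        if dists[best] <= radius:
            tp += 1
            unmatched.pop(best)  # this label is now used up
    fp = len(pred) - tp
    fn = len(truth) - tp
    return tp, fp, fn


truth_cars = [(100, 100), (500, 500)]
pred_cars = [(110, 105), (900, 900)]
tp, fp, fn = count_matches(pred_cars, truth_cars, RADIUS_PX["car"])
score = jaccard(tp, fp, fn)
```

In this toy case one prediction lands within 30 pixels of a labeled car, one prediction is spurious and one label is missed, so TP = FP = FN = 1.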
Motivation
Why participate?
- Very clean balanced dataset.
- Chance to gain knowledge in object detection.
- Good amount of data. (Not too much, not too little.)
- No data leaks.
- Codebase will be reused in:
- Kaggle: Cervix
- Kaggle: Seals
- ImageNet 2017
Why not participate?
- No way to claim prize money.
- No community.
- Unknown platform. (Hard to sell results.)
Step 1: bounding boxes
Before
After
~ 10 hours
What network architecture to use?
Speed/accuracy trade-offs for modern convolutional object detectors
arXiv:1611.10012
What network architecture to use?
Faster RCNN
- Slow to train
- Slow to predict
- Accurate in general
- Accurate on small objects
SSD
- Fast to train
- Fast to predict
- Less accurate in general
- Pretty bad with small objects
=>
Winner for this task: Faster RCNN
Faster RCNN
What framework to use?
Keras + TensorFlow
- Existing Faster RCNN implementation
- Familiar code base
- Good documentation
- Slow
- Pain to parallelize
MXNet
- Existing Faster RCNN implementation
- Unfamiliar code base
- OK documentation
- Fast
- Zero pain with parallelization
=>
Winner for this task: MXNet
Solution
Train
- Faster RCNN + VGG16 base
- random crops 1000x1000
- D4 group augmentation
8 samples/sec
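The two training-time augmentations can be sketched with NumPy. This is an illustration only, not the training code used with MXNet: the function names are hypothetical, and the image is assumed to be an array of shape (height, width, channels). D4 is the symmetry group of the square, so it yields 8 variants: 4 rotations by 90 degrees, each with and without a mirror flip.

```python
import numpy as np


def random_crop(image, size=1000, rng=None):
    """Cut a random size x size window out of an (H, W, C) image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]


def d4_transforms(image):
    """Return the 8 D4 variants of an image: 4 rotations, each plain and mirrored."""
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)          # rotate by k * 90 degrees in the image plane
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # mirrored copy of the same rotation
    return variants


# Small stand-in for a 2000x2000 RGB image so the example runs quickly.
demo = np.arange(16 * 16 * 3).reshape(16, 16, 3)
crop = random_crop(demo, size=8, rng=np.random.default_rng(0))
variants = d4_transforms(crop)
```

For a detector, the bounding boxes have to be cropped, rotated and flipped together with the pixels; that bookkeeping is omitted here.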
Test
- overlapping tiles
- D4 group augmentation
- Non-Maximum Suppression
20 samples/sec
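A sketch of the test-time pipeline pieces: where to place overlapping tiles, and greedy Non-Maximum Suppression to drop duplicate detections that appear in the overlap regions or across augmented copies. The tile size of 1000, the stride of 500 and the IoU threshold of 0.5 are assumptions chosen for illustration; the slides do not state the values actually used.

```python
import numpy as np


def tile_origins(image_size=2000, tile=1000, stride=500):
    """Top-left offsets of overlapping tiles along one axis, covering the whole axis."""
    origins = list(range(0, image_size - tile + 1, stride))
    if origins[-1] != image_size - tile:
        origins.append(image_size - tile)  # make sure the last tile touches the border
    return origins


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) as x1, y1, x2, y2. Returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop boxes that overlap too much
    return keep


origins = tile_origins()
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [100, 100, 110, 110]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Before NMS, boxes predicted on a tile need to be shifted by that tile's offset, and boxes predicted on a D4-transformed tile need the inverse transform, so that all detections share the coordinate frame of the full image.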
Code - example from MXNet repository
Sources of mistakes: close-packed objects
Sources of mistakes: trains like buses
Sources of mistakes: debris as cars
Main source of mistakes: misclassification
gray car in the shade <=> black car
gray car in the sun <=> white car
blue car in the shade <=> black car
white hatchback <=> white van
hatchback <=> sedan
=>
inconsistent labeling
low predictive power
Summary
- Centers of cars => bounding boxes (manually)
- Faster RCNN + VGG16, MXNet
- D4 group train and test time augmentation
Hardware
- Intel i7
- 32 GB RAM
- 2 x Titan X (Pascal)
Many thanks to:
- Sergey Mushinskiy
- Vladislav Kassym
- Sergey Belousov