Dstl Safe Passage: Detecting and Classifying Vehicles in Aerial Imagery

Vladimir Iglovikov

Physics, PhD

Kaggle Master

Historical overview

December 2016 - March 2017

Kaggle: Dstl Satellite Imagery Feature Detection

Roman Solovyov, Artur Kuzin: 2nd place ($30,000)

Vladimir Iglovikov, Sergey Mushinskiy: 3rd place ($20,000)

  • blog posts (Russian, English)
  • meetup talks (Russian, English)
  • paper (next week)

Organizers spent $465,000 and got state-of-the-art solutions that they cannot use.

Historical overview

March 2017

  • Press release: Dstl’s Kaggle competition has been a great success
  • Dstl pays BAE Systems to create its own Kaggle-style platform, https://www.datasciencechallenge.org, and to start two competitions (Computer Vision and Natural Language Processing)
  • The problems are good, but the competition rules are discriminatory: everyone can participate, but only a limited set of people can claim prize money.
  • We got verbal and written promises from the organizers that the rules would be changed.

Problem Statement

  • RGB aerial images
  • 2000x2000 pixels
  • 5 cm / pixel
  • 600 train images
  • 600 test images
  • 9 classes

Problem Statement: class distribution

Figure by Vladislav Kassym

Problem Statement

  • train: 600 images
  • test: 600 images
  • 2000x2000
  • 5 cm / pixel

One quarter of one image

Evaluation Metric

Jaccard = \frac{TP}{TP + FN + FP}

Class        Radius
motorcycle   12 pixels (60 cm)
cars         30 pixels (150 cm)
van          40 pixels (200 cm)
bus          45 pixels (225 cm)
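
The slides do not spell out how predictions are matched to ground truth. One plausible reading of the radius table is that a predicted vehicle center counts as a true positive when it lies within the class radius of a not-yet-matched ground-truth center. Below is a minimal sketch of that assumed rule for a single class, using greedy one-to-one matching; the function names, the greedy order and the empty-image convention are hypothetical and are not the official scorer.

```python
import numpy as np

# Per-class radii in pixels, from the table above (5 cm / pixel).
RADIUS_PX = {"motorcycle": 12, "cars": 30, "van": 40, "bus": 45}


def match_centers(gt, pred, radius):
    """Greedy one-to-one matching of predicted to ground-truth centers.

    gt, pred: sequences of (x, y) pixel centers for a single class.
    Predictions are processed in the given order (ideally by confidence).
    Assumed rule: a prediction is a true positive if a not-yet-matched
    ground-truth center lies within `radius` pixels of it.
    Returns (tp, fp, fn).
    """
    gt = np.asarray(gt, dtype=float).reshape(-1, 2)
    pred = np.asarray(pred, dtype=float).reshape(-1, 2)
    matched = np.zeros(len(gt), dtype=bool)
    tp = 0
    for p in pred:
        if matched.all():  # no ground truth left to match
            break
        dist = np.linalg.norm(gt - p, axis=1)
        dist[matched] = np.inf  # each ground-truth object is matched at most once
        j = int(np.argmin(dist))
        if dist[j] <= radius:
            matched[j] = True
            tp += 1
    return tp, len(pred) - tp, len(gt) - tp


def jaccard(tp, fp, fn):
    """TP / (TP + FN + FP); returns 1.0 for an empty image (assumed convention)."""
    denom = tp + fp + fn
    return tp / denom if denom else 1.0
```

For example, with the `cars` radius of 30 pixels, ground truth at (100, 100) and (300, 300) and predictions at (110, 105) and (500, 500) give TP = 1, FP = 1, FN = 1, so Jaccard = 1/3.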

Motivation

Why participate?

  • Very clean, balanced dataset.
  • Gain knowledge in object detection.
  • Good amount of data. (Not too much, not too little.)
  • No data leaks.
  • Codebase will be reused in:
    • Kaggle: Cervix
    • Kaggle: Seals
    • ImageNet 2017

Why not participate?

  • No way to claim prize money.
  • No community.
  • Unknown platform. (Hard to sell results.)

Step 1: bounding boxes

Before: vehicle centers (original labels)

After: manually drawn bounding boxes

~10 hours of manual labeling

What network architecture to use?

Speed/accuracy trade-offs for modern convolutional object detectors 

arXiv:1611.10012

What network architecture to use?

Faster RCNN

  • Slow to train
  • Slow to predict
  • Accurate in general
  • Accurate on small objects

SSD

  • Fast to train
  • Fast to predict
  • Less accurate in general
  • Poor on small objects

=>

Winner for this task: Faster RCNN

Faster RCNN

What framework to use?

Keras + TensorFlow

  • Existing Faster RCNN implementation
  • Familiar code base
  • Good documentation
  • Slow
  • Pain to parallelize

MXNet

  • Existing Faster RCNN implementation
  • Unfamiliar code base
  • OK documentation
  • Fast
  • Zero pain with parallelization

=>
Winner for this task: MXNet

Solution

Train

  • Faster RCNN + VGG16 base
  • random crops 1000x1000
  • D4 group augmentation

Training speed: 8 samples/sec
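
D4 is the symmetry group of the square: the four rotations by multiples of 90° combined with an optional flip, eight transforms in total. Because the labels must follow the image, every transform also has to be applied to the bounding boxes. A minimal NumPy sketch of the training-time augmentation, assuming boxes are (x1, y1, x2, y2) in pixel-edge coordinates; the rule of keeping only boxes whose centers fall inside the 1000x1000 crop is an assumption, and the function names are hypothetical.

```python
import numpy as np


def random_crop(img, boxes, size=1000, rng=None):
    """Random size x size crop of an (H, W[, C]) image and its boxes.

    Boxes are (x1, y1, x2, y2) in pixels. A box is kept if its center lies
    inside the crop (an assumed rule) and is then clipped to the crop.
    Returns the crop, the kept boxes and a boolean mask for filtering labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    y0 = int(rng.integers(0, h - size + 1))
    x0 = int(rng.integers(0, w - size + 1))
    crop = img[y0:y0 + size, x0:x0 + size]
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4) - [x0, y0, x0, y0]
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    keep = (cx >= 0) & (cx < size) & (cy >= 0) & (cy < size)
    return crop, np.clip(boxes[keep], 0, size), keep


def hflip_boxes(boxes, width):
    """Boxes after img[:, ::-1]; a pixel edge at x moves to width - x."""
    x1, y1, x2, y2 = boxes.T
    return np.stack([width - x2, y1, width - x1, y2], axis=1)


def rot90_boxes(boxes, width):
    """Boxes after np.rot90(img), i.e. 90 degrees counter-clockwise.

    `width` is the image width before rotation: edges (x, y) map to (y, width - x).
    """
    x1, y1, x2, y2 = boxes.T
    return np.stack([y1, width - x2, y2, width - x1], axis=1)


def d4_transform(img, boxes, k, flip):
    """Apply one of the 8 D4 elements: optional horizontal flip, then k quarter turns."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    if flip:
        boxes = hflip_boxes(boxes, img.shape[1])
        img = img[:, ::-1]
    for _ in range(k % 4):
        boxes = rot90_boxes(boxes, img.shape[1])  # uses the width before rotating
        img = np.rot90(img)
    return np.ascontiguousarray(img), boxes
```

In a training loop, each sample would go through `random_crop` and then `d4_transform` with `k` drawn uniformly from {0, 1, 2, 3} and `flip` from {False, True}, which covers all eight group elements; class labels are filtered with the returned mask.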

Test

  • overlapping tiles
  • D4 group augmentation
  • Non-Maximum Suppression

Inference speed: 20 samples/sec
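
At test time the 2000x2000 images are processed as overlapping tiles; the per-tile detections are shifted back to image coordinates and merged with Non-Maximum Suppression, so an object seen by several tiles is reported once. A minimal sketch: the tile size of 1000, the overlap of 200 pixels and the IoU threshold of 0.3 are illustrative guesses, and `detect` is a hypothetical stand-in for the trained Faster RCNN (D4 test-time augmentation would live inside it and is not shown). With several classes, NMS would typically be run separately for each class.

```python
import numpy as np


def tile_offsets(length, tile, overlap):
    """Start positions of overlapping tiles along one axis, covering [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile flush with the image border
    return starts


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop boxes overlapping the kept one
    return keep


def predict_large_image(img, detect, tile=1000, overlap=200, iou_threshold=0.3):
    """Run `detect(tile_img) -> (boxes, scores)` on overlapping tiles, merge with NMS."""
    h, w = img.shape[:2]
    all_boxes, all_scores = [], []
    for y0 in tile_offsets(h, tile, overlap):
        for x0 in tile_offsets(w, tile, overlap):
            boxes, scores = detect(img[y0:y0 + tile, x0:x0 + tile])
            boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
            all_boxes.append(boxes + [x0, y0, x0, y0])  # back to image coordinates
            all_scores.append(np.asarray(scores, dtype=float).reshape(-1))
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    keep = nms(boxes, scores, iou_threshold)
    return boxes[keep], scores[keep]
```

The overlap has to be at least as large as the biggest vehicle so that every object fits entirely inside at least one tile; duplicates from neighbouring tiles are then removed by NMS.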

Code: the Faster RCNN example from the MXNet repository

Sources of mistakes: close-packed objects

Sources of mistakes: trains mistaken for buses

Sources of mistakes: debris mistaken for cars

Main source of mistakes: misclassification

gray car in the shade <=> black car

gray car in the sun <=> white car

blue car in the shade <=> black car

white hatchback <=> white van

hatchback <=> sedan

=> 

inconsistent labeling

low predictive power

Summary

  • Centers of cars => bounding boxes (manually)
  • Faster RCNN + VGG16, MXNet
  • D4 group train- and test-time augmentation

Hardware

  • Intel i7
  • 32 GB RAM
  • 2 x Titan X (Pascal)

Many thanks to:

  • Sergey Mushinskiy
  • Vladislav Kassym
  • Sergey Belousov