Planet: Understanding the Amazon from Space

Vladimir Iglovikov

Sr. Data Scientist at TrueAccord

PhD in Physics

Kaggle top 100

  • Industry: interpretability, scalability, size, throughput
  • Academia: novelty
  • Competitions: accuracy

Team ods.ai

Worked hard

Contributed

Problem description

  1. Train: 40k images
  2. Test: 60k images (Public: 40k, Private: 20k)
  3. JPG: 3 bands (R, G, B), 8-bit
  4. TIF: 4 bands (R, G, B, NIR), 16-bit
  5. Resolution: 256 × 256
  6. Multilabel classification (17 classes)
  7. Some labels are mutually exclusive.
  8. Labels are based on the JPG images.
Metric

F_{\beta} = (1 + \beta^2) \frac {pr} {\beta^2 p + r}

p = \frac {tp} {tp + fp}

r = \frac {tp} {tp + fn}

\beta = 2
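
The metric is the F2 score averaged over samples, which scikit-learn can compute directly. A minimal sketch with made-up toy arrays:

import numpy as np
from sklearn.metrics import fbeta_score

# Toy multilabel ground truth and binarized predictions
# (the competition has 17 classes, not 3).
y_true = np.array([[1, 0, 1], [0, 1, 1]])
y_pred = np.array([[1, 1, 1], [0, 1, 0]])

# Competition-style scoring: F2 averaged over samples.
print(fbeta_score(y_true, y_pred, beta=2, average='samples'))
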
Classes

https://www.kaggle.com/anokas/data-exploration-analysis/notebook

Specifics of the data / data leak

  • Red: train
  • Green: Public test
  • Blue: Private test 

Way to exploit it: brute-force matching of tile boundaries using the L2 distance (see the sketch below).
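
A minimal sketch of the boundary-matching idea, assuming tiles are numpy arrays; the random tiles and the single-edge search are illustrative only (the real search runs over all tile pairs and all four edges):

import numpy as np

def edge_distance(tile_a, tile_b):
    # L2 distance between the right edge of tile_a and the left edge of tile_b
    right = tile_a[:, -1, :].astype(np.float32)
    left = tile_b[:, 0, :].astype(np.float32)
    return np.linalg.norm(right - left)

# Brute force: for every candidate pair, keep the best-matching boundary.
tiles = [np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8) for _ in range(4)]
pairs = [(i, j) for i in range(len(tiles)) for j in range(len(tiles)) if i != j]
best = min(pairs, key=lambda ij: edge_distance(tiles[ij[0]], tiles[ij[1]]))
print(best)
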

  1. Train: 40k images
  2. Stable validation

=> a fight for every 0.0001 of the score

=> stacking

Main idea: building an ensemble

Team => set up shared conventions:

  1. Code: Private Repository at GitLab. Folder per person.
  2. Data: Google drive
  3. Docs: Google docs / Google sheets

Google Drive => per-fold predictions on train and test, stored as HDF5.
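
A minimal sketch of that convention using h5py; the file name, dataset name, and shapes are illustrative assumptions:

import h5py
import numpy as np

# Hypothetical fold-0 predictions of one model: 40k train images x 17 classes.
preds = np.random.rand(40000, 17).astype(np.float32)

with h5py.File('resnet50_fold0.h5', 'w') as f:
    f.create_dataset('train_probs', data=preds, compression='gzip')

with h5py.File('resnet50_fold0.h5', 'r') as f:
    loaded = f['train_probs'][:]
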

10 folds

Stratified in a loop, starting from the rarest labels (see the sketch below).

Ways to split into folds:

  1. KFold
  2. Stratified KFold
  3. Grid search to find a good random seed
  4. More advanced techniques (recall the Mercedes competition)
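
One plausible reading of the rarest-label-first stratification, as a minimal sketch (function name and tie-breaking are assumptions, not the team's exact code):

import numpy as np

def rarest_first_folds(y, n_folds=10, seed=42):
    # y: (n_samples, n_classes) binary label matrix
    rng = np.random.RandomState(seed)
    folds = np.full(y.shape[0], -1)
    # Walk over classes from rarest to most common and spread their
    # still-unassigned positive samples evenly across the folds.
    for c in np.argsort(y.sum(axis=0)):
        idx = np.where((y[:, c] == 1) & (folds == -1))[0]
        rng.shuffle(idx)
        folds[idx] = np.arange(len(idx)) % n_folds
    # Anything left (e.g. samples with no positive labels) goes round-robin.
    rest = np.where(folds == -1)[0]
    rng.shuffle(rest)
    folds[rest] = np.arange(len(rest)) % n_folds
    return folds

folds = rarest_first_folds(np.random.randint(0, 2, (100, 17)), n_folds=10)
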

Let's throw models into the stacker...

For each model and each fold, we generate predictions on the validation and test sets.

Architectures

  • DenseNet 121, 169, 201
  • ResNet 34, 50, 101, 152
  • ResNeXt 50, 101
  • VGG 11, 13, 16, 19
  • DPN 92, 98

Initialization

  • From scratch
  • ImageNet
  • ImageNet-11k + Places365

Loss

  • binary_crossentropy
  • BCE - log(F2 approximation) (see the sketch below)
  • softmax (weather) + BCE (other labels)

Training

  • Freezing / unfreezing weights
  • Different learning-rate schedules
  • Optimizers: Adam, SGD
  • Frameworks: Keras, PyTorch, MXNet
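
The second loss above adds the negative log of a differentiable F2 approximation to BCE. A minimal PyTorch sketch, assuming sigmoid outputs; the epsilon smoothing is an assumption, not necessarily what the team used:

import torch
import torch.nn.functional as F

def soft_f2_loss(probs, targets, beta=2.0, eps=1e-6):
    # probs, targets: (batch, n_classes); probs are sigmoid outputs in [0, 1]
    tp = (probs * targets).sum(dim=1)
    p = tp / (probs.sum(dim=1) + eps)    # soft precision
    r = tp / (targets.sum(dim=1) + eps)  # soft recall
    f2 = (1 + beta ** 2) * p * r / (beta ** 2 * p + r + eps)
    return -torch.log(f2 + eps).mean()

def combined_loss(probs, targets):
    # bce - log(F2 approximation)
    return F.binary_cross_entropy(probs, targets) + soft_f2_loss(probs, targets)

probs = torch.sigmoid(torch.randn(8, 17))
targets = torch.randint(0, 2, (8, 17)).float()
print(combined_loss(probs, targets))
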

Augmentations

  • Flips
  • Rotations + Reflect
  • Shear
  • Scale
  • Contrast
  • Blur
  • Channel multiplier
  • Channel add

numpy + ImgAug + OpenCV

https://github.com/aleju/imgaug
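
A minimal imgaug pipeline covering the augmentations listed above; the parameter ranges are illustrative assumptions, not the team's settings:

import numpy as np
import imgaug.augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5),                               # horizontal flips
    iaa.Flipud(0.5),                               # vertical flips
    iaa.Affine(rotate=(-45, 45), shear=(-10, 10),
               scale=(0.8, 1.2), mode='reflect'),  # rotations + reflect, shear, scale
    iaa.LinearContrast((0.8, 1.2)),                # contrast
    iaa.GaussianBlur(sigma=(0.0, 1.0)),            # blur
    iaa.Multiply((0.9, 1.1), per_channel=True),    # channel multiplier
    iaa.Add((-10, 10), per_channel=True),          # channel add
])

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
augmented = seq(image=image)
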

What about TIFF?

  1. Labels are based on the JPGs.
  2. The JPGs carry enough information.
  3. There are spatial shifts between the JPG and TIFF images.
  4. All pre-trained networks expect 8-bit input.

It is still possible to get 0.93+ on TIFF.

https://www.kaggle.com/bguberfain/tif-to-jpg-by-matching-percentiles

  • TIFF (R, G, B + NIR) => NGB
  • Percentile matching (see the sketch below)
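
A minimal sketch of the idea for one band: a simple percentile stretch from 16-bit to 8-bit (the referenced kernel matches TIFF percentiles to the JPG distribution; the cutoffs here are assumptions):

import numpy as np

def stretch_band(band16, low=2, high=98):
    # Clip a 16-bit band to its [low, high] percentiles and rescale to 8-bit.
    lo, hi = np.percentile(band16, [low, high])
    scaled = np.clip((band16.astype(np.float32) - lo) / (hi - lo + 1e-6), 0, 1)
    return (scaled * 255).astype(np.uint8)

# Build an NGB pseudo-RGB image from a 4-band (R, G, B, NIR) TIFF tile.
tif = np.random.randint(0, 65535, (256, 256, 4)).astype(np.uint16)
r, g, b, nir = [tif[:, :, i] for i in range(4)]
ngb = np.dstack([stretch_band(x) for x in (nir, g, b)])
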

General pipeline

48 networks × 10 folds = 480 networks

Second-level models: ExtraTrees, NN, LR

Blending: weighted average / mean

Thresholding

Model importance (xgb)

Manual review
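
A minimal sketch of the second level, assuming out-of-fold probabilities from all networks are concatenated into one feature matrix (shapes and hyperparameters are illustrative):

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical stacked features: 48 models x 17 classes = 816 columns per image.
X_train = np.random.rand(1000, 48 * 17)
y_train = np.random.randint(0, 2, (1000, 17))
X_test = np.random.rand(500, 48 * 17)

# ExtraTrees supports multilabel targets natively.
stacker = ExtraTreesClassifier(n_estimators=300, n_jobs=-1)
stacker.fit(X_train, y_train)

# predict_proba returns one (n_samples, 2) array per class; keep P(class = 1).
probs = np.stack([p[:, 1] for p in stacker.predict_proba(X_test)], axis=1)
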

Thresholding

  1. Gives a big score boost.
  2. Thresholds for different classes depend on each other (see the search sketch below).
  3. Weather hack: if cloudy => lower the other labels.

F_2 = (1 + 2^2) \frac {pr} {2^2 p + r}

T = \frac {1} {1 + 2^2} = 0.2
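
A minimal sketch of a per-class threshold search via coordinate descent, starting from the Bayes-motivated T = 0.2 (illustrative, not the team's exact procedure):

import numpy as np
from sklearn.metrics import fbeta_score

def optimize_thresholds(y_true, probs, init=0.2):
    # Tune one class threshold at a time while keeping the others fixed,
    # which captures the dependence between class thresholds.
    thresholds = np.full(probs.shape[1], init)
    for c in range(probs.shape[1]):
        best_t, best_score = thresholds[c], -1.0
        for t in np.arange(0.05, 0.95, 0.01):
            thresholds[c] = t
            score = fbeta_score(y_true, probs > thresholds, beta=2, average='samples')
            if score > best_score:
                best_t, best_score = t, score
        thresholds[c] = best_t
    return thresholds

thresholds = optimize_thresholds(np.random.randint(0, 2, (200, 17)),
                                 np.random.rand(200, 17))
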

What worked the best:

On the Bayes-optimality of F-measure maximizers

https://arxiv.org/abs/1310.4849

Did not work

  1. TIFF
  2. Indices (NDWI, EVI, SAVI, etc.)
  3. F-beta loss
  4. Two-headed networks (softmax for weather, BCE for the rest)
  5. Dehazing
  6. Mosaic features

Summary

  • Three weeks
  • ~20 GPUs
  • 480 Networks
  • 7th Place

Q: How many networks do we need to make it a product?

A: One.

Thank you. Let's stay in touch!
