Kaggle: Deep Learning to Create a Model for Binary Segmentation of Car Images

Vladimir Iglovikov

Data Scientist at Lyft

PhD in Physics

Kaggle Master (31st out of 70,000+)

Problem statement

Input

Output

735 teams

Problem statement

  1. Train: 5088
  2. Public Test: 1200
  3. Private Test: 3664
  4. Unlabeled test: 95200
  5. Resolution: 1918x1280 

Each car has 16 unique orientations 

Metric:

Dice = 2 \times \frac{|Y \cap P|}{|Y| + |P|}
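
As a rough illustration of the metric, here is a minimal NumPy sketch, reading Y as the ground-truth binary mask and P as the predicted binary mask. The function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions made for this example, not something stated in the talk.

```python
import numpy as np


def dice_coefficient(y_true, y_pred):
    """Dice = 2 * |Y ∩ P| / (|Y| + |P|) for two binary masks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # |Y ∩ P|: pixels that are foreground in both masks
    intersection = np.logical_and(y_true, y_pred).sum()
    # |Y| + |P|: total foreground pixels in the two masks
    total = y_true.sum() + y_pred.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match
        return 1.0
    return float(2.0 * intersection / total)


truth = [[1, 1], [0, 0]]
pred = [[1, 0], [0, 0]]
score = dice_coefficient(truth, pred)  # 2 * 1 / (2 + 1)
```

With scores around 0.997, as quoted later in the talk, almost all remaining error sits on the thin boundary of the car.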

Problems with the data

Mistakes in Masks

Inconsistent Labeling

Tricky cases

 

Team

Alexander Buslaev

Kaggle Master (top 100)

Deep Learning at work

Artsiom Sanakoyeu

Kaggle Master (top 100)

Deep Learning in school

Vladimir Iglovikov

Kaggle Master (top 100)

Deep Learning at work

Approach

  • Q: What is the DeepHammer for binary segmentation?
  • A: UNet.

UNet: strong baseline for nearly every binary image segmentation problem

Vladimir's approach

TernausNet (UNet with a pre-trained VGG11 encoder).

arXiv:1801.05746
  1. 5 folds
  2. Input 1920x1280
  3. HSV / grayscale augmentations
  4. Cyclic Learning Rate
  5. Optimizer: Adam
  6. Batch size: 4
  7. Pseudo Labeling
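
The talk does not say which cyclic learning rate policy was used. As one possible reading, here is a sketch of the common triangular policy, where the rate ramps linearly from a lower bound to an upper bound and back. The function name and the parameters `base_lr`, `max_lr`, and `half_cycle` are hypothetical and chosen only for this example.

```python
def triangular_lr(step, base_lr, max_lr, half_cycle):
    """Triangular cyclic schedule (one assumed policy among several).

    The rate rises linearly from base_lr to max_lr over half_cycle steps,
    falls back to base_lr over the next half_cycle steps, then repeats.
    """
    # Position inside the current cycle, in [0, 2 * half_cycle)
    cycle_pos = step % (2 * half_cycle)
    # 0.0 at the start and end of a cycle, 1.0 at its peak
    frac = 1.0 - abs(cycle_pos / half_cycle - 1.0)
    return base_lr + (max_lr - base_lr) * frac


# Illustrative values only; the talk gives no learning rates.
schedule = [triangular_lr(s, 1e-4, 1e-3, 100) for s in range(400)]
```

The returned value would be assigned to the optimizer's learning rate before each training step.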

Q: Does pre-trained encoder help UNet?

A: It depends.

  1. Pre-training on 8-bit RGB images speeds up convergence on 8-bit RGB images.
  2. Pre-training on 8-bit RGB images will not help on 11-bit images (satellite).

Vladimir Iglovikov, Alexey Shvets. TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation.

arXiv:1801.05746

Pseudo Labeling (useful at work)

  1. Train: 5088
  2. Public Test: 1200
  3. Private Test: 3664
  4. Unlabeled in test: 95200

Typically works if:

  1. A lot of unlabeled data.
  2. Prediction accuracy is high (in our case 0.997+)

Idea: pick the most confident predictions and add them to train data

Works well if you add more pseudo-labeled images than there are in the train set.
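
The selection step can be sketched as follows. The talk does not say how confidence was measured, so the score used here (mean distance of the per-pixel probabilities from 0.5, rescaled to [0, 1]) and the `min_confidence` cutoff are assumptions for illustration only; both function names are hypothetical.

```python
import numpy as np


def mask_confidence(prob_mask):
    """Assumed confidence score: 1.0 when every pixel is exactly 0 or 1,
    0.0 when every pixel probability is exactly 0.5."""
    prob_mask = np.asarray(prob_mask, dtype=float)
    return float(np.mean(np.abs(prob_mask - 0.5)) * 2.0)


def select_pseudo_labels(prob_masks, min_confidence=0.99):
    """Return (index, binary mask) pairs for confidently predicted images."""
    selected = []
    for idx, prob in enumerate(prob_masks):
        if mask_confidence(prob) >= min_confidence:
            # Binarize the confident prediction so it can serve as a label
            mask = (np.asarray(prob, dtype=float) > 0.5).astype(np.uint8)
            selected.append((idx, mask))
    return selected


confident = np.array([[0.999, 0.001], [1.0, 0.0]])
uncertain = np.array([[0.6, 0.4], [0.5, 0.5]])
pseudo = select_pseudo_labels([confident, uncertain])
```

The selected pairs would then be appended to the labeled train set before retraining.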

Alexander's approach

Had a set of GTX 1080 GPUs (not 1080 Ti) => no UNet in full HD

UNet => LinkNet (VGG encoder => Resnet encoder)

Alexander's approach

  1. 5 folds, 6 models
  2. LinkNet
  3. Augmentations: Horizontal flips, shifts, scaling, rotations, HSV augmentations
  4. Loss: BCE + (1 - Dice)
  5. TTA: horizontal flips
  6. Optimizer: Adam, RMSProp
  7. Cyclic Learning Rate
  8. Hard negative mining
  9. CLAHE for preprocessing
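
To make the loss in the list above concrete, here is a NumPy sketch of BCE + (1 - Dice). It assumes a "soft" Dice computed on probabilities rather than thresholded masks, which is the usual way to keep the term differentiable; the talk does not spell out this detail, and the function name and `eps` smoothing constant are choices made for this example.

```python
import numpy as np


def bce_dice_loss(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy plus (1 - soft Dice), averaged over all pixels."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so the logarithms stay finite
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    bce = -np.mean(
        y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    )
    # Soft Dice: the intersection is a sum of products of labels and probabilities
    intersection = np.sum(y_true * y_prob)
    soft_dice = (2.0 * intersection + eps) / (
        np.sum(y_true) + np.sum(y_prob) + eps
    )
    return float(bce + (1.0 - soft_dice))


labels = np.array([[1.0, 0.0], [1.0, 0.0]])
good = bce_dice_loss(labels, labels)       # close to 0
bad = bce_dice_loss(labels, 1.0 - labels)  # large
```

In a training framework the same expression would be written with the framework's tensor operations so that gradients flow through both terms.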

Artsiom's approach

  1. 7 folds, 2 models
  2. Initialization: ImageNet / Random
  3. Loss: Weighted BCE + (1 - Dice)
  4. Optimizer: SGD
  5. Cyclic learning rate
  6. Augmentations: translation, scaling, rotations, contrast, saturation, grayscale

Merging and post-processing

  1. Simple average
  2. Convex hull for non-confident regions
  3. Thresholding at 0.5
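
Steps 1 and 3 above can be sketched in a few lines of NumPy. The convex hull step is left out because the talk gives no detail on how non-confident regions were identified. The function name is hypothetical, and using a strict `>` at the threshold is an assumption.

```python
import numpy as np


def merge_predictions(prob_maps, threshold=0.5):
    """Average per-model probability maps, then binarize at the threshold."""
    # Stack to shape (n_models, height, width) and average over the models
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return (mean_prob > threshold).astype(np.uint8)


model_a = np.array([[0.9, 0.2], [0.6, 0.4]])
model_b = np.array([[0.7, 0.1], [0.3, 0.5]])
final_mask = merge_predictions([model_a, model_b])
```

Averaging probabilities before thresholding lets a confident model outvote an uncertain one, which a per-model vote on binary masks would not.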

Conclusion

  • 1st out of 735
  • $12,000 prize
  • 10 evenings spent
  • 20 GPUs used
  • Full reproduction of training on one Titan X (Pascal) will take 90 days.
  • Full reproduction of inference on one Titan X (Pascal) will take 13 days.
  • We never met each other in person :D

Vladimir's DevBox.

Deep Learning and Crypto mining
