Binary Semantic Segmentation

Vladimir Iglovikov

Computer Vision Engineer at Lyft, Level5

 

Approach

  • Q: What is the DeepHammer for binary segmentation?
  • A: UNet.

UNet: strong baseline for nearly every binary image segmentation problem

Vladimir's approach

TernausNet (UNet with a pre-trained VGG11 encoder).

arXiv:1801.05746
  1. 5 folds
  2. Input 1920x1280
  3. HSV / grayscale augmentations
  4. Cyclic Learning Rate
  5. Optimizer: Adam
  6. Batch size: 4
  7. Pseudo Labeling
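The slides do not say which cyclic learning rate policy was used. As a minimal sketch, assuming the common triangular policy, the schedule could look like this (`base_lr`, `max_lr` and `half_cycle` are illustrative values, not the ones used in the competition):

```python
def triangular_lr(step, base_lr=1e-4, max_lr=1e-3, half_cycle=1000):
    """Triangular cyclic learning rate (assumed policy, hypothetical defaults).

    The rate rises linearly from base_lr to max_lr over half_cycle steps,
    then falls back to base_lr over the next half_cycle steps, and repeats.
    """
    cycle_pos = step % (2 * half_cycle)  # position inside the current cycle
    # Distance from the peak, normalised to [0, 1]:
    # 1 at the cycle edges, 0 at the peak.
    dist = abs(cycle_pos - half_cycle) / half_cycle
    return base_lr + (max_lr - base_lr) * (1.0 - dist)
```

In a training loop the returned value would be written into the optimizer's learning rate before each step.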

Q: Does pre-trained encoder help UNet?

A: It depends.

  1. Pre-training on 8-bit RGB images speeds up convergence on 8-bit RGB data.
  2. Pre-training on 8-bit RGB images will not help on 11-bit images (satellite).

Vladimir Iglovikov, Alexey Shvets. TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation.

arXiv:1801.05746

Pseudo Labeling (useful at work)

  1. Train: 5088
  2. Public Test: 1200
  3. Private Test: 3664
  4. Unlabeled in test: 95200

Typically works if:

  1. A lot of unlabeled data.
  2. Prediction accuracy is high (in our case 0.997+)

Idea: pick the most confident predictions and add them to train data

Works well if the pseudo-labeled set you add is larger than the original train set.
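The selection step can be sketched as follows. The slides do not define how confidence was measured, so the per-image score here (mean distance of pixel probabilities from 0.5) and the `keep_fraction` parameter are assumptions made for illustration only:

```python
import numpy as np

def select_pseudo_labels(probs, keep_fraction=0.5):
    """Pick the most confident predictions as pseudo labels.

    probs: array of shape (N, H, W) with predicted foreground probabilities
    for N unlabeled images. Returns (indices, masks) of the kept images.
    The scoring rule below is an assumption, not the authors' exact rule.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Per-image confidence in [0, 1]: how far pixels sit from the 0.5 boundary.
    confidence = (2.0 * np.abs(probs - 0.5)).mean(axis=(1, 2))
    n_keep = max(1, int(round(keep_fraction * len(probs))))
    order = np.argsort(-confidence)        # most confident first
    keep = np.sort(order[:n_keep])         # restore original image order
    masks = (probs[keep] > 0.5).astype(np.uint8)  # binarised pseudo labels
    return keep, masks
```

The returned images and masks would then be appended to the train set before retraining.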

Alexander's approach

Had a set of GTX 1080 (not 1080 Ti) => No UNet in full HD

UNet => LinkNet (VGG encoder => Resnet encoder)

  1. 5 folds, 6 models
  2. LinkNet
  3. Augmentations: Horizontal flips, shifts, scaling, rotations, HSV augmentations
  4. Loss: BCE + 1 - Dice
  5. TTA: horizontal flips
  6. Optimizer: Adam, RMSProp
  7. Cyclic Learning Rate
  8. Hard negative mining
  9. CLAHE for preprocessing
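Test-time augmentation with horizontal flips can be sketched as below. `predict_fn` is a hypothetical stand-in for any model that maps an (H, W, C) image to an (H, W) probability map; it is not a function from the original solution:

```python
import numpy as np

def predict_with_hflip_tta(predict_fn, image):
    """Average the direct prediction with the prediction on the mirrored image.

    image: array of shape (H, W, C).
    predict_fn: hypothetical callable returning an (H, W) probability map.
    """
    direct = predict_fn(image)
    flipped = predict_fn(image[:, ::-1])      # mirror along the width axis
    # Flip the second mask back so both predictions are pixel-aligned.
    return 0.5 * (direct + flipped[:, ::-1])
```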

Artsiom's approach

  1. 7 folds, 2 models
  2. Initialization: ImageNet / Random
  3. Loss: Weighted BCE + 1 - Dice
  4. Optimizer: SGD
  5. Cyclic learning rate
  6. Augmentations: translation, scaling, rotations, contrast, saturation, grayscale
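Both Alexander's and Artsiom's losses combine binary cross-entropy with 1 - Dice. A minimal NumPy sketch, assuming a soft Dice computed over the whole mask and an optional per-pixel weight map for the weighted variant (the slides do not specify how the weights were chosen, so `pixel_weights` is a hypothetical parameter):

```python
import numpy as np

def bce_dice_loss(probs, target, pixel_weights=None, eps=1e-7):
    """BCE + 1 - Dice for a binary mask.

    probs: predicted foreground probabilities; target: 0/1 mask, same shape.
    pixel_weights: optional per-pixel weights applied to the BCE term
    (an assumed form of the "weighted BCE" variant).
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    bce = -(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))
    if pixel_weights is not None:
        bce = bce * np.asarray(pixel_weights, dtype=np.float64)
    bce = bce.mean()
    # Soft Dice: overlap of predicted probabilities with the target mask.
    intersection = (probs * target).sum()
    dice = (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)
    return bce + 1.0 - dice
```

The loss approaches 0 for a perfect prediction, and the Dice term keeps the gradient focused on mask overlap even when background pixels dominate.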

Merging and post-processing

  1. Simple average
  2. Convex hull for non-confident regions
  3. Thresholding at 0.5
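The averaging and thresholding steps can be sketched as below. The convex hull step for non-confident regions is left out, because the slides do not describe how those regions were identified:

```python
import numpy as np

def merge_predictions(prob_maps, threshold=0.5):
    """Simple average of per-model probability maps, then a hard threshold.

    prob_maps: sequence of (H, W) arrays, one per model or fold.
    Returns a uint8 mask with 1 for foreground and 0 for background.
    """
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)  # simple average
    return (mean_prob > threshold).astype(np.uint8)
```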

Conclusion

  • 1st out of 735
  • $12,000 prize
  • 10 evenings spent
  • 20 GPUs used
  • Full reproduction of training on one Titan Pascal X will take 90 days.
  • Full reproduction of inference on one Titan Pascal X will take 13 days.
  • We never met each other in person :D

Vladimir's DevBox.

Deep Learning and Crypto mining

Copy of Kaggle: Deep Learning to Create a Model for Binary Segmentation of Car Images

By Vladimir Iglovikov
