Topcoder Konica Minolta

Pathological Image Segmentation Challenge

Vladimir Iglovikov

Sr Data Scientist at TrueAccord

PhD in Physics

Kaggle top 100

Problem description

Train: 168 images
Public test: 81 images
Private test: 81 images

Metric

Score = 100000 \times \frac {F1_{\text{micro}} + \text{Dice}_{\text{instance-wise}}} {2}

F1 = 2 \frac {P \cdot R} {P + R}

P = \frac {TP} {TP + FP}

R = \frac {TP} {TP + FN}

DICE = \frac {2 TP} {(TP + FP) + (TP + FN)}

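
With the definitions above, F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN), which is exactly the DICE formula, so the two terms of the score can only differ in how TP, FP and FN are aggregated. The Java sketch below assumes (the slide does not say this) that F1_micro pools pixel counts over all test images, while Dice_instance-wise computes DICE per image and averages it. The class name ChallengeScore and the convention that an image where both masks are empty gets Dice = 1 are hypothetical choices.

```java
import java.util.List;

// Hypothetical helper for the challenge score; the aggregation rules are assumptions.
final class ChallengeScore {

    // Pixel counts {TP, FP, FN} for one image (flattened binary masks).
    static long[] counts(boolean[] truth, boolean[] pred) {
        long tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < truth.length; i++) {
            if (pred[i] && truth[i]) tp++;
            else if (pred[i]) fp++;
            else if (truth[i]) fn++;
        }
        return new long[] {tp, fp, fn};
    }

    // DICE = 2TP / ((TP + FP) + (TP + FN)); assumed to be 1 when both masks are empty.
    static double dice(long tp, long fp, long fn) {
        long denom = (tp + fp) + (tp + fn);
        return denom == 0 ? 1.0 : 2.0 * tp / denom;
    }

    // Score = 100000 * (F1_micro + Dice_instance-wise) / 2
    static double score(List<boolean[]> truths, List<boolean[]> preds) {
        long tp = 0, fp = 0, fn = 0;
        double diceSum = 0.0;
        for (int k = 0; k < truths.size(); k++) {
            long[] c = counts(truths.get(k), preds.get(k));
            tp += c[0];
            fp += c[1];
            fn += c[2];
            diceSum += dice(c[0], c[1], c[2]); // per-image Dice (assumed "instance-wise")
        }
        // Micro averaging (assumed): P and R come from counts pooled over all images.
        double p = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
        double r = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = (p + r) == 0 ? 0.0 : 2 * p * r / (p + r);
        double diceInstance = diceSum / truths.size();
        return 100000 * (f1 + diceInstance) / 2;
    }

    public static void main(String[] args) {
        boolean[] t1 = {true, true, false, false};
        boolean[] p1 = {true, false, true, false}; // TP = 1, FP = 1, FN = 1
        boolean[] t2 = {true, true, true, true};
        boolean[] p2 = {true, true, true, true};   // TP = 4
        System.out.println(score(List.of(t1, t2), List.of(p1, p2)));
    }
}
```

On the two toy images in main, the pooled counts are TP = 5, FP = 1, FN = 1, so F1_micro = 5/6, while the per-image Dice values are 1/2 and 1, so their mean is 3/4; the score is 100000 × 19/24 ≈ 79166.67.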

Platform description

TopCoder
The hardest part was dealing with the website.
Submissions allowed once every two hours.
The leaderboard shows your last submission.
The last submission is your final submission.
Private LB released more than a week after the end.

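Submission

Predictions are submitted as a TXT file hosted on Google Drive; the Java class only returns its download URL:

public class PathImageSegmentation {
    // TopCoder calls getURL() and downloads the predictions file from this link.
    public String getURL() {
        return "https://drive.google.com/uc?export=download&id=6T4qSMyyIThaubkNPdFREZThzRXc";
    }
}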

Image segmentation benchmark results

https://nizhib.com/posts/image-segmentation

Network: UNet

Pipeline

5-fold cross-validation
Binarization threshold chosen on out-of-fold predictions
Train augmentations: D4 (90° rotations + flips) + color shift + contrast
Test augmentations: D4
Optimizer: Adam
Cyclic LR (between 1e-3 and 1e-6)
Loss: BCE - log(dice)
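
The loss and the threshold step from the list above can be sketched as follows. This is a hypothetical Java illustration on flattened arrays, not the author's PyTorch code: it only evaluates the loss value (no gradients), uses a soft Dice 2Σpy / (Σp + Σy) on probabilities (the exact Dice variant inside the loss is an assumption), and picks the threshold by a grid search over 0.05, 0.10, ..., 0.95 that maximizes pixel Dice on the out-of-fold predictions (grid and objective are assumptions; the competition score could be plugged in instead). The names PipelineSketch and EPS are illustrative.

```java
// Hypothetical sketch: the loss value BCE - log(dice) and a threshold search on OOF predictions.
// Arrays are flattened masks; no gradients are computed here.
final class PipelineSketch {
    private static final double EPS = 1e-7;

    // Mean binary cross-entropy; probabilities are clamped so log() never sees 0.
    static double bce(double[] prob, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < prob.length; i++) {
            double p = Math.min(Math.max(prob[i], EPS), 1 - EPS);
            sum -= target[i] * Math.log(p) + (1 - target[i]) * Math.log(1 - p);
        }
        return sum / prob.length;
    }

    // Soft Dice on probabilities (assumed variant): 2 * sum(p * y) / (sum(p) + sum(y)).
    static double softDice(double[] prob, double[] target) {
        double inter = 0.0, total = 0.0;
        for (int i = 0; i < prob.length; i++) {
            inter += prob[i] * target[i];
            total += prob[i] + target[i];
        }
        return (2 * inter + EPS) / (total + EPS);
    }

    // Loss from the slide: BCE - log(dice).
    static double loss(double[] prob, double[] target) {
        return bce(prob, target) - Math.log(softDice(prob, target));
    }

    // Hard Dice of OOF predictions binarized at the given threshold.
    static double diceAt(double[] oofProb, boolean[] truth, double threshold) {
        long tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < oofProb.length; i++) {
            boolean pred = oofProb[i] > threshold;
            if (pred && truth[i]) tp++;
            else if (pred) fp++;
            else if (truth[i]) fn++;
        }
        long denom = 2 * tp + fp + fn;
        return denom == 0 ? 1.0 : 2.0 * tp / denom;
    }

    // Grid search over 0.05, 0.10, ..., 0.95; keeps the first threshold with the best Dice.
    static double bestThreshold(double[] oofProb, boolean[] truth) {
        double best = 0.5;
        double bestDice = -1.0;
        for (int k = 1; k <= 19; k++) {
            double t = k / 20.0;
            double d = diceAt(oofProb, truth, t);
            if (d > bestDice) {
                bestDice = d;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] prob = {0.9, 0.6, 0.4, 0.1};
        double[] target = {1, 1, 0, 0};
        boolean[] truth = {true, true, false, false};
        System.out.println("loss = " + loss(prob, target));
        System.out.println("threshold = " + bestThreshold(prob, truth));
    }
}
```

The -log(dice) term grows quickly as the overlap shrinks, so it pushes the network towards better mask overlap, while BCE keeps per-pixel gradients well behaved.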

Does cross validation work?

CV	LB
836533	754414
868841	792269
885779	784527

CV and LB scores are inconsistent
Improvements in CV do not translate into improvements on the LB

Why?

Small amount of data (Train 168, Public test 81, Private test 81)
Data Leak (found by Evgeny Nizhibitsky)

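Problem 1: No patient id

The train set is NOT 168 patients with 500x500 images, but 42 patients with 1000x1000 images, each cut into four 500x500 patches => a random split puts patches of the same patient into both train and validation: data leak!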

Solution:

Merge 168 small patches => 42 large patches
KFold by patient Id
Random 500x500 crops from large patches
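
The fix above can be sketched as a group-aware K-fold: folds are assigned to patients, not to patches, so the four patches of one 1000x1000 image never end up on both sides of a split. The sketch assumes the patient of every patch has already been recovered (for example while merging the patches); the patientIds array, the class name PatientKFold and the seed handling are hypothetical. It also shows how the origin of a random 500x500 crop inside a 1000x1000 image could be drawn.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of "KFold by patient Id" plus random crop origins.
final class PatientKFold {

    // fold[i] in [0, k) for patch i; all patches of one patient share a fold.
    static int[] assignFolds(int[] patientIds, int k, long seed) {
        List<Integer> patients = new ArrayList<>();
        for (int id : patientIds) {
            if (!patients.contains(id)) patients.add(id); // distinct ids, first-seen order
        }
        Collections.shuffle(patients, new Random(seed)); // randomize which patients share a fold
        Map<Integer, Integer> foldOf = new HashMap<>();
        for (int j = 0; j < patients.size(); j++) {
            foldOf.put(patients.get(j), j % k); // round-robin keeps fold sizes balanced
        }
        int[] folds = new int[patientIds.length];
        for (int i = 0; i < patientIds.length; i++) {
            folds[i] = foldOf.get(patientIds[i]);
        }
        return folds;
    }

    // Top-left corner (row, col) of a random crop x crop window inside a size x size image.
    static int[] randomCropOrigin(int size, int crop, Random rng) {
        int range = size - crop + 1; // valid origins: 0 .. size - crop
        return new int[] {rng.nextInt(range), rng.nextInt(range)};
    }

    public static void main(String[] args) {
        int[] ids = new int[168];
        for (int i = 0; i < ids.length; i++) ids[i] = i / 4; // hypothetical: 42 patients x 4 patches
        int[] folds = assignFolds(ids, 5, 42L);
        System.out.println("patch 0 fold = " + folds[0] + ", patch 3 fold = " + folds[3]);
        int[] origin = randomCropOrigin(1000, 500, new Random(0));
        System.out.println("crop origin = " + origin[0] + ", " + origin[1]);
    }
}
```

Training on random crops from the merged images also gives more distinct 500x500 views per patient than the four fixed patches.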

I did not do it :(

Problem 2: Lazy Scientists :(

Train: 1000x1000 images
Test: 500x500 patches

Test: 162 patches of 500x500 =
30 images of 1000x1000, cut into 4 patches each (120 patches)
+ 42 patches of 500x500 taken from train
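
To check which test patches overlap with train, one option is to cut every 1000x1000 image into its four 500x500 quarters and look up each test patch among the train patches. The sketch below is hypothetical: it assumes images are row-major int arrays (for example packed RGB) with an even side length, and that reused patches are exact pixel copies; if the patches were re-encoded, a tolerance-based comparison would be needed instead. The class name PatchOverlap is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: split large images into quarters and find test patches copied from train.
final class PatchOverlap {

    // Cuts a size x size row-major image (size must be even) into four (size/2) x (size/2)
    // patches, in the order top-left, top-right, bottom-left, bottom-right.
    static List<int[]> quarters(int[] image, int size) {
        int half = size / 2;
        List<int[]> out = new ArrayList<>();
        for (int r0 = 0; r0 < size; r0 += half) {
            for (int c0 = 0; c0 < size; c0 += half) {
                int[] patch = new int[half * half];
                for (int r = 0; r < half; r++) {
                    System.arraycopy(image, (r0 + r) * size + c0, patch, r * half, half);
                }
                out.add(patch);
            }
        }
        return out;
    }

    // Indices of test patches that are exact pixel copies of some train patch.
    static List<Integer> copiedFromTrain(List<int[]> trainPatches, List<int[]> testPatches) {
        Map<Integer, List<int[]>> byHash = new HashMap<>(); // hash buckets, confirmed with Arrays.equals
        for (int[] p : trainPatches) {
            byHash.computeIfAbsent(Arrays.hashCode(p), h -> new ArrayList<>()).add(p);
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < testPatches.size(); i++) {
            int[] q = testPatches.get(i);
            for (int[] p : byHash.getOrDefault(Arrays.hashCode(q), List.of())) {
                if (Arrays.equals(p, q)) {
                    hits.add(i);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int size = 1000;
        int[] trainImage = new int[size * size];
        for (int i = 0; i < trainImage.length; i++) trainImage[i] = i % 251; // synthetic pixels
        List<int[]> trainPatches = quarters(trainImage, size);
        List<int[]> testPatches = List.of(new int[500 * 500], trainPatches.get(1).clone());
        System.out.println("test patches copied from train: " + copiedFromTrain(trainPatches, testPatches));
    }
}
```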

Results

Place	Private test	Public test	Prize
1	smudge	smudge	$10,000
2	n01z3	pfr	$7,000
3	vkassym	EgorLakomkin	$5,000
4	dulyanov	vkassym	$3,000
5	*ternaus*	ualabs	$1,000
6	pfr	nizhib
7	ZFTurbo	albu
8	nizhib	zaq1xsw2tktk
9	EgorLakomkin	forcesh
10	albu	ZFTurbo
11	ywi4ebyrawi	n01z3
12	eagle4	ternaus

When does UNet not perform well?

A lot of data.
Many classes.

PSPNet => 0.52

UNet => 0.26

When does UNet perform well?

Small amount of data.
Binary mask.

For practice: Carvana Image Masking Challenge (ends in 3 weeks)

Summary

Time invested: a couple of evenings
Money earned: $1000

Software

PyTorch + OpenCV

Hardware

i7-5930K
32 GB RAM
4 x GTX 1080 Ti
