Topcoder Konica Minolta
Pathological Image Segmentation Challenge
Vladimir Iglovikov
Sr Data Scientist at TrueAccord
PhD in Physics
Kaggle top 100
Problem description
- Train: 168
- Public test: 81
- Private test: 81
Metric
Score = 100000 \times \frac {F1_{micro} + Dice_{instance-wise}} {2}
F1 = 2 \frac {P \cdot R} {P + R}
P = \frac {TP} {TP + FP}
R = \frac {TP} {TP + FN}
DICE = \frac {2 TP} {(TP + FP) + (TP + FN)}
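A minimal sketch of the metric in plain Python. It assumes masks are flat 0/1 sequences, that "micro" means pooling TP/FP/FN over all images, that "instance-wise" means averaging Dice per image, and that an empty denominator scores 0. None of these conventions are spelled out in the slides, and all function names are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from precision and recall, as defined above."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def dice(tp, fp, fn):
    """Dice coefficient from the same counts."""
    denom = (tp + fp) + (tp + fn)
    return 2 * tp / denom if denom else 0.0  # assumption: empty case scores 0


def confusion(pred, truth):
    """Count TP, FP, FN over two flat 0/1 sequences of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def score(preds, truths):
    """Challenge-style score; the aggregation rules are assumptions."""
    counts = [confusion(p, t) for p, t in zip(preds, truths)]
    # Micro F1: pool the counts over every image before computing F1.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    f1_micro = f1(tp, fp, fn)
    # Instance-wise Dice: compute Dice per image, then average.
    dice_mean = sum(dice(*c) for c in counts) / len(counts)
    return 100000 * (f1_micro + dice_mean) / 2
```

For identical counts, F1 and Dice are algebraically the same quantity, so the two terms of the score differ only in how they aggregate over images.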
Platform description
- TopCoder
- The hardest part was dealing with the website.
- One submission every two hours.
- Only the last submission is shown on the leaderboard.
- The last submission is your final submission.
- Private LB released more than a week after the end.
TXT
Google Drive
public class PathImageSegmentation {
    public String getURL() {
        return "https://drive.google.com/uc?export=download&id=6T4qSMyyIThaubkNPdFREZThzRXc";
    }
}
Submission
Image segmentation benchmark results
https://nizhib.com/posts/image-segmentation
Network: UNet
Pipeline
binarization
- 5 folds
- Binarization threshold chosen on out-of-fold predictions
- Train augmentations: D4 + color shift + contrast
- Test augmentations: D4
- Optimizer: Adam
- Cyclic LR (1e-3 to 1e-6)
- Loss: BCE - log(dice)
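Two items from the pipeline can be sketched in plain Python: D4 test-time augmentation (the 8 rotations and flips of a square, averaged after undoing each one) and the BCE - log(dice) loss. This is an illustration under assumptions, not the author's PyTorch code: images are 2D lists, `model` is any callable returning a map of the same shape, the Dice term is the usual soft Dice on probabilities, and `eps` is a hypothetical smoothing constant.

```python
import math


def rot90(img):
    """Rotate a 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def hflip(img):
    """Mirror a 2D list left to right."""
    return [list(row[::-1]) for row in img]


def d4_forward(img, k, flipped):
    """Apply an optional flip followed by k quarter turns."""
    out = hflip(img) if flipped else [list(r) for r in img]
    for _ in range(k):
        out = rot90(out)
    return out


def d4_inverse(img, k, flipped):
    """Undo d4_forward: rotate back, then undo the flip."""
    out = [list(r) for r in img]
    for _ in range((4 - k) % 4):
        out = rot90(out)
    return hflip(out) if flipped else out


def predict_d4(model, img):
    """Average the model output over all 8 D4 views of the input."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for flipped in (False, True):
        for k in range(4):
            pred = d4_inverse(model(d4_forward(img, k, flipped)), k, flipped)
            for i in range(h):
                for j in range(w):
                    acc[i][j] += pred[i][j] / 8
    return acc


def bce_minus_log_dice(probs, targets, eps=1e-7):
    """BCE - log(soft Dice) over flat sequences of probabilities and 0/1 targets."""
    n = len(probs)
    bce = -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, targets)
    ) / n
    inter = sum(p * t for p, t in zip(probs, targets))
    soft_dice = (2 * inter + eps) / (sum(probs) + sum(targets) + eps)
    return bce - math.log(soft_dice)
```

Subtracting log(dice) pushes the optimizer toward the overlap metric directly, while the BCE term keeps per-pixel gradients well behaved.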
Does cross validation work?
CV | LB |
---|---|
836533 | 754414 |
868841 | 792269 |
885779 | 784527 |
- CV and LB scores are inconsistent
- Improvements in CV do not translate into improvements on the LB
Why?
- Small amount of data (Train 168, Public test 81, Private test 81)
- Data Leak (found by Evgeny Nizhibitsky)
Problem 1: No patient id
The train set is NOT 168 patients with 500x500 images, but 42 patients with 1000x1000 images, each cut into four 500x500 patches => a random split leads to a data leak!
Solution:
- Merge 168 small patches => 42 large patches
- KFold by patient Id
- Random 500x500 crops from large patches
I did not do it :(
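The proposed solution can be sketched in plain Python as follows. It assumes a patient id is already known for every patch (the slides do not say how the four patches of one patient were matched), and the helper names `merge_quadrants`, `group_kfold`, and `random_crop` are hypothetical.

```python
import random


def merge_quadrants(tl, tr, bl, br):
    """Stitch four equally sized 2D patches into one large patch."""
    top = [list(a) + list(b) for a, b in zip(tl, tr)]
    bottom = [list(a) + list(b) for a, b in zip(bl, br)]
    return top + bottom


def group_kfold(patient_ids, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) so that no patient appears on both sides."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    # Assign whole patients to folds round-robin after shuffling.
    fold_of = {p: i % n_splits for i, p in enumerate(patients)}
    for fold in range(n_splits):
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != fold]
        val = [i for i, p in enumerate(patient_ids) if fold_of[p] == fold]
        yield train, val


def random_crop(img, size, rng):
    """Cut a random size x size window out of a 2D list."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]
```

With scikit-learn available, `sklearn.model_selection.GroupKFold` provides the same kind of leak-free split when the patient ids are passed as `groups`.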
Problem 2: Lazy Scientists :(
- Train: 1000x1000
- Test: 500x500
Test: 162 x 500 x 500 => 30 x 1000 x 1000 + 42 x 500 x 500 patches from train (30 x 4 + 42 = 162)
Results
Place | Private Test | Public Test | Prize (private place) |
---|---|---|---|
1 | smudge | smudge | $10,000 |
2 | n01z3 | pfr | $7,000 |
3 | vkassym | EgorLakomkin | $5,000 |
4 | dulyanov | vkassym | $3,000 |
5 | ternaus | ualabs | $1,000 |
6 | pfr | nizhib | |
7 | ZFTurbo | albu | |
8 | nizhib | zaq1xsw2tktk | |
9 | EgorLakomkin | forcesh | |
10 | albu | ZFTurbo | |
11 | ywi4ebyrawi | n01z3 | |
12 | eagle4 | ternaus | |
When does UNet not perform well?
- A lot of data.
- Many classes.
UNet => 0.26
When does UNet perform well?
- Small amount of data.
- Binary mask.
For practice: Carvana Image Masking Challenge (ends in 3 weeks)
Summary
- Time invested: a couple of evenings
- Money earned: $1000
Software
PyTorch + OpenCV
Hardware
- i7-5930K
- 32 GB RAM
- 4 x GTX 1080 Ti