From RCNN to SPP-NET
Convolutional Neural Network (CNN)
Convolutional Layer
Pooling Layer
RCNN
- extracts around 2000 bottom-up region proposals (selective search)
- warps every region proposal to the same size, 227*227 (the AlexNet input size)
- computes features for each proposal using a large CNN (ImageNet-specific)
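The warping step above can be sketched in a few lines. This is a minimal illustration, not the R-CNN implementation: the function name `warp_region`, the `(x1, y1, x2, y2)` box convention (with `x2` and `y2` exclusive), and the nearest-neighbour sampling are all assumptions made for the example. The point is only that the aspect ratio of the proposal is ignored and every region ends up at the same fixed size.

```python
def warp_region(image, box, size=227):
    """Crop `box` from `image` and stretch it to size x size.

    `image` is a list of pixel rows; `box` is (x1, y1, x2, y2) with
    x2 and y2 exclusive. Both conventions are assumptions of this sketch.
    Nearest-neighbour sampling is used for simplicity.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError("box must have positive width and height")
    warped = []
    for y in range(size):
        # Map the output row back to a source row inside the box.
        src_row = image[y1 + (y * h) // size]
        # Map every output column back to a source column inside the box.
        warped.append([src_row[x1 + (x * w) // size] for x in range(size)])
    return warped
```

Whatever the shape of the proposal, the result always has `size` rows and `size` columns, which is what the fixed-size CNN input requires.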
RCNN
- Fine-tuning: replace the CNN's ImageNet-specific 1000-way classification layer with a randomly initialized (N + 1)-way classification layer (N object classes plus background)
- proposals with IoU > 0.5 against a ground-truth box are labeled as that box's class; otherwise they are labeled as background
- extracts a 4096-dimensional feature vector from each region proposal
- classifies each region using class-specific linear SVMs
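The IoU labeling rule above can be sketched as follows. The function names `iou` and `label_proposal`, the `(x1, y1, x2, y2)` box format, and the use of class id 0 for background are assumptions of this example, not something taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, ground_truth, threshold=0.5):
    """Return the class id for a proposal, or 0 for background.

    `ground_truth` is a list of (box, class_id) pairs with class ids 1..N.
    The proposal takes the class of its best-overlapping ground-truth box
    when that IoU exceeds `threshold`.
    """
    best_iou, best_class = 0.0, 0
    for gt_box, class_id in ground_truth:
        overlap = iou(proposal, gt_box)
        if overlap > best_iou:
            best_iou, best_class = overlap, class_id
    return best_class if best_iou > threshold else 0
```

For example, a proposal `(0, 0, 10, 10)` against a ground-truth box `(0, 0, 10, 8)` has an intersection of 80 and a union of 100, so its IoU is 0.8 and it is labeled with that box's class.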
Disadvantages
- CNNs require a fixed input image size
- Feature computation in RCNN is costly in both time and space (it repeatedly applies the deep convolutional network to the raw pixels of thousands of warped regions per image)
SPP-NET
Spatial pyramid pooling
reference: http://slideplayer.com/slide/5277459/
Multiple-size training
reference: http://slideplayer.com/slide/5277459/
reference: http://sinb.github.io/Whatyouknowabout/spp-net-1/
- Multiple networks share all weights
- Each network handles a single input size
- feature map size: a x a
- pyramid level: n x n
- pooling window size: ceil(a / n)
- stride: floor(a / n)
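The window and stride formulas above can be tried out with a small sketch. It is written under stated assumptions: the function names `spp_window` and `spatial_pyramid_pool` are made up for this example, max pooling is assumed within each window, the pyramid levels `(1, 2, 3)` are illustrative, and the feature map is a plain list of channels, each a list of rows.

```python
import math


def spp_window(a, n):
    """Window size and stride for an a x a feature map and an n x n level."""
    return math.ceil(a / n), a // n  # ceil(a / n), floor(a / n)


def spatial_pyramid_pool(feature_map, levels=(1, 2, 3)):
    """Pool a (channels, a, a) feature map into a fixed-length vector.

    Max pooling and the default levels are assumptions of this sketch.
    The output length is channels * sum(n * n for n in levels),
    independent of a.
    """
    a = len(feature_map[0])
    pooled = []
    for n in levels:
        win, stride = spp_window(a, n)
        for i in range(n):
            for j in range(n):
                for channel in feature_map:
                    rows = channel[i * stride:i * stride + win]
                    pooled.append(
                        max(max(row[j * stride:j * stride + win]) for row in rows)
                    )
    return pooled
```

With a = 13 and n = 3 the window is 5 and the stride is 4, so the three windows along each axis cover positions 0-4, 4-8 and 8-12. A 10 x 10 map with the same levels gives a window of 4 and a stride of 3, yet the pooled vector has the same length, which is why networks for different input sizes can share the layers that follow.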
2000 convolutions vs 1 convolution
The features are computed from the entire image only once, instead of once per region proposal
Much faster!
rcnn
By yunqy