From RCNN to SPP-NET

Convolutional Neural Network (CNN)

Convolutional Layer

Pooling Layer

RCNN

Extracts around 2000 bottom-up region proposals (selective search)
Warps every region proposal to the same size, 227 x 227 (the AlexNet input size)
Computes features for each proposal using a large CNN (pre-trained on ImageNet)
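
The warping step can be illustrated with a minimal sketch. It assumes NumPy, boxes given as (x1, y1, x2, y2) pixel coordinates, and nearest-neighbour resampling; the function name `warp_region` and the example boxes are hypothetical, and the original RCNN code may resample differently.

```python
import numpy as np

def warp_region(image, box, size=227):
    """Crop box = (x1, y1, x2, y2) from an H x W x C image and resize the
    crop to size x size, ignoring its aspect ratio (nearest neighbour)."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows][:, cols]

# Hypothetical image and two hypothetical proposals of different shapes.
image = np.random.rand(375, 500, 3)
proposals = [(10, 20, 210, 120), (300, 50, 480, 360)]
warped = np.stack([warp_region(image, b) for b in proposals])
print(warped.shape)  # (2, 227, 227, 3)
```

Every proposal ends up with the same shape regardless of its original size, which is what lets a fixed-input CNN process it, at the cost of distorting the content.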

RCNN

Fine-tuning: replace the CNN's ImageNet-specific 1000-way classification layer with a randomly initialized (N + 1)-way classification layer (N object classes plus background)
Proposals with IoU > 0.5 against a ground-truth box are labeled as that box's class; otherwise they are labeled as background
Extract a 4096-dimensional feature vector from each region proposal
Classify each region using class-specific linear SVMs
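
The IoU labeling rule can be sketched as follows. This is a minimal illustration in plain Python, assuming boxes are (x1, y1, x2, y2) tuples and that class index 0 stands for background; the helper names `iou` and `label_proposal` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def label_proposal(proposal, gt_boxes, gt_classes, threshold=0.5):
    """Return the class of the best-overlapping ground-truth box if its
    IoU exceeds the threshold, otherwise 0 (assumed background index)."""
    best_iou, best_cls = 0.0, 0
    for box, cls in zip(gt_boxes, gt_classes):
        overlap = iou(proposal, box)
        if overlap > best_iou:
            best_iou, best_cls = overlap, cls
    return best_cls if best_iou > threshold else 0

gt_boxes, gt_classes = [(0, 0, 100, 100)], [3]
print(label_proposal((10, 10, 100, 100), gt_boxes, gt_classes))  # 3 (IoU 0.81)
print(label_proposal((60, 60, 160, 160), gt_boxes, gt_classes))  # 0 (IoU ~0.09)
```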

Disadvantages

CNNs require a fixed input image size
Feature computation in RCNN is expensive in both time and space (the deep convolutional network is applied repeatedly to the raw pixels of thousands of warped regions per image)

SPP-NET

Spatial pyramid pooling

reference: http://slideplayer.com/slide/5277459/

Multiple-size training

reference: http://slideplayer.com/slide/5277459/

reference: http://sinb.github.io/Whatyouknowabout/spp-net-1/

Multiple networks share all weights
Each network is trained for a single input size
feature map size: a x a
pyramid level: n x n
pooling window size: ceil(a / n)
stride: floor(a / n)
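
The window and stride formulas above can be tried out with a small NumPy sketch. It assumes max pooling, a C x a x a feature map, and a three-level pyramid (3 x 3, 2 x 2, 1 x 1); the channel count 256 and the map sizes 13 and 10 are illustrative values, and the function names are hypothetical.

```python
import math
import numpy as np

def spp_level(fmap, n):
    """Max-pool a C x a x a feature map into C x n x n bins using
    window = ceil(a / n) and stride = floor(a / n)."""
    c, a, _ = fmap.shape
    win = math.ceil(a / n)
    stride = a // n
    out = np.empty((c, n, n), dtype=fmap.dtype)
    for i in range(n):
        for j in range(n):
            r, s = i * stride, j * stride
            out[:, i, j] = fmap[:, r:r + win, s:s + win].max(axis=(1, 2))
    return out

def spp(fmap, levels=(3, 2, 1)):
    """Concatenate all pyramid levels into one fixed-length vector."""
    return np.concatenate([spp_level(fmap, n).ravel() for n in levels])

# Two feature maps of different spatial size (illustrative values).
small = np.random.rand(256, 10, 10)
large = np.random.rand(256, 13, 13)
print(spp(small).shape, spp(large).shape)  # (3584,) (3584,)
```

Both maps yield a vector of length 256 x (9 + 4 + 1) = 3584, so the fully connected layers that follow see the same input size whatever the image size was.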

2000 convolutional passes vs 1 convolutional pass

The feature maps are computed from the entire image only once
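
The idea of sharing one feature map across all proposals can be sketched as below. Everything here is a simplified assumption rather than the SPP-NET implementation: `conv_features` is a stand-in that only mimics the downsampling of real conv layers, the total stride of 16 is assumed, boxes are projected onto the feature map by integer division by that stride, and each projected window is assumed to span at least n cells in both directions.

```python
import numpy as np

STRIDE = 16  # assumed total downsampling factor of the conv layers

def conv_features(image):
    """Stand-in for the conv layers: average over STRIDE x STRIDE blocks.
    Only the downsampling behaviour matters for this sketch."""
    h, w = image.shape[0] // STRIDE, image.shape[1] // STRIDE
    blocks = image[:h * STRIDE, :w * STRIDE].reshape(h, STRIDE, w, STRIDE, -1)
    return blocks.mean(axis=(1, 3))  # h x w x C

def pool_window(fmap, box, n=2):
    """Project an image-space box onto the feature map and max-pool the
    resulting window into n x n bins (a single pyramid level)."""
    x1, y1, x2, y2 = (v // STRIDE for v in box)
    window = fmap[y1:y2, x1:x2]
    ys = np.linspace(0, window.shape[0], n + 1).astype(int)
    xs = np.linspace(0, window.shape[1], n + 1).astype(int)
    bins = [window[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max(axis=(0, 1))
            for i in range(n) for j in range(n)]
    return np.concatenate(bins)

image = np.random.rand(480, 640, 3)
proposals = [(32, 48, 352, 272), (400, 100, 620, 460)]  # hypothetical boxes

fmap = conv_features(image)                           # computed once per image
features = [pool_window(fmap, b) for b in proposals]  # cheap per proposal
print(fmap.shape, [f.shape for f in features])  # (30, 40, 3) [(12,), (12,)]
```

The expensive step runs once per image, and each proposal only costs a pooling operation on an already computed map, which is where the speed-up over RCNN comes from.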

Much faster!

rcnn

By yunqy
