From RCNN to SPP-NET

Convolutional Neural Network (CNN)

Convolutional Layer

Pooling Layer

RCNN

  • extracts around 2000 bottom-up region proposals (selective search)
  • warps every region proposal to the same size, 227 x 227 (the AlexNet input size)
  • computes features for each proposal using a large CNN (pre-trained on ImageNet)
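The warping step can be sketched in a few lines. This is a minimal illustration, assuming the image is a plain list of pixel rows and the box is given as (x1, y1, x2, y2) with exclusive right and bottom edges; the function name `warp_region` and the nearest-neighbor sampling are assumptions made for the example, not the resampling R-CNN itself uses.

```python
def warp_region(image, box, out_size=227):
    """Crop `box` from `image` and stretch it to out_size x out_size.

    image: list of rows, each row a list of pixel values (hypothetical layout).
    box:   (x1, y1, x2, y2), right/bottom edges exclusive.
    The aspect ratio is deliberately ignored: that is what "warping" means here.
    """
    x1, y1, x2, y2 = box
    crop_w, crop_h = x2 - x1, y2 - y1
    warped = []
    for r in range(out_size):
        # Nearest-neighbor: map each output row back to a source row.
        src_r = y1 + (r * crop_h) // out_size
        row = [image[src_r][x1 + (c * crop_w) // out_size]
               for c in range(out_size)]
        warped.append(row)
    return warped


# Usage: a wide 4 x 2 region of a tiny 6 x 4 "image" stretched to 4 x 4.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
warped = warp_region(image, (1, 0, 5, 2), out_size=4)
```

Every proposal, whatever its shape, ends up as the same fixed-size input, which is why the CNN has to be run once per proposal.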

RCNN

  • Fine-tuning: replace the CNN's ImageNet-specific 1000-way classification layer with a randomly initialized (N + 1)-way classification layer (N object classes plus background)
  • Proposals with an IoU of at least 0.5 with a ground-truth box are labeled as that box's class; all others are labeled as background
  • extracts a 4096-dimensional feature vector from each region proposal
  • classifies each region using class-specific linear SVMs
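The IoU-based labeling used for fine-tuning can be sketched as follows. This is a minimal example, assuming boxes are (x1, y1, x2, y2) tuples and that class index 0 stands for background; `iou` and `label_proposal` are hypothetical helper names, and treating an IoU of exactly 0.5 as positive is an assumption of this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, gt_boxes, gt_classes, threshold=0.5):
    """Return the class of the best-overlapping ground-truth box,
    or 0 (background, an assumed convention) if the best IoU is below threshold."""
    best_iou, best_cls = 0.0, 0
    for box, cls in zip(gt_boxes, gt_classes):
        overlap = iou(proposal, box)
        if overlap > best_iou:
            best_iou, best_cls = overlap, cls
    return best_cls if best_iou >= threshold else 0


# Usage: one ground-truth box of (hypothetical) class 3.
label = label_proposal((0, 0, 10, 10), [(0, 0, 10, 8)], [3])
```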

Disadvantages

  • The CNN requires a fixed input image size, so every region has to be warped
  • Feature computation in RCNN is expensive in both time and space: the deep convolutional network is applied repeatedly to the raw pixels of thousands of warped regions per image

SPP-NET

Spatial pyramid pooling

reference: http://slideplayer.com/slide/5277459/

Multiple-size training

reference: http://slideplayer.com/slide/5277459/

reference: http://sinb.github.io/Whatyouknowabout/spp-net-1/

  • Multiple networks share all weights
  • Each network handles a single input size
  • feature map size (after the last convolutional layer): a x a
  • pyramid level: n x n bins
  • pooling window size: ceil(a / n)
  • stride: floor(a / n)
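The window and stride formulas above can be checked with a small sketch. This is an illustrative example for a single-channel square feature map stored as a list of rows; the function names and the pyramid levels (3, 2, 1) are assumptions chosen for the example, and the code assumes a >= n.

```python
import math


def spp_window(a, n):
    """Pooling window size and stride for an a x a map and an n x n level."""
    return math.ceil(a / n), math.floor(a / n)


def max_pool_level(fmap, n):
    """Max-pool a square feature map into n x n bins (assumes len(fmap) >= n)."""
    a = len(fmap)
    win, stride = spp_window(a, n)
    pooled = []
    for i in range(n):
        for j in range(n):
            rows = range(i * stride, min(i * stride + win, a))
            cols = range(j * stride, min(j * stride + win, a))
            pooled.append(max(fmap[r][c] for r in rows for c in cols))
    return pooled


def spatial_pyramid_pool(fmap, levels=(3, 2, 1)):
    """Concatenate the pooled bins of every level into one fixed-length vector."""
    vector = []
    for n in levels:
        vector.extend(max_pool_level(fmap, n))
    return vector


# Usage: feature maps of different sizes give vectors of the same length
# (3*3 + 2*2 + 1*1 = 14 values per channel with the levels assumed here).
fmap_13 = [[r * 13 + c for c in range(13)] for r in range(13)]
fmap_10 = [[r * 10 + c for c in range(10)] for r in range(10)]
vec_13 = spatial_pyramid_pool(fmap_13)
vec_10 = spatial_pyramid_pool(fmap_10)
```

For a = 13 and n = 3 this gives a window of 5 and a stride of 4. Because the output length depends only on the pyramid levels, the layers that follow see a fixed-length input regardless of the image size.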

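2000 convolutional passes (RCNN, one per proposal) vs. 1 convolutional pass (SPP-NET)

The features are computed from the entire image only once; each proposal is then pooled from the shared feature map

Much faster!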
