Script Independent Scene Text Segmentation using Fast Stroke Width Transform and GrabCut

Jay H. Bosamiya, Palash Agrawal,
Partha Pratim Roy, R. Balasubramanian

What?!?

Basically

The system can locate regions that contain text, even in a language it has never seen before

But why?

Multiple reasons

  • Existing segmentation techniques worked only with English
  • Existing techniques could be "taught" new languages, but then they would recognize only those specific languages
  • India is linguistically diverse enough that we need a generalized approach
  • It should work well, even on languages it has never seen before

 

  • Good OCR techniques exist, but they work only on clean black-and-white images, not on scene images

Previous Work

  • Bottom-up (using region-based techniques)
    • Phan et al.
    • Neumann and Matas
  • Top-down filtering (using texture properties)
    • Yi and Tian
    • Yao et al.

 

  • However, all of the above work only on English
  • We combined ideas from both, and also made it independent of language/script

Problems?

Text Like Patterns

Some urban patterns are common but not text

Tree Foliage

Nature also likes to mess with us!

Glare

Cameras are messy pieces of equipment!

So HOW did we solve it?

  • Image
  • Finding Maximally Stable Extremal Regions (MSERs)
  • Localized Adaptive Binarization
  • Stroke Width Transform
  • Text/Non-Text Classifier
  • GrabCut
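The slides name "Localized Adaptive Binarization" as a stage but do not say which thresholding rule is used. A minimal sketch, assuming a local-mean threshold (a common choice, not necessarily the authors') applied to a grayscale patch such as an MSER's bounding box. The function name `adaptiveBinarize` and the parameters `radius` and `offset` are hypothetical, and dark text on a lighter background is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical local-mean binarization: a pixel becomes foreground (1) when it
// is darker than the mean of its (2*radius+1)^2 window by more than `offset`.
// Assumes dark text on a lighter background; invert the input otherwise.
std::vector<std::uint8_t> adaptiveBinarize(const std::vector<std::uint8_t>& img,
                                           int width, int height,
                                           int radius, double offset) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    // Summed-area table with a zero row/column of padding, so any window sum
    // costs four lookups regardless of the window size.
    const std::size_t W = static_cast<std::size_t>(width) + 1;
    std::vector<long long> sat(W * (height + 1), 0);
    auto S = [&](int x, int y) -> long long& {
        return sat[static_cast<std::size_t>(y) * W + x];
    };
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            S(x + 1, y + 1) = img[static_cast<std::size_t>(y) * width + x]
                              + S(x, y + 1) + S(x + 1, y) - S(x, y);

    std::vector<std::uint8_t> out(img.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Clamp the window to the image borders.
            const int x0 = std::max(0, x - radius), x1 = std::min(width - 1, x + radius);
            const int y0 = std::max(0, y - radius), y1 = std::min(height - 1, y + radius);
            const long long sum = S(x1 + 1, y1 + 1) - S(x0, y1 + 1) - S(x1 + 1, y0) + S(x0, y0);
            const double mean = static_cast<double>(sum) / ((x1 - x0 + 1) * (y1 - y0 + 1));
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            out[i] = img[i] < mean - offset ? 1 : 0;
        }
    }
    return out;
}
```

The summed-area table keeps each local mean O(1). When OpenCV is available, `cv::adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` applies a similar mean-minus-constant rule.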

Fast Stroke Width Transform

What next?

Experimentation

Tested on

  • ICDAR 2013 Robust Reading Competition Dataset (for English)
    • 233 images
    • Performance is close to that of pre-existing techniques
  • Custom Dataset (for other languages)
    • Achieves similar accuracy on a dataset of Indian languages, without ever being trained on Indian languages

Experimentation Platform

  • Designed a generic framework for modular experimentation
  • Parameter optimization done on the PARAM cluster (Thanks ICC!)
  • Code written in C++ using OpenMP (for parallelism on the cluster)
  • OpenCV used for handling low-level image manipulation
  • Custom genetic algorithm implementation for parameter optimization (alongside manual tuning and grid search)
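The custom genetic algorithm is not described in the slides. The following is a hypothetical sketch of a real-coded GA (tournament selection, uniform crossover, Gaussian mutation, elitism) that could tune numeric pipeline parameters. `geneticOptimize`, `Bounds`, and all default settings are assumptions; in practice `fitness` would run the segmentation pipeline on a validation set and return a quality score to maximise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

struct Bounds { double lo, hi; };  // hypothetical per-parameter search range

// Returns the best parameter vector found; `fitness` is maximised.
std::vector<double> geneticOptimize(
        const std::function<double(const std::vector<double>&)>& fitness,
        const std::vector<Bounds>& bounds,
        int populationSize = 40, int generations = 200,
        double mutationRate = 0.2, unsigned seed = 42) {
    assert(populationSize >= 2 && !bounds.empty());
    for (const auto& b : bounds) assert(b.hi > b.lo);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const std::size_t n = bounds.size();

    // Random initial population inside the bounds.
    std::vector<std::vector<double>> pop(populationSize, std::vector<double>(n));
    for (auto& ind : pop)
        for (std::size_t i = 0; i < n; ++i)
            ind[i] = bounds[i].lo + unit(rng) * (bounds[i].hi - bounds[i].lo);

    std::vector<double> score(populationSize);
    std::uniform_int_distribution<int> pick(0, populationSize - 1);
    // Binary tournament: the fitter of two random individuals wins.
    auto tournament = [&]() -> const std::vector<double>& {
        const int a = pick(rng), b = pick(rng);
        return score[a] > score[b] ? pop[a] : pop[b];
    };

    for (int g = 0; g < generations; ++g) {
        // Independent evaluations: the natural place for an OpenMP parallel for.
        for (int k = 0; k < populationSize; ++k) score[k] = fitness(pop[k]);
        const auto best = std::max_element(score.begin(), score.end()) - score.begin();

        std::vector<std::vector<double>> next{pop[best]};  // elitism: keep the best
        while (static_cast<int>(next.size()) < populationSize) {
            const auto& p1 = tournament();
            const auto& p2 = tournament();
            std::vector<double> child(n);
            for (std::size_t i = 0; i < n; ++i) {
                child[i] = unit(rng) < 0.5 ? p1[i] : p2[i];  // uniform crossover
                if (unit(rng) < mutationRate) {             // Gaussian mutation
                    const double range = bounds[i].hi - bounds[i].lo;
                    std::normal_distribution<double> noise(0.0, 0.05 * range);
                    child[i] = std::clamp(child[i] + noise(rng), bounds[i].lo, bounds[i].hi);
                }
            }
            next.push_back(std::move(child));
        }
        pop = std::move(next);
    }
    for (int k = 0; k < populationSize; ++k) score[k] = fitness(pop[k]);
    return pop[std::max_element(score.begin(), score.end()) - score.begin()];
}
```

Elitism guarantees the best individual never gets worse between generations, which matters when each fitness evaluation (a full pipeline run) is expensive. Requires C++17 for `std::clamp`.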

Fast SWT

When run against the full dataset, Fast SWT performs on average 79% faster than the Stroke Width Operator described by Chen et al.

 

Additionally, it gives more accurate measurements: Fast SWT supports fractional stroke widths, whereas previous algorithms produce purely quantized (integer) stroke widths.
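The Fast SWT algorithm itself is not detailed in the slides. To illustrate only the difference between quantized and fractional stroke widths, this hypothetical sketch measures a stroke along a 1D intensity profile sampled across it (for example along the gradient direction): once by counting dark samples, and once by locating the two threshold crossings with linear interpolation. It is not the authors' Fast SWT; `quantizedWidth`, `subpixelWidth`, and the threshold are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Quantized estimate: how many samples fall below the threshold.
int quantizedWidth(const std::vector<double>& profile, double threshold) {
    int count = 0;
    for (double v : profile)
        if (v < threshold) ++count;
    return count;
}

// Fractional estimate: distance between the falling (entering the stroke) and
// rising (leaving it) threshold crossings, each located by linear
// interpolation between neighbouring samples spaced one pixel apart.
std::optional<double> subpixelWidth(const std::vector<double>& profile, double threshold) {
    assert(profile.size() >= 2);
    std::optional<double> enter, leave;
    for (std::size_t i = 0; i + 1 < profile.size(); ++i) {
        const double a = profile[i], b = profile[i + 1];
        if (a >= threshold && b < threshold && !enter)
            enter = static_cast<double>(i) + (a - threshold) / (a - b);
        else if (a < threshold && b >= threshold)
            leave = static_cast<double>(i) + (threshold - a) / (b - a);
    }
    if (!enter || !leave || *leave <= *enter) return std::nullopt;
    return *leave - *enter;
}
```

For the profile `{200, 200, 50, 50, 50, 150, 200}` with threshold 100, `quantizedWidth` returns 3 while `subpixelWidth` returns about 2.83 pixels, because the edges fall between samples.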

On English Scene Segmentation

Graphs!

Different stages of the technique

Error Analysis

Conclusions

  • Text segmentation can be done independently of language/script
  • Works as well on Indian languages as on English
  • Designed a new Fast SWT algorithm, which outperforms the state of the art (the Stroke Width Operator of Chen et al.) by 79% in speed, and is more accurate and precise

Future Work

  • Combine with recognition, to improve quality of both segmentation and recognition
  • Adapt the technique to handle words that form a single connected component (for example, by removing the top bar in Devanagari script)
  • Improve it for handwritten/cursive fonts (which have a high degree of connectivity throughout the word)
  • Curved Text (requires non-horizontal bounding box in GrabCut stage)
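Removing the Devanagari top bar is listed only as future work. One hypothetical way to do it, sketched here under the assumption of a binarized word image (1 = ink) whose densest row belongs to the bar, is to erase the band of near-full rows around the peak of the horizontal projection profile, so that the characters hanging from the bar separate into distinct connected components. `removeHeadline` and `minCoverage` are illustrative names, not part of the described system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical headline removal: rows whose ink covers at least `minCoverage`
// of the width, contiguous with the densest row, are treated as the top bar.
void removeHeadline(std::vector<std::uint8_t>& img, int width, int height,
                    double minCoverage = 0.8) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    // Horizontal projection profile: ink pixels per row.
    std::vector<int> rowInk(height, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            rowInk[y] += img[static_cast<std::size_t>(y) * width + x];

    int peak = 0;
    for (int y = 1; y < height; ++y)
        if (rowInk[y] > rowInk[peak]) peak = y;
    auto isBar = [&](int y) { return rowInk[y] >= minCoverage * width; };
    if (!isBar(peak)) return;  // no headline-like row found

    // Grow the band up and down from the peak while rows stay dense, then erase it.
    int top = peak, bottom = peak;
    while (top > 0 && isBar(top - 1)) --top;
    while (bottom + 1 < height && isBar(bottom + 1)) ++bottom;
    for (int y = top; y <= bottom; ++y)
        for (int x = 0; x < width; ++x)
            img[static_cast<std::size_t>(y) * width + x] = 0;
}
```

A real implementation would also need to keep vowel signs that extend above the bar attached to the right character.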

Questions?

Scene Text Segmentation

By jaybosamiya
