Script Independent Scene Text Segmentation using Fast Stroke Width Transform and GrabCut

Jay H. Bosamiya, Palash Agrawal,
Partha Pratim Roy, R. Balasubramanian

What?!?

Basically

The system can locate text regions in a scene image, even in a language it has never seen before

But why?

Multiple reasons

Existing segmentation techniques worked only with English
Existing techniques could be "taught" new languages, but then they would recognize only those specific languages
India is linguistically diverse enough that we need something generalized
It should work well even on languages it has never seen before

We have good OCR techniques, but they work only on black-and-white (binarized) images, not on scene images

Previous Work

Bottom up (using region-based techniques)
- Phan et al.
- Neumann and Matas
Top down (filtering using texture properties)
- Yi and Tian
- Yao et al.

Still, all of the above work only on English
We combined ideas from both, and also made it independent of language/script

Problems?

Text-Like Patterns

Some common urban patterns resemble text but are not text

Tree Foliage

Nature also likes to mess with us!

Glare

Cameras are messy pieces of equipment!

So HOW did we solve it?

Image

Finding Maximally Stable Extremal Regions (MSERs)
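To illustrate what "maximally stable" means, here is a minimal, deliberately simplified sketch in Python: it grows only the dark region containing one seed pixel across all 256 thresholds and picks the threshold where the region's area changes least, using the stability ratio of Matas et al. The seeded formulation and the names `component_area` and `most_stable_region` are assumptions for illustration; a full MSER detector (e.g. OpenCV's `cv::MSER` class) tracks every region at once, and the slides do not say which implementation the authors used.

```python
from collections import deque


def component_area(img, seed, t):
    """Area of the 4-connected region of pixels with intensity <= t that
    contains seed (y, x); 0 if the seed itself is brighter than t."""
    h, w = len(img), len(img[0])
    sy, sx = seed
    if img[sy][sx] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen and img[ny][nx] <= t:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return len(seen)


def most_stable_region(img, seed, delta=5):
    """Return (threshold, area, stability) where the seed's region is most
    stable, or None if the seed never forms a region."""
    areas = [component_area(img, seed, t) for t in range(256)]
    best = None
    for t in range(delta, 256 - delta):
        if areas[t] == 0:
            continue
        # Stability: relative growth of the region between t - delta and t + delta
        q = (areas[t + delta] - areas[t - delta]) / areas[t]
        if best is None or q < best[2]:
            best = (t, areas[t], q)
    return best
```

A dark, uniform character blob keeps the same area over a wide range of thresholds, so its stability ratio drops to 0; texture such as foliage tends to grow or merge quickly as the threshold changes.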

Localized Adaptive Binarization
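The slides do not specify how the localized binarization is computed. As one plausible sketch, assuming Otsu's method is applied separately inside each candidate region's bounding box (a swapped-in technique, not confirmed by the source), each region gets its own threshold, so uneven illumination across the scene does not matter. The function names and the dark-text-on-light-background assumption are hypothetical.

```python
def otsu_threshold(pixels):
    """Otsu's method on 8-bit values: the threshold t (class 0 = values <= t)
    that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = sum_b = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2  # proportional to between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize_region(img, box):
    """Binarize only the pixels inside box = (y0, x0, y1, x1), end-exclusive.
    Assumes dark text on a lighter background: 1 = text, 0 = background."""
    y0, x0, y1, x1 = box
    crop = [row[x0:x1] for row in img[y0:y1]]
    t = otsu_threshold([p for row in crop for p in row])
    return [[1 if p <= t else 0 for p in row] for row in crop]
```

Because the threshold is chosen per region, a shadowed region (e.g. values 30/220) and a brightly lit one (e.g. values 120/250) are both split correctly, which a single global threshold might not manage.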

Stroke Width Transform
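A minimal sketch of a stroke width transform on an already-binarized region, loosely following the ray-casting idea of Epshtein et al.: from every boundary pixel, a ray is cast along the Sobel gradient (which points into the stroke) until it leaves the foreground, and every pixel on the ray is labeled with the smaller of its current value and the ray's length. Stopping at the first background pixel, instead of matching an opposite-gradient edge, is a simplification assumed here; the function names are hypothetical and this is not the authors' implementation.

```python
import math
from statistics import median


def _fg(mask, y, x):
    """True if (y, x) is inside the image and is a foreground (stroke) pixel."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x] == 1


def _sobel(mask, y, x):
    """Sobel gradient of the binary mask; at a boundary it points into the stroke."""
    def v(yy, xx):
        return 1 if _fg(mask, yy, xx) else 0
    gx = (v(y - 1, x + 1) + 2 * v(y, x + 1) + v(y + 1, x + 1)) - (v(y - 1, x - 1) + 2 * v(y, x - 1) + v(y + 1, x - 1))
    gy = (v(y + 1, x - 1) + 2 * v(y + 1, x) + v(y + 1, x + 1)) - (v(y - 1, x - 1) + 2 * v(y - 1, x) + v(y - 1, x + 1))
    return gx, gy


def stroke_width_transform(mask):
    h, w = len(mask), len(mask[0])
    swt = [[math.inf] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not _fg(mask, y, x):
                continue
            if all(_fg(mask, y + dy, x + dx) for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                continue  # interior pixel: not on the stroke boundary
            gx, gy = _sobel(mask, y, x)
            norm = math.hypot(gx, gy)
            if norm == 0:
                continue  # isolated pixel, no usable direction
            dx, dy = gx / norm, gy / norm
            ray, k = [], 0
            while True:  # march in unit steps until the ray leaves the stroke
                py = math.floor(y + 0.5 + k * dy)
                px = math.floor(x + 0.5 + k * dx)
                if not _fg(mask, py, px):
                    break
                if (py, px) not in ray:
                    ray.append((py, px))
                k += 1
            for py, px in ray:  # each pixel keeps the shortest ray through it
                swt[py][px] = min(swt[py][px], len(ray))
    return swt


def median_stroke_width(mask):
    values = [v for row in stroke_width_transform(mask) for v in row if v != math.inf]
    return median(values) if values else None
```

Note that widths here are whole pixel counts, i.e. the kind of quantized integer measurement that the Fast SWT slide contrasts with fractional widths.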

Text/Non-Text Classifier
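The slides do not describe the classifier's features or model. Purely as an illustration, here is a hand-written rule-based filter over features commonly associated with text components: consistent stroke width (low coefficient of variation), a moderate aspect ratio, and a fill ratio that is neither sparse nor solid. The `Component` fields, feature choice, and every threshold below are hypothetical and untuned.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Component:
    width: int            # bounding-box width in pixels
    height: int           # bounding-box height in pixels
    area: int             # number of foreground pixels
    stroke_widths: list   # per-pixel stroke widths from the SWT stage


def looks_like_text(c, max_sw_variation=0.5, max_aspect=10.0, min_fill=0.1, max_fill=0.95):
    """Hypothetical rule-based text/non-text decision for one component."""
    if c.width <= 0 or c.height <= 0 or c.area == 0 or not c.stroke_widths:
        return False
    sw_mean = mean(c.stroke_widths)
    if sw_mean <= 0:
        return False
    variation = pstdev(c.stroke_widths) / sw_mean          # text strokes are fairly uniform
    aspect = max(c.width, c.height) / min(c.width, c.height)
    fill = c.area / (c.width * c.height)                   # solid blobs and thin noise are rejected
    return variation <= max_sw_variation and aspect <= max_aspect and min_fill <= fill <= max_fill
```

Foliage tends to fail the stroke-width check, while glare and solid patches tend to fail the fill check; a learned classifier would replace these fixed thresholds.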

GrabCut
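GrabCut itself (iterated GMM fitting plus graph cuts) is too large to sketch here, but its input can be. Assuming the classifier's accepted components seed GrabCut through a mask inside a horizontal bounding box (the future-work slide mentions that a bounding box is used in this stage), a seed mask might look like the sketch below. The label values 0 to 3 match OpenCV's `GC_BGD`, `GC_FGD`, `GC_PR_BGD`, and `GC_PR_FGD`; the function name, the padding `margin`, and the choice of "probable" rather than "definite" foreground are assumptions.

```python
# Label values matching OpenCV's GrabCut mask constants
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def grabcut_seed_mask(h, w, text_pixels, box, margin=2):
    """Build an h x w GrabCut seed mask.

    box = (y0, x0, y1, x1), end-exclusive, around the accepted text components.
    Outside the padded box: definite background.
    Inside the box: probable background, except pixels of accepted components,
    which are marked probable foreground so GrabCut may still refine them.
    """
    y0, x0, y1, x1 = box
    y0, x0 = max(0, y0 - margin), max(0, x0 - margin)
    y1, x1 = min(h, y1 + margin), min(w, x1 + margin)
    mask = [[GC_BGD] * w for _ in range(h)]
    for y in range(y0, y1):
        for x in range(x0, x1):
            mask[y][x] = GC_PR_BGD
    for y, x in text_pixels:
        if y0 <= y < y1 and x0 <= x < x1:
            mask[y][x] = GC_PR_FGD
    return mask
```

With OpenCV's Python bindings, a NumPy `uint8` copy of this mask could then be refined with `cv2.grabCut(img, mask, None, bgd_model, fgd_model, iter_count, cv2.GC_INIT_WITH_MASK)`, where the two model arrays are `np.zeros((1, 65), np.float64)`.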

Fast Stroke Width Transform

What next?

Experimentation

Tested on

ICDAR 2013 Robust Reading Competition Dataset (for English)
- 233 images
- Performance is close to that of existing techniques
Custom Dataset (for other languages)
- Performs with similar accuracy on an Indian languages dataset, without ever being trained on Indian languages
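The slides do not state which accuracy measure was used. As a hedged sketch, a common way to score a segmentation against ground truth is pixel-level precision, recall, and F-measure over binary masks; the function name below is hypothetical.

```python
def pixel_scores(pred, truth):
    """Pixel-level (precision, recall, F-measure) of a predicted binary
    text mask against a ground-truth binary mask of the same size."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1      # text pixel correctly kept
            elif p:
                fp += 1      # background wrongly marked as text
            elif t:
                fn += 1      # text pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```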

Experimentation Platform

Designed a generic framework for modular experimentation
Parameter optimization done on the PARAM cluster (Thanks ICC!)
Code written in C++ using OpenMP (for parallelism on the cluster)
OpenCV used for low-level image manipulation
Custom genetic algorithm implementation for parameter optimization (alongside manual tuning and grid search)
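The slides mention a custom genetic algorithm for tuning the pipeline's parameters but give no details. A minimal, generic sketch (elitism, tournament selection, uniform crossover, clamped Gaussian mutation) might look like this; every operator, rate, and name is an assumption, and `fitness` stands in for whatever segmentation score the real framework maximized.

```python
import random


def genetic_optimize(fitness, bounds, pop_size=30, generations=80, mutation_rate=0.2, seed=0):
    """Maximize fitness(params) over real-valued parameters within bounds [(lo, hi), ...]."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each parameter is taken from either parent.
        return [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]

    def mutate(ind):
        # Gaussian mutation, clamped to each parameter's bounds.
        out = []
        for g, (lo, hi) in zip(ind, bounds):
            if rng.random() < mutation_rate:
                g = min(hi, max(lo, g + rng.gauss(0, 0.1 * (hi - lo))))
            out.append(g)
        return out

    def tournament(population):
        return max(rng.sample(population, 3), key=fitness)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[: max(1, pop_size // 5)]  # elitism: keep the best 20% unchanged
        while len(next_gen) < pop_size:
            next_gen.append(mutate(crossover(tournament(population), tournament(population))))
        population = next_gen
    return max(population, key=fitness)
```

Since every fitness evaluation is independent, this is the part that parallelizes naturally across a cluster (as the OpenMP note above suggests), whereas grid search cost grows exponentially with the number of parameters.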

Fast SWT

On average, performs 79% faster than the Stroke Width Operator described by Chen et al., when run against the full dataset

Additionally, gives more accurate measurements: Fast SWT supports fractional stroke widths, whereas previous algorithms produce purely quantized integer stroke widths
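The slides do not describe how Fast SWT computes fractional widths. The sketch below only illustrates why a fractional measurement can be more precise than an integer one: along a 1-D intensity profile across a stroke, counting dark samples gives a whole number, while locating the two threshold crossings by linear interpolation gives a sub-pixel width. This is an assumed illustration, not the Fast SWT algorithm; the function names are hypothetical.

```python
def integer_width(profile, threshold):
    """Quantized width: number of samples darker than the threshold."""
    return sum(1 for v in profile if v < threshold)


def fractional_width(profile, threshold):
    """Sub-pixel width: distance between the first and last threshold
    crossings, each located by linear interpolation between samples."""
    crossings = []
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        if (a < threshold) != (b < threshold):  # the profile crosses the threshold here
            crossings.append(i + (threshold - a) / (b - a))
    if len(crossings) < 2:
        return None
    return crossings[-1] - crossings[0]


profile = [200, 200, 120, 40, 40, 160, 200]  # bright, dark stroke, bright
print(integer_width(profile, 100), fractional_width(profile, 100))  # 2 2.25
```

The integer count reports 2 pixels, while the interpolated crossings (at 2.25 and 4.5) give a width of 2.25.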

On English Scene Segmentation

Graphs!

Different stages of the technique

Error Analysis

Conclusions

Possible to do Text Segmentation independent of language/script
Works as well on Indian languages as on English
Designed a new Fast SWT algorithm, which is 79% faster than the state of the art (the Stroke Width Operator of Chen et al.), and is more accurate and precise

Future Work

Combine with recognition to improve the quality of both segmentation and recognition
Change the technique to handle words that form a single connected component (for example, by removing the top bar in Devanagari script)
Improve it for handwritten/cursive fonts (which have a high degree of connectivity throughout the word)
Curved text (requires a non-horizontal bounding box in the GrabCut stage)
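As a rough sketch of the top-bar idea for Devanagari, assuming the headline (shirorekha) is the densest row in the upper part of a binarized word image, one could erase that row using a horizontal projection profile and check that the word splits into more connected components. The heuristic, the `band` parameter, and the function names are hypothetical; real headlines are often several pixels thick and slightly skewed.

```python
from collections import deque


def count_components(mask):
    """Number of 4-connected foreground components in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def remove_top_bar(mask, band=0.4):
    """Erase the row(s) with the highest foreground count in the top `band`
    fraction of the word image (assumed to be the headline)."""
    search = max(1, int(len(mask) * band))
    row_sums = [sum(row) for row in mask[:search]]  # horizontal projection profile
    peak = max(row_sums)
    out = [row[:] for row in mask]
    for y, s in enumerate(row_sums):
        if s == peak:
            out[y] = [0] * len(out[y])
    return out
```

For a word whose letters are joined only by the headline, `count_components` drops from one component before `remove_top_bar` to one component per letter after it.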

Questions?

Scene Text Segmentation

By jaybosamiya

Script Independent Scene Text Segmentation using Fast Stroke Width Transform and GrabCut

What?!?

Basically

But why?

Multiple reasons

Previous Work

Problems?

Text Like Patterns

Tree Foliage

Glare

So HOW did we solve it?

Image

Finding Maximally Stable Extremal Regions (MSERs)

Localized Adaptive Binarization

Stroke Width Transform

Text/Non-Text Classifier

GrabCut

Fast Stroke Width Transform

What next?

Tested on

Experimentation Platform

Fast SWT

On English Scene Segmentation

Graphs!

Error Analysis

Conclusions

Future Work

Future Work

Questions?

Scene Text Segmentation

More from jaybosamiya