OCR (optical character recognition) is the conversion of images of typed or handwritten text into machine-encoded text. It is one of the hardest problems in computer vision and remains an active area of research, with no single standard model.
OCR has wide-ranging applications across many industries.
Wherever handwritten text must be converted into machine-encoded text, OCR can reduce errors and increase speed compared with manual transcription.
CNN Recap
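Instead of treating the word “MOVE” as a single word category, represent it by which letters of the alphabet it contains (1 = letter present, 0 = absent):
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0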
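The encoding above can be computed with a few lines of Python. This is a minimal sketch: the function name `unigram_vector` is hypothetical, and it assumes words use only the letters A to Z (other characters are ignored and case is folded to upper case).

```python
import string

def unigram_vector(word: str) -> list[int]:
    """26-dim binary vector: entry k is 1 if the k-th letter A-Z occurs in `word`."""
    letters = set(word.upper())
    return [1 if ch in letters else 0 for ch in string.ascii_uppercase]

vec = unigram_vector("MOVE")
print(" ".join(string.ascii_uppercase))
print(" ".join(str(v) for v in vec))
```

The two printed rows reproduce the alphabet and the 0/1 row shown above. A CNN trained this way predicts 26 independent yes/no outputs rather than one label out of thousands of words.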
Learning Attributes Instead of Words – Unigram Approach
Consider a training set of 1,000 word images. The word “SLEEP” may appear only twice, but an attribute such as “does the word contain the unigram ‘S’ in its first half?” can occur many times. This gives a CNN far more positive examples per attribute than it would have per word.
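Positional attributes like “contains ‘S’ in the first half” can be sketched by splitting each word into equal regions. The convention used here is an assumption borrowed from the Pyramidal Histogram of Characters (PHOC): a letter at index i of an n-letter word spans [i/n, (i+1)/n), and it counts as being in a region if at least half of that span overlaps the region. The function name `positional_unigrams` and the toy word list are hypothetical.

```python
import string

def positional_unigrams(word: str, levels=(1, 2)) -> dict[tuple[int, int, str], int]:
    """Binary attributes keyed by (level, region, letter).

    At split level L the word is divided into L equal regions. Positions are
    scaled by n * L so all overlap arithmetic stays in exact integers.
    """
    word = word.upper()
    n = len(word)
    attrs = {}
    for level in levels:
        for region in range(level):
            for letter in string.ascii_uppercase:
                attrs[(level, region, letter)] = 0
            r_lo, r_hi = region * n, (region + 1) * n      # region extent
            for i, ch in enumerate(word):
                c_lo, c_hi = i * level, (i + 1) * level    # character extent (length = level)
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                # Assumed PHOC-style rule: >= 50% of the character lies in the region.
                if ch in string.ascii_uppercase and 2 * overlap >= level:
                    attrs[(level, region, ch)] = 1
    return attrs

# Hypothetical toy training labels: the word is rare, the attribute is common.
words = ["SLEEP", "SLEEP", "STOP", "SEND", "ASK", "MOVE"]
word_count = words.count("SLEEP")
attr_count = sum(positional_unigrams(w)[(2, 0, "S")] for w in words)
print(word_count, attr_count)  # 2 5
```

A letter straddling the midpoint (the middle “E” in “SLEEP”) is assigned to both halves under this rule.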
Similar words can also confuse the network. Consider the words “KIDS” and “BIDS”: a “KIDS” word image is a negative sample for the “BIDS” category, even though most of their appearance is shared. This similarity between categories makes a category-based classifier harder to learn, whereas an attribute-based classifier turns it into an advantage: the shared letters I, D and S are positive attributes in both words, so images of either word help train them.
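The overlap between “KIDS” and “BIDS” can be made concrete by comparing their unigram attributes as sets. This is an illustrative sketch; `unigram_attrs` is a hypothetical helper that returns the letters A to Z present in a word.

```python
import string

def unigram_attrs(word: str) -> set[str]:
    """Letters A-Z present in `word`, i.e. the unigram attributes that are 'on'."""
    return {ch for ch in word.upper() if ch in string.ascii_uppercase}

kids, bids = unigram_attrs("KIDS"), unigram_attrs("BIDS")
shared = kids & bids          # attributes both words turn on
different = kids ^ bids       # attributes that tell them apart
jaccard = len(shared) / len(kids | bids)

print(sorted(shared))     # ['D', 'I', 'S']
print(sorted(different))  # ['B', 'K']
print(jaccard)            # 0.6
```

Under a category-based softmax, a “KIDS” image only pushes the “BIDS” score down. With per-attribute binary targets, the same image is a positive example for D, I and S, and only the K and B attributes need to separate the two words.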
Dataset (IAM Handwriting Database):
657 writers contributed samples of their handwriting
1,539 pages of scanned text
5,685 isolated and labeled sentences
13,353 isolated and labeled text lines
115,320 isolated and labeled words