IIIT Hyderabad Publications

Handwritten Word Recognition for Indic & Latin scripts using Deep CNN-RNN Hybrid Networks

Author: Kartik Dutta
Date: 2019-03-25
Report no: IIIT/TH/2019/15
Advisor: C V Jawahar

Abstract

Handwriting recognition (HWR) in Indic scripts is a challenging problem due to the inherent subtleties of the scripts, the cursive nature of the handwriting and the similar shapes of the characters. Though a lot of research has been done in the field of text recognition, the focus of the vision community has been primarily on English. Furthermore, the lack of publicly available handwriting datasets in Indic scripts has held back the development of handwritten word recognizers and made direct comparison across different methods impossible. This lack of annotated data also makes it challenging to train deep neural networks, which contain millions of parameters. This is surprising given that there are over 400 million Devanagari speakers in India alone.

We first tackle the lack of annotated data using several approaches. We describe a framework for annotating handwritten word images at large scale without the need for manual segmentation and label re-alignment. Two new word-level handwritten datasets, for Telugu and Devanagari, are released; both were created using this framework. Using publicly available Unicode fonts, we generate synthetic datasets containing millions of realistic images with a large vocabulary for the purpose of pre-training. Pre-training on data from the Latin script is also shown to help overcome the shortage of data.

Capitalizing on the success of the CNN-RNN Hybrid architecture, we propose several improvements to the architecture and its training pipeline to make it more robust for handwriting recognition. We change the convolutional part of the network to a ResNet-18-like structure and add a spatial transformer network layer.
We also use an extensive data augmentation scheme involving multi-scale, elastic, affine and test-time distortion. We outperform the previous state-of-the-art methods on existing benchmark datasets for both Latin and Indic scripts by a considerable margin. We perform an ablation study to show empirically how each change we made to the original CNN-RNN Hybrid network improves its performance on handwriting recognition. We also examine the workings of our network's convolutional layers and verify the robustness of the convolutional features through layer visualizations. We hope that the release of the two datasets mentioned in this work, along with the architecture and training techniques we have used, will instill interest among fellow researchers in the field.

Full thesis: pdf

Centre for Visual Information Technology
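
The affine part of the augmentation scheme mentioned in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function names, the parameter ranges (rotation, shear, scale) and the nearest-neighbour sampling are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np


def random_affine_matrix(rng, max_rotation_deg=5.0, max_shear=0.3,
                         scale_range=(0.9, 1.1)):
    """Sample a 2x2 matrix combining rotation, horizontal shear and scaling.

    The default ranges are illustrative assumptions, not values from the thesis.
    """
    theta = np.deg2rad(rng.uniform(-max_rotation_deg, max_rotation_deg))
    shear = rng.uniform(-max_shear, max_shear)
    scale = rng.uniform(*scale_range)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    shear_m = np.array([[1.0, shear],
                        [0.0, 1.0]])
    return scale * rotation @ shear_m


def apply_affine(image, matrix, fill=0):
    """Warp a 2-D grayscale image about its centre (nearest-neighbour sampling)."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Output pixel coordinates relative to the image centre, as (x, y) rows.
    out_xy = np.stack([xs.ravel() - cx, ys.ravel() - cy])
    # Inverse mapping: for each output pixel, find the source pixel to copy.
    src_xy = np.linalg.inv(matrix) @ out_xy
    src_x = np.rint(src_xy[0] + cx).astype(int)
    src_y = np.rint(src_xy[1] + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full(h * w, fill, dtype=image.dtype)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out.reshape(h, w)


# Usage: distort a toy "word image" (a white bar on a black background).
rng = np.random.default_rng(0)
word_image = np.zeros((32, 128), dtype=np.uint8)
word_image[8:24, 20:100] = 255
augmented = apply_affine(word_image, random_affine_matrix(rng))
```

In a training pipeline, a fresh matrix would typically be sampled for every image in every epoch, so the network rarely sees the same distortion twice.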