IIIT Hyderabad Publications
An Empirical Study of Effectiveness of Post-processing in Indic Scripts

Authors: Vinitha VS, Minesh Mathew, C V Jawahar
Conference: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017)
Location: Kyoto, Japan
Date: 2017-11-09
Report no: IIIT/TR/2017/53

Abstract

This paper explores the effectiveness of statistical language model (SLM) and dictionary-based methods for detecting and correcting errors in Indic OCR output. For the SLM, we build the language model from Unicode-level n-grams. We compare its performance with that of akshara-level n-grams and find that akshara-level n-grams detect errors better than Unicode-level n-grams. We experimentally analyze the performance of dictionary-based post-processing of Indic OCR output, compare it with the performance on English, and analyze the reasons for the under-performance in Indic scripts. We use four major Indian languages for our experiments: Hindi, Gurumukhi, Telugu and Malayalam.

Full paper: pdf
Centre for Visual Information Technology
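As a rough illustration of what Unicode-level n-gram error detection can look like, the sketch below trains a code-point trigram model on a word list and flags words whose average log-probability falls below a threshold. It is not the paper's implementation: the class name `CharNgramModel`, the add-one smoothing, the `^`/`$` padding symbols, the toy word list and the threshold value are all assumptions made for this example. An akshara-level variant would first segment each word into aksharas and count n-grams over those units instead of code points; that segmentation step is not shown here.

```python
import math
from collections import Counter


def ngrams(word, n):
    """Return code-point n-grams of a word, padded with '^' (start) and '$' (end).

    Python strings iterate by Unicode code point, so these are
    Unicode-level n-grams. The padding symbols are assumed not to
    occur in the text being modelled.
    """
    padded = "^" * (n - 1) + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Hypothetical code-point n-gram model with add-one smoothing."""

    def __init__(self, words, n=3):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"$"}  # symbols that can be predicted, incl. end marker
        for word in words:
            self.vocab.update(word)
            for gram in ngrams(word, n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def avg_log_prob(self, word):
        """Average smoothed log-probability per n-gram of the word."""
        grams = ngrams(word, self.n)
        vocab_size = len(self.vocab)
        total = 0.0
        for gram in grams:
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total / len(grams)

    def is_suspect(self, word, threshold):
        """Flag a word as a likely OCR error if it scores below the threshold."""
        return self.avg_log_prob(word) < threshold


# Toy training vocabulary (assumed for illustration only).
model = CharNgramModel(["नमक", "नगर", "नमन", "कमल"], n=3)
for candidate in ["नमक", "रगन"]:
    print(candidate, round(model.avg_log_prob(candidate), 3),
          model.is_suspect(candidate, threshold=-1.8))
```

In practice the threshold would be tuned on held-out data, and the training vocabulary would come from a large corpus rather than a handful of words.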