Unconstrained Scene Text and Video Text Recognition for Arabic Script

Authors: Mohit Jain, Minesh Mathew, C V Jawahar
Conference: 1st International Workshop on Arabic Script Analysis and Recognition (ASAR 2017)
Location: Nancy, France
Date: 2017-04-03
Report no: IIIT/TR/2017/31

Abstract

Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of a large quantity of annotated data. We overcome this by synthesizing millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation is built on top of the model introduced in [37], which has proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence-to-sequence transcription approach: the network transcribes a sequence of convolutional features from the input image into a sequence of target labels. This does away with the need to segment the input image into constituent characters or glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.
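
The segmentation-free transcription step can be illustrated with a minimal sketch. It assumes a CTC-style output layer with a reserved blank label, which the abstract does not state explicitly. The function name `greedy_decode`, the toy alphabet, and the score values are all hypothetical and are not taken from the paper. Each row of scores stands in for the network's output for one column-wise feature frame; decoding takes the best label per frame, collapses repeats, and drops blanks, so no character boundaries are ever needed.

```python
BLANK = 0  # assumption: index 0 is a CTC-style "no label" symbol


def greedy_decode(frame_scores, alphabet):
    """Turn per-frame label scores into a string without segmenting characters.

    frame_scores: list of score lists, one per feature frame (image column).
    alphabet: labels for indices 1..len(alphabet); index 0 is the blank.
    """
    # Best-scoring label index for every frame.
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]

    decoded = []
    previous = BLANK
    for index in best:
        # Collapse runs of the same label and skip blanks.
        if index != previous and index != BLANK:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)


# Toy input: 6 frames, 3 Arabic letters plus the blank (values are made up).
ALPHABET = ["س", "ل", "م"]
scores = [
    [0.10, 0.80, 0.05, 0.05],  # س
    [0.10, 0.70, 0.10, 0.10],  # س again (repeat, collapsed)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # ل
    [0.20, 0.10, 0.10, 0.60],  # م
    [0.80, 0.10, 0.05, 0.05],  # blank
]
print(greedy_decode(scores, ALPHABET))
```

The output string is stored in logical (reading) order; right-to-left display is left to the rendering layer. A genuinely doubled letter survives decoding only if a blank frame separates its two occurrences.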

Full paper: pdf

Centre for Visual Information Technology

IIIT Hyderabad Publications
