Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning
Open Access
- 12 May 2020
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Access
- Vol. 8, pp. 96787–96803
- https://doi.org/10.1109/access.2020.2994214
Abstract
Urdu is a cursive script belonging to a non-Latin family that includes other scripts such as Arabic, Chinese, and Hindi. Urdu text poses a challenge for detection/localization in natural scene images and, consequently, for recognition of individual ligatures in those images. In this paper, a methodology is proposed that covers detection, orientation prediction, and recognition of Urdu ligatures in outdoor images. As a first step, a custom FasterRCNN algorithm is used in conjunction with well-known CNN backbones (SqueezeNet, GoogLeNet, ResNet18, and ResNet50) for detection and localization on images of size 320 × 240 pixels. For ligature orientation prediction, a custom Regression Residual Neural Network (RRNN) is trained and tested on datasets containing randomly oriented ligatures. Ligature recognition is performed using a Two-Stream Deep Neural Network (TSDNN). In our experiments, five datasets containing 4.2K and 51K Urdu-text-embedded synthetic images were generated from the CLE annotation text to evaluate the detection, orientation prediction, and recognition tasks. These synthetic images contain 132 and 1600 unique ligatures for the 4.2K and 51K images respectively, with 32 variations of each ligature (4 backgrounds × 8 font colors). In addition, 1094 real-world images containing more than 12K Urdu characters were used for TSDNN's evaluation. Finally, all four detectors were evaluated and compared on their ability to detect/localize Urdu text using average precision (AP). The ResNet50-based FasterRCNN was the best detector, with an AP of 0.98, while the SqueezeNet-, GoogLeNet-, and ResNet18-based detectors had testing APs of 0.65, 0.88, and 0.87 respectively. RRNN achieved accuracies of 79% and 99% for the 4.2K and 51K images respectively. Similarly, for character classification within ligatures, TSDNN attained partial sequence recognition rates of 94.90% and 95.20% for the 4.2K and 51K images respectively.
Similarly, a partial sequence recognition rate of 76.60% was attained for real-world images.
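The abstract ranks the four detectors by average precision (AP). As a rough illustration of how such a score can be computed from ranked detections, here is a minimal sketch using all-point interpolation over the precision-recall curve (a PASCAL VOC-style convention; the paper does not state its exact AP protocol, so the function name, inputs, and interpolation choice below are assumptions):

```python
def average_precision(scores, matched, num_gt):
    """AP for one class, from per-detection confidences and match flags.

    scores  : confidence score of each detection
    matched : True if the detection matched a ground-truth box (e.g. IoU >= 0.5)
    num_gt  : total number of ground-truth boxes in the evaluation set
    """
    # Rank detections by descending confidence
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, prec, rec = 0, [], []
    for rank, i in enumerate(order, start=1):
        tp += 1 if matched[i] else 0
        prec.append(tp / rank)      # precision at this rank
        rec.append(tp / num_gt)     # recall at this rank
    # Right-to-left envelope: precision at recall r becomes the max
    # precision achieved at any recall >= r
    for k in range(len(prec) - 2, -1, -1):
        prec[k] = max(prec[k], prec[k + 1])
    # Area under the stepwise precision-recall curve
    ap, prev_r = 0.0, 0.0
    for r, p in zip(rec, prec):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For example, a detector whose every detection is correct scores an AP of 1.0, while missed ground-truth boxes cap the maximum recall and therefore lower the AP, which is why the reported 0.98 for the ResNet50-based detector indicates near-perfect localization on the test set.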