Script Identification from Printed Indian Document Images and Performance Evaluation Using Different Classifiers

Open Access

7 December 2014

journal article
research article
Published by Hindawi Limited in Applied Computational Intelligence and Soft Computing

Vol. 2014 (1), 1-12
https://doi.org/10.1155/2014/896128

Abstract

Identification of script from document images is an active area of research in document image processing for a multilingual, multiscript country like India. This paper considers the real-life problem of identifying printed scripts in official Indian document images and evaluates the performance of several well-known classifiers. Two evaluation parameters, AAR (average accuracy rate) and MBT (model building time), are computed for this performance analysis. The experiment was carried out on 459 printed document images with 5-fold cross-validation. The Simple Logistic model shows the highest AAR of all, at 98.9%. The BayesNet and Random Forest models have average accuracy rates of 96.7% and 98.2%, respectively, with the lowest MBT of 0.09 s.
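The evaluation protocol described in the abstract (k-fold cross-validation reporting an average accuracy rate and a model building time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: AAR is taken here as the mean per-fold accuracy and MBT as the mean time spent fitting the model, since the abstract does not give exact formulas. The names `k_fold_indices`, `NearestCentroid`, and `evaluate` are hypothetical, the classifier is a toy stand-in for the models named above, and the data are synthetic.

```python
import random
import time
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class NearestCentroid:
    """Toy stand-in classifier (not one of the models named in the abstract)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for j, value in enumerate(features):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # One mean feature vector per class.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def closest(features):
            # Pick the class whose centroid has the smallest squared distance.
            return min(
                self.centroids,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(features, self.centroids[label])
                ),
            )

        return [closest(features) for features in X]


def evaluate(make_model, X, y, k=5, seed=0):
    """Return (AAR in percent, mean model building time in seconds)."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies, build_times = [], []
    for held_out in folds:
        test_idx = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        model = make_model()
        start = time.perf_counter()  # time only the training step
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        build_times.append(time.perf_counter() - start)
        predicted = model.predict([X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(predicted, held_out))
        accuracies.append(100.0 * correct / len(held_out))
    return mean(accuracies), mean(build_times)


# Synthetic, well-separated two-class data; purely illustrative.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + \
    [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]
y = ["script_a"] * 50 + ["script_b"] * 50

aar, mbt = evaluate(NearestCentroid, X, y, k=5)
print(f"AAR = {aar:.1f}%, MBT = {mbt:.4f} s")
```

Swapping `NearestCentroid` for another object with `fit` and `predict` methods would let the same harness compare several classifiers on both measures, which is the kind of comparison the abstract reports.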


Cited by 30 articles