Trainable script identification strategies for Indian languages

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318)

p. 657-660
https://doi.org/10.1109/icdar.1999.791873

Abstract

Identifying the script in a document page image is of primary importance for a system that processes multilingual documents. In this paper, three trainable classification schemes are proposed for the identification of Indian scripts. The first scheme is based on a frequency-domain representation of the horizontal profile of the text blocks. The other two schemes use connected components extracted from the text region. We propose a novel Gabor filter-based feature extraction scheme for the connected components. We have also found that the frequency distribution of the width-to-height ratio of the connected components can be used for script recognition. Experiments show that the Gabor filter-based scheme provides the most reliable performance, whereas the other two techniques are computationally more efficient.
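
The abstract gives no implementation details, so the following is only a minimal sketch of the two cheaper feature types it names: a frequency-domain representation of the horizontal profile, and a histogram of connected-component width-to-height ratios. It assumes a binarized text block stored as a list of rows of 0/1 values with 1 as foreground. All function names, the use of plain DFT magnitudes, the 4-connectivity rule, and the bin edges are illustrative assumptions rather than the authors' choices. The Gabor filter-based features and the trainable classifier are omitted.

```python
import cmath
from collections import deque


def horizontal_profile(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a real signal."""
    n = len(signal)
    mags = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return mags


def connected_component_boxes(image):
    """Bounding boxes (top, left, bottom, right) of 4-connected foreground regions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def aspect_ratio_histogram(boxes, bin_edges):
    """Normalized histogram of width/height ratios; bin_edges are assumed, not from the paper."""
    counts = [0] * (len(bin_edges) - 1)
    for top, left, bottom, right in boxes:
        ratio = (right - left + 1) / (bottom - top + 1)
        for i in range(len(counts)):
            if bin_edges[i] <= ratio < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

In a sketch like this, the two feature vectors (the DFT magnitudes of the profile and the ratio histogram) would be computed per text block and passed to whatever trainable classifier is chosen. The pure-Python DFT is quadratic in the profile length and is used here only to avoid external dependencies.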
