Trainable script identification strategies for Indian languages
- 1 January 1999
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318)
Abstract
Identifying the script in an image of a document page is of primary importance for a system that processes multilingual documents. This paper proposes three trainable classification schemes for identifying Indian scripts. The first scheme is based on a frequency-domain representation of the horizontal profile of the textual blocks. The other two schemes use connected components extracted from the textual region. We propose a novel Gabor filter-based feature extraction scheme for the connected components. We also find that the frequency distribution of the width-to-height ratio of the connected components can be used for script recognition. Experiments show that the Gabor filter-based scheme provides the most reliable performance, while the other two techniques are computationally more efficient.
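As a rough illustration of two of the feature types named in the abstract, the following Python sketch computes the magnitude spectrum of a horizontal projection profile and a histogram of connected-component width-to-height ratios on a toy binary image. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the 4-connectivity, the naive DFT, the bin edges, and the toy `page` array are all hypothetical choices made here, and the abstract does not specify the classifiers or the Gabor filter parameters, so those are left out.

```python
import cmath
from collections import deque


def horizontal_profile(img):
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in img]


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def component_boxes(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected ink components."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def aspect_ratio_histogram(boxes, bin_edges):
    """Normalised histogram of width/height ratios over half-open bins."""
    counts = [0] * (len(bin_edges) - 1)
    for top, left, bottom, right in boxes:
        ratio = (right - left + 1) / (bottom - top + 1)
        for i in range(len(counts)):
            if bin_edges[i] <= ratio < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical toy "page": 1 = ink, 0 = background.
page = [
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

profile_spectrum = dft_magnitudes(horizontal_profile(page))
ratio_histogram = aspect_ratio_histogram(component_boxes(page), [0, 1, 2, 4])
```

In a trainable setting, vectors such as `profile_spectrum` and `ratio_histogram` would be computed per text block and passed to a classifier trained on labelled examples of each script. Which classifier to use is left open here.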