Automatic script identification from document images using cluster-based templates

1 February 1997

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence

Vol. 19 (2), 176-181
https://doi.org/10.1109/34.574802

Abstract

We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
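
The abstract's two steps (cluster training symbols into per-script templates, then match symbols from a new image against them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the representation of a symbol as a flattened, fixed-size bitmap, the greedy threshold clustering, the mean-absolute-difference distance, the scoring rule, and all names (`build_templates`, `identify_script`, `script_a`, `script_b`).

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a symbol bitmap rescaled to a fixed size and flattened.
Symbol = Sequence[float]


def distance(a: Symbol, b: Symbol) -> float:
    """Mean absolute pixel difference between two equal-length symbols."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_templates(symbols: List[Symbol], threshold: float = 0.3) -> List[List[float]]:
    """Greedy clustering (an assumed stand-in for the paper's clustering step).

    Each symbol joins the nearest cluster whose centroid lies within
    `threshold`; otherwise it starts a new cluster. The centroids are
    returned as the templates for one script.
    """
    clusters: List[List[Symbol]] = []
    centroids: List[List[float]] = []
    for sym in symbols:
        best, best_d = -1, threshold
        for i, centroid in enumerate(centroids):
            d = distance(sym, centroid)
            if d <= best_d:
                best, best_d = i, d
        if best < 0:
            clusters.append([sym])
            centroids.append(list(sym))
        else:
            clusters[best].append(sym)
            n = len(clusters[best])
            # Recompute the centroid as the per-pixel mean of the cluster members.
            centroids[best] = [sum(col) / n for col in zip(*clusters[best])]
    return centroids


def identify_script(symbols: List[Symbol],
                    templates_by_script: Dict[str, List[List[float]]]) -> str:
    """Return the script whose templates match the symbols best on average."""
    def score(templates: List[List[float]]) -> float:
        # Average, over all symbols, of the distance to the closest template.
        return sum(min(distance(s, t) for t in templates) for s in symbols) / len(symbols)

    return min(templates_by_script, key=lambda name: score(templates_by_script[name]))


# Toy training set with 4-pixel "symbols" for two made-up scripts.
training = {
    "script_a": [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]],
    "script_b": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
templates = {name: build_templates(syms) for name, syms in training.items()}
print(identify_script([[1, 1, 0, 0], [1, 0, 0, 0]], templates))
```

In this sketch the script with the lowest average best-match distance wins. A real system would extract symbols from the page image first (for example, by connected-component analysis) and would likely use a more robust similarity measure.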

Cited by 137 articles