Automatic script identification from document images using cluster-based templates

Abstract
We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from a new image are compared against these templates to determine the best-matching script. Our current system distinguishes among thirteen scripts with minimal preprocessing and high accuracy.
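The abstract describes a two-stage pipeline: cluster the training symbols of each script into templates, then score a new image's symbols against every script's templates. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes symbols have already been segmented and normalized to fixed-size bitmaps flattened into numeric vectors. It uses plain k-means as a stand-in for the clustering step and scores each script by the average distance from each symbol to its nearest template in that script. The function names (`kmeans`, `build_templates`, `identify_script`) and the toy 4-pixel data are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length symbol vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain k-means (a stand-in clustering method); the centroids serve as templates."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def build_templates(training: Dict[str, List[Vector]], k: int) -> Dict[str, List[Vector]]:
    """Cluster each script's training symbols into at most k templates."""
    return {script: kmeans(symbols, k) for script, symbols in training.items()}


def identify_script(symbols: List[Vector], templates: Dict[str, List[Vector]]) -> str:
    """Return the script whose templates lie closest to the symbols on average (assumed scoring rule)."""
    if not symbols:
        raise ValueError("no symbols to classify")

    def score(script: str) -> float:
        temps = templates[script]
        return sum(min(dist2(s, t) for t in temps) for s in symbols) / len(symbols)

    return min(templates, key=score)


# Toy usage with hypothetical 4-pixel "symbols" for two made-up scripts.
training = {
    "A": [[1, 1, 0, 0], [1, 0.9, 0, 0.1], [0, 0, 1, 1]],
    "B": [[1, 0, 1, 0], [0.9, 0, 1, 0.1], [0, 1, 0, 1]],
}
templates = build_templates(training, k=2)
print(identify_script([[1, 1, 0, 0.1], [0, 0.1, 1, 1]], templates))  # expected: A
```

Averaging over many symbols is what makes the decision robust in this sketch: a few poorly matched characters barely move the per-script score, so the script whose templates fit most symbols wins.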
