Indic script identification from handwritten document images — An unconstrained block-level approach

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE) in 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS)

p. 213-218
https://doi.org/10.1109/retis.2015.7232880

Abstract

In a multi-script country like India, identifying the script of a document image is an essential step before choosing the appropriate script-specific OCR. The problem becomes more complex and challenging in the case of HSI (Handwritten Script Identification). This paper proposes an automatic HSI technique for document images of six popular Indic scripts, namely Bangla, Devanagari, Malayalam, Oriya, Roman and Urdu. A block-level approach is followed: initially, a 34-dimensional feature vector is constructed by applying transform-based (BRT, BDCT, BFFT and BDT), textural and statistical techniques. Then, using a GAS (Greedy Attribute Selection) method, 20 attributes are selected for the learning process. A total of 600 unconstrained document image blocks, each of size 512×512, are prepared with an equal distribution across script types. The whole dataset is divided in a 2:1 ratio for training and testing. Extensive experimentation is carried out for six-script, tetra-script, tri-script and bi-script combinations. The experimental results show promising and comparable performance.
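
The dataset arithmetic in the abstract can be checked with a short sketch. It assumes a random shuffle of the whole dataset before the 2:1 split and enumerates every possible script grouping; the abstract does not say how blocks were assigned to training and testing or which groupings were evaluated, so the block ids, the seed, and the helper names `split_two_to_one` and `script_groups` are hypothetical.

```python
from itertools import combinations
import random

# The six scripts named in the abstract.
SCRIPTS = ["Bangla", "Devanagari", "Malayalam", "Oriya", "Roman", "Urdu"]
TOTAL_BLOCKS = 600
BLOCKS_PER_SCRIPT = TOTAL_BLOCKS // len(SCRIPTS)  # equal distribution: 100 each


def split_two_to_one(seed=0):
    """Divide the whole dataset 2:1 into training and testing sets.

    Assumption: a seeded random shuffle over hypothetical (script, index)
    block ids; the paper's actual assignment procedure is not stated.
    """
    blocks = [(s, i) for s in SCRIPTS for i in range(BLOCKS_PER_SCRIPT)]
    random.Random(seed).shuffle(blocks)
    cut = TOTAL_BLOCKS * 2 // 3  # 400 training blocks, 200 left for testing
    return blocks[:cut], blocks[cut:]


def script_groups(k):
    """All ways of choosing k of the six scripts (bi=2, tri=3, tetra=4)."""
    return list(combinations(SCRIPTS, k))


train, test = split_two_to_one()
print(len(train), len(test))
print({k: len(script_groups(k)) for k in (2, 3, 4, 6)})
```

Under these assumptions the split yields 400 training and 200 testing blocks, and there are 15 bi-script, 20 tri-script, 15 tetra-script and 1 six-script groupings available in principle.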


Cited by 11 articles