Indic script identification from handwritten document images — An unconstrained block-level approach
- 1 July 2015
- conference paper
- Published by the Institute of Electrical and Electronics Engineers (IEEE) in the 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS)
Abstract
In a multi-script country like India, the script of a document image must be identified before an appropriate script-specific OCR engine can be chosen. The problem becomes more complex and challenging for handwritten script identification (HSI). This paper proposes an automatic HSI technique for document images in six popular Indic scripts: Bangla, Devanagari, Malayalam, Oriya, Roman and Urdu. A block-level approach is followed. First, a 34-dimensional feature vector is constructed using transform-based (BRT, BDCT, BFFT and BDT), textural and statistical techniques; then a Greedy Attribute Selection (GAS) method selects 20 of these attributes for the learning process. A total of 600 unconstrained document image blocks, each of size 512×512, are prepared with an equal number of blocks for each script. The dataset is divided into training and test sets in a 2:1 ratio. Extensive experiments are carried out on six-script, tetra-script, tri-script and bi-script combinations, and the experimental results show promising and comparable performance.
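The dataset figures in the abstract can be made concrete with a small sketch. Six scripts sharing 600 blocks equally gives 100 blocks per script, and a 2:1 split gives 400 training and 200 test blocks overall. The abstract does not say whether the split was stratified by script, so the sketch below assumes a per-script (stratified) split. Because 100 is not divisible by 3, a per-script 2:1 split has to round, here to 67/33 per script and 402/198 overall. The helper names `stratified_split` and `script_subsets` are hypothetical, and enumerating every possible subset for the tetra-, tri- and bi-script experiments is an assumption; the abstract does not list which combinations were tested.

```python
import random
from collections import Counter
from itertools import combinations

# The six scripts named in the abstract.
SCRIPTS = ["Bangla", "Devanagari", "Malayalam", "Oriya", "Roman", "Urdu"]


def stratified_split(labels, train_ratio=2 / 3, seed=0):
    """Return (train, test) index lists, splitting each script's blocks separately.

    Assumption: the split is stratified by script; the abstract only states 2:1.
    """
    rng = random.Random(seed)
    by_script = {}
    for idx, label in enumerate(labels):
        by_script.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_script):
        idxs = by_script[label][:]
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_ratio)  # 100 * 2/3 rounds to 67
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test


def script_subsets(k):
    """All k-script combinations (k = 6, 4, 3, 2 for six-, tetra-, tri-, bi-script setups).

    Assumption: every possible subset is evaluated.
    """
    return list(combinations(SCRIPTS, k))


if __name__ == "__main__":
    # 600 blocks with an equal share per script -> 100 labels per script.
    labels = [s for s in SCRIPTS for _ in range(100)]
    train, test = stratified_split(labels)
    print(len(train), len(test))  # 402 198
    print(Counter(labels[i] for i in train)["Urdu"])  # 67
    print({k: len(script_subsets(k)) for k in (6, 4, 3, 2)})  # {6: 1, 4: 15, 3: 20, 2: 15}
```

If an exact 400/200 total matters more than equal per-script counts, shuffling all 600 indices together and cutting at 400 achieves it, at the cost of slightly unequal script shares in each set.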
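The feature-reduction step described in the abstract (34 attributes reduced to 20 by Greedy Attribute Selection) can be illustrated with a greedy forward search. The abstract does not state which evaluation criterion GAS uses, so this sketch assumes a wrapper that scores each candidate subset by the resubstitution accuracy of a nearest-centroid classifier; that evaluator is a hypothetical choice for illustration, not the paper's method. The function names are also hypothetical, and feature columns are plain indices rather than the paper's named BRT/BDCT/BFFT/BDT, textural and statistical features. Toolkits such as Weka offer a comparable search strategy (`GreedyStepwise`).

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
Scorer = Callable[[Sequence[Row], Sequence[str], List[int]], float]


def nearest_centroid_accuracy(X: Sequence[Row], y: Sequence[str], feats: List[int]) -> float:
    """Hypothetical evaluator: resubstitution accuracy of a nearest-centroid
    classifier that only looks at the feature columns in `feats`."""
    if not feats:
        return 0.0
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

    def sq_dist(row: Row, centre: List[float]) -> float:
        return sum((row[f] - centre[j]) ** 2 for j, f in enumerate(feats))

    correct = 0
    for row, label in zip(X, y):
        # Ties go to the first class encountered in the data.
        pred = min(centroids, key=lambda c: sq_dist(row, centroids[c]))
        correct += pred == label
    return correct / len(y)


def greedy_forward_selection(X: Sequence[Row], y: Sequence[str], n_select: int,
                             score: Scorer = nearest_centroid_accuracy) -> List[int]:
    """Greedily add, one at a time, the column that maximises `score`
    together with the columns already chosen, until `n_select` are picked."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        # max() keeps the lowest-index column on ties, so results are deterministic.
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: column 2 separates the classes, columns 0 and 1 do not.
    X = [[0.5, 0.1, 0.0], [0.4, 0.9, 0.1], [0.5, 0.2, 1.0], [0.4, 0.8, 0.9]]
    y = ["A", "A", "B", "B"]
    print(greedy_forward_selection(X, y, n_select=2))  # [2, 0]
```

With real data, one would call `greedy_forward_selection(X_train, y_train, n_select=20)` on a 34-column training matrix (hypothetical names), keeping the test third out of the selection so that the reported accuracy is not optimistically biased.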