Script independent feature set for handwritten text recognition

Abstract
The efficiency of any character recognition technique depends directly on the accuracy of the generated feature set, which must represent each character uniquely so that it can be recognized correctly. This paper proposes a hybrid approach that combines the structural features of a character with a mathematical curve-fitting model to capture the most discriminative features of the character. As a preprocessing step, the character is binarized and thinned to a skeleton, and spurious edges are removed. Then a combination of structural features of the character, such as the number of end points, loops and intersection points, is calculated. Further, the thinned character image is statistically zoned into partitions, and a quadratic curve-fitting model is applied to each partition, forming a feature vector of the curve coefficients. This vector is combined with the spatial distribution of the foreground pixels in each zone, yielding a script-independent feature representation. The approach has been evaluated experimentally on English and Hindi scripts. The algorithm achieves an average recognition accuracy of 89% for any script without incorporating any script-specific features.
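
The feature extraction outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3x3 zoning grid, the 8-neighbour rule for end points and junction candidates, and the use of `numpy.polyfit` for the quadratic fit are all assumptions made here for illustration. Loop counting and the preprocessing steps (binarization, thinning, spur removal) are omitted, and the input is assumed to be an already thinned binary skeleton.

```python
import numpy as np


def endpoints_and_junctions(skeleton):
    """Count end points and junction-candidate pixels on a thinned skeleton.

    Assumed rule: a foreground pixel with exactly one 8-neighbour is an end
    point; one with three or more is a junction candidate. Pixels next to a
    crossing can also be counted, so this is only a rough estimate.
    """
    sk = np.asarray(skeleton, dtype=bool)
    h, w = sk.shape
    padded = np.pad(sk, 1).astype(int)  # zero border so edge pixels are handled
    neighbours = sum(
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    ends = int(np.sum(sk & (neighbours == 1)))
    junctions = int(np.sum(sk & (neighbours >= 3)))
    return ends, junctions


def zone_features(skeleton, rows=3, cols=3):
    """Return [a, b, c, density] per zone, concatenated into one vector.

    Each zone's foreground pixels are fitted with y = a*x^2 + b*x + c in
    zone-local coordinates (x = column, y = row). Zones with fewer than
    three distinct x values get zero coefficients (an assumed convention).
    """
    sk = np.asarray(skeleton, dtype=bool)
    features = []
    for band in np.array_split(sk, rows, axis=0):
        for zone in np.array_split(band, cols, axis=1):
            ys, xs = np.nonzero(zone)
            if np.unique(xs).size >= 3:
                a, b, c = np.polyfit(xs, ys, 2)
            else:
                a = b = c = 0.0
            # Fraction of foreground pixels in the zone (spatial distribution).
            density = float(zone.mean()) if zone.size else 0.0
            features.extend([a, b, c, density])
    return np.array(features, dtype=float)


def feature_vector(skeleton, rows=3, cols=3):
    """Concatenate the structural counts with the per-zone features."""
    ends, junctions = endpoints_and_junctions(skeleton)
    return np.concatenate(([ends, junctions], zone_features(skeleton, rows, cols)))
```

With a 3x3 grid, `feature_vector` returns 2 structural values followed by 36 zone values (three coefficients and one density per zone). None of the quantities depends on the script, which is the property the abstract emphasizes.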
