(searched for: Recognition of Automated Hand-written Digits on Document Images Making Use of Machine Learning Techniques)
Published: 6 April 2021
European Journal of Engineering and Technology Research, Volume 6, pp 37-44; doi:10.24018/ejers.2021.6.4.2460
The purpose of this study is to create an automated framework that can recognize handwritten digit strings. To begin the experiment, each digit string was segmented into individual digits. Recognition of the string is then completed by passing each segmented digit to a digit recognition module. This research applies several machine learning techniques, including SVM, ANN, and CNN architectures, to achieve strong performance on the digit string recognition task. These approaches train SVM, ANN, and CNN models on HOG feature vectors extracted from images of digit strings. The deep learning methods segment the images by sliding a fixed-size window over them and classifying each sub-image as a digit or a non-digit. Once segmentation is complete, the handwritten digits are recognized. To assess the methods, each model is first trained on data, and the digit data is then evaluated with the chosen machine learning method. The experimental findings indicate that SVM and ANN fall short in precision and efficiency on text image recognition, whereas CNN performs better and is more accurate. This paper focuses on developing an effective system for automatically recognizing handwritten digits. The research examines the adaptation of emerging machine learning and deep learning approaches, such as SVM, ANN, and CNN, to various datasets. The test results show that the CNN approach is significantly more effective than the ANN and SVM approaches, ranking 71% higher. The suggested architecture is composed of three major components: image pre-processing, feature extraction, and classification. The purpose of this study is to enhance the precision of handwritten digit recognition significantly. As will be demonstrated, pre-processing and feature extraction are significant elements of this study for obtaining maximum consistency.
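The scanning step described in the abstract, in which a fixed-size region moves over the image and each sub-image is classified as digit or not, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the stride parameter, and the `toy_is_digit` rule (a stand-in for a trained SVM, ANN, or CNN) are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def sliding_windows(image: Image, win_h: int, win_w: int,
                    stride: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (row, col, patch) for every fixed-size window position."""
    height, width = len(image), len(image[0])
    for r in range(0, height - win_h + 1, stride):
        for c in range(0, width - win_w + 1, stride):
            patch = [row[c:c + win_w] for row in image[r:r + win_h]]
            yield r, c, patch


def detect_digits(image: Image, win_h: int, win_w: int, stride: int,
                  is_digit: Callable[[Image], bool]) -> List[Tuple[int, int]]:
    """Return the top-left corners of the windows the classifier accepts."""
    return [(r, c)
            for r, c, patch in sliding_windows(image, win_h, win_w, stride)
            if is_digit(patch)]


def toy_is_digit(patch: Image) -> bool:
    """Hypothetical classifier stand-in: accept if at least half the pixels are ink."""
    ink = sum(1 for row in patch for value in row if value > 0)
    return ink * 2 >= len(patch) * len(patch[0])


# A 4x8 toy image whose right half is filled with ink.
image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(4)]
hits = detect_digits(image, win_h=4, win_w=4, stride=4, is_digit=toy_is_digit)
```

In a real system the accepted windows would be passed on to the digit recognition module, and overlapping detections would need to be merged; neither step is shown here.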
Published: 19 September 2018
Advances in Intelligent Systems and Computing pp 589-596; doi:10.1007/978-981-13-1822-1_55
Published: 10 March 2016
Applied Computational Intelligence and Soft Computing, Volume 2016, pp 1-17; doi:10.1155/2016/2796863
Handwritten digit recognition plays a significant role in many user authentication applications in the modern world. Because handwritten digits vary in size, thickness, style, and orientation, these variations must be handled to solve the problem. A lot of work has been done for various non-Indic scripts, particularly Roman, but research on Indic scripts is limited. This paper presents a script-invariant handwritten digit recognition system for identifying digits written in five popular scripts of the Indian subcontinent, namely, Indo-Arabic, Bangla, Devanagari, Roman, and Telugu. A 130-element feature set, which combines six different types of moments, namely, geometric moment, moment invariant, affine moment invariant, Legendre moment, Zernike moment, and complex moment, has been estimated for each digit sample. Finally, the technique is evaluated on the CMATER and MNIST databases using multiple classifiers and, after performing statistical significance tests, it is observed that the Multilayer Perceptron (MLP) classifier outperforms the others. Satisfactory recognition accuracies are attained for all five scripts.
1. Introduction
The field of automated reading of printed or handwritten documents by electronic devices is known as Optical Character Recognition (OCR), which is broadly defined as the process of recognizing either printed or handwritten text from document images and converting it into electronic form. OCR systems can contribute tremendously to the advancement of the automation process and can improve the interaction between man and machine in many applications, including office automation, bank check verification, postal automation, and a large variety of business and data entry applications. Handwritten digit recognition is the method of recognizing and classifying handwritten digits from 0 to 9 without human interaction.
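The abstract above lists geometric moments and moment invariants among the six moment families in the feature set. As a rough illustration of what such features measure, the sketch below computes raw geometric moments, central moments, and one translation- and scale-normalized invariant. It assumes that "moment invariant" refers to Hu-style invariants, which the excerpt does not confirm, and the function names are hypothetical; the paper's actual 130-element feature set is not reproduced here.

```python
from typing import List

Image = List[List[float]]  # image as rows of pixel intensities


def raw_moment(img: Image, p: int, q: int) -> float:
    """Geometric moment m_pq = sum over pixels of x^p * y^q * f(x, y)."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(img)
               for x, value in enumerate(row))


def central_moment(img: Image, p: int, q: int) -> float:
    """Moment taken about the intensity centroid, so it ignores translation."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00
    return sum(((x - cx) ** p) * ((y - cy) ** q) * value
               for y, row in enumerate(img)
               for x, value in enumerate(row))


def first_invariant(img: Image) -> float:
    """phi_1 = eta_20 + eta_02, where eta_pq = mu_pq / mu_00^(1 + (p + q) / 2)."""
    mu00 = central_moment(img, 0, 0)
    eta20 = central_moment(img, 2, 0) / mu00 ** 2
    eta02 = central_moment(img, 0, 2) / mu00 ** 2
    return eta20 + eta02


# The same three-pixel shape at two different positions.
shape = [[1, 1],
         [1, 0]]
shifted = [[0, 0, 0],
           [0, 1, 1],
           [0, 1, 0]]
same_value = abs(first_invariant(shape) - first_invariant(shifted)) < 1e-12
```

Because the value does not change when the shape moves, a classifier that uses it does not need the digit to be centered first, which is the kind of property that makes moment features attractive for handwriting of varying position and size.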
Although the recognition of handwritten numerals has been studied for more than three decades and many techniques with high accuracy rates have already been developed, the research in this area continues with the aim of improving the recognition rates further.
Handwritten digit recognition is a complex problem because writing style varies from writer to writer. The problem becomes more challenging still because the same writer's style varies between instances. For this reason, building a generic recognizer that is capable of recognizing handwritten digits written by diverse writers is not always feasible. However, the extraction of the most informative features with highly discriminatory ability to improve the classification accuracy with reduced complexity remains one of the most important problems for this task. It is a task of great importance for which there are standard databases that allow different approaches to be compared and validated.
India is a multilingual country with 23 constitutionally recognized languages written in 12 major scripts. Besides these, hundreds of other languages are used in India, each one with a number of dialects. The officially recognized languages are Hindi, Bengali, Punjabi, Marathi, Gujarati, Oriya, Sindhi, Assamese, Nepali, Urdu, Sanskrit, Tamil, Telugu, Kannada, Malayalam, Kashmiri, Manipuri, Konkani, Maithili, Santhali, Bodo, English, and Dogri. The 12 major scripts used to write these languages are Devanagari, Bangla, Oriya, Gujarati, Gurumukhi, Tamil, Telugu, Kannada, Malayalam, Manipuri, Roman, and Urdu. In a multilingual country like India, it is common for a document such as a job application form or a railway ticket reservation form to be composed of text written in different languages and scripts in order to reach a larger cross section of people.
The variation of different scripts may be in the form of numerals or alphanumerals in a single document page. But the techniques developed for text identification generally do not incorporate the recognition of digits, because the features required for text identification may not be applicable for identifying digits.
The paper is organized as follows: Section 2 presents a brief review of some of the previous approaches to handwritten digit recognition whereas, in Section 3, we introduce our script independent handwritten digit recognition system. Section 4 describes the performance of our system on realistic databases of handwritten digits and, finally, Section 5 concludes the paper.
2. Review of Related Works
Gorgevik and Cakmakov developed a Support Vector Machine (SVM) based digit recognition system for handwritten Roman numerals. They extracted four types of features from each digit image: (1) projection histograms, (2) contour profiles, (3) ring-zones, and (4) Kirsch features. They reported 97.27% recognition accuracy on the National Institute of Standards and Technology (NIST) handwritten digits database. Chen et al. proposed a max-min posterior pseudoprobabilities framework for Roman handwritten digit recognition. They extracted 256-dimensional directional features from the input image. These features were then transformed into a set of 128 features using Principal Component Analysis (PCA). They reported recognition accuracy of 98.76% on the NIST database. Labusch et al. described a sparse coding based feature extraction method with SVM as a classifier. They found recognition accuracy of 99.41% on the MNIST (Modified NIST) handwritten digits database. Another work combined three recognizers by majority vote, one of which is based on the Kirsch gradient (four orientations), dimensionality reduction by PCA, and classification by SVM. It achieved an accuracy rate of 95.05% with 0.93% error on 10,000 test samples of the MNIST database.
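Combining three recognizers by majority vote, as in the last work reviewed above, can be sketched in a few lines. The function name and the choice to return `None` (a rejection) when no label has a strict majority are assumptions made for this illustration; the excerpt does not say how that work resolves disagreements.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[int]) -> Optional[int]:
    """Return the label picked by a strict majority of recognizers.

    Returns None when no label wins outright, which this sketch treats
    as a rejected sample (an assumed policy, not one from the source).
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count * 2 > len(predictions) else None


# Each list holds the digit predicted by three hypothetical recognizers.
agreed = majority_vote([7, 7, 1])     # two of three agree
rejected = majority_vote([3, 8, 5])   # all three disagree
```

Under this policy a sample counts as an error only when a wrong label wins the vote, so samples on which the recognizers disagree entirely are neither correct nor wrong.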
Mane and Ragha performed handwritten digit recognition using an elastic image matching technique based on eigendeformation, which is estimated by the PCA of actual deformations automatically selected by the elastic matching. They achieved an overall accuracy of 94.91% on their own database collected from different individuals of various professions for the experiment. Cruz et al. presented a handwritten digit recognition system which uses multiple feature extraction methods and a classifier ensemble. A total of six feature extraction algorithms, namely, Multizoning, Modified Edge Maps, Structural Characteristics, Projections, Concavities Measurements, and Gradient Directional, were evaluated in this paper. A scheme using neural networks as a combiner achieved a recognition rate of 99.68% on a training set of 60,000 images and a test set of 10,000 images of the MNIST database.
Dhandra et al. investigated a script independent automatic numeral recognition system for recognition of Kannada, Telugu, and Devanagari handwritten numerals. In the proposed method, 30 classes were reduced to 18 classes by extracting global and local structural features such as directional density estimation, water reservoirs, maximum profile distances, and fill-hole density. Finally, a probabilistic neural network (PNN) classifier was used for the recognition system, which yielded an accuracy of 97.20% on a total of 2550 numeral images written in Kannada, Telugu, and Devanagari scripts. Yang et al. proposed a supervised matrix factorization method used directly as a multiclass classifier. They reported recognition accuracy of 98.71% with a supervised learning approach on the MNIST database. A mixture of multiclass logistic regression models has also been described, with a claimed recognition accuracy of 98% on the Indian digit database provided by CENPARMI. Das et al.
described a technique for creating a pool of local regions and selecting an optimal set of local regions from that pool for extracting optimal discriminating information for handwritten Bangla digit recognition. A genetic algorithm (GA) was then applied on these local regions to sample the best discriminating features. The features extracted from these selected local regions were then classified with SVM, and recognition accuracy of 97% was achieved. A wavelet analysis based technique for feature extraction has also been reported. For classification, SVM and k-Nearest Neighbor (k-NN) were used, and an overall recognition accuracy of 97.04% was reported on the MNIST digit database. A comparative study was conducted by training the neural network using the Backpropagation (BP) algorithm and further using PCA for feature extraction. Digit recognition was finally carried out using 13 algorithms, neural network algorithm, and the Fisher Discriminant Analysis (FDA) algorithm. The FDA algorithm proved less efficient with an overall accuracy of 77.67%, whereas the BP algorithm with PCA for its feature extraction gave an accuracy of 91.2%.
In another work, a set of structural features (namely, number of holes, water reservoirs in four directions, maximum profile distances in four directions, and fill-hole density) and a k-NN classifier were employed for classification and recognition of handwritten digits. The authors reported recognition accuracy of 96.94% on 5000 samples of the MNIST digit database. AlKhateeb and Alseid proposed an Arabic handwritten digit recognition system using a Dynamic Bayesian Network. They employed DCT coefficient based features for classification. The system was tested on the Indo-Arabic digits database (ADBase), which contains 70,000 Indo-Arabic digits, and an average recognition accuracy of 85.26% was achieved on 10,000 samples. Ebrahimzadeh and Jampour proposed an appearance feature-based approach using Histogram of Oriented Gradients (HOG) for handwritten digit recognition.
A linear SVM was then used for classification of the digits in the MNIST dataset, and an overall accuracy of 97.25% was achieved. Gil et al. presented a novel approach using SVM binary classifiers and unbalan