Statistical Text Line Analysis in Handwritten Documents
- 1 September 2012
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 201-206
- https://doi.org/10.1109/icfhr.2012.274
Abstract
In this paper we present an approach for text line analysis and detection in handwritten documents based on Hidden Markov Models (HMMs), a technique widely used in other handwriting and speech recognition tasks. We show that text line analysis and detection can be solved with a more formal methodology, in contrast to most of the heuristic approaches proposed in the literature. Our approach not only provides the best position coordinates for each of the vertical page regions but also labels them, thereby surpassing traditional heuristic methods. In our experiments we evaluate the performance of the approach (in both line analysis and line detection) and study the impact of increasingly constrained "vertical layout language models" on text line detection accuracy. These experiments also show that the baselines yielded by our approach are of higher quality than those produced by a state-of-the-art heuristic method based on vertical projection profiles.
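To illustrate the general idea of labelling vertical page regions with an HMM, here is a minimal sketch. It is not the authors' implementation. The two labels ("BL" for blank/inter-line rows, "NL" for text-line rows), the per-row ink-density feature, the Gaussian emission means and width, and the transition probabilities are all illustrative assumptions. Viterbi decoding assigns a label to every pixel row, and each contiguous run of "NL" rows is reported as a detected text line.

```python
import numpy as np

# Hypothetical region labels (illustrative only, not the paper's label set):
# "BL" = blank / inter-line rows, "NL" = rows belonging to a text line.
STATES = ["BL", "NL"]


def row_ink_profile(img):
    """Fraction of ink pixels in each row of a binary image (1 = ink)."""
    return np.asarray(img, dtype=float).mean(axis=1)


def viterbi(log_emit, log_trans, log_init):
    """Most likely state sequence.

    log_emit: (T, S) log-likelihoods, log_trans: (S, S) with rows = previous
    state and columns = current state, log_init: (S,) initial log-probs.
    """
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # shape (prev, cur)
        back[t] = scores.argmax(axis=0)             # best predecessor per state
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                   # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def segments(path, label):
    """(start, end) row intervals, end exclusive, where path == label."""
    out, start = [], None
    for i, s in enumerate(path + [-1]):             # -1 is an end sentinel
        if s == label and start is None:
            start = i
        elif s != label and start is not None:
            out.append((start, i))
            start = None
    return out


def detect_lines(img, means=(0.02, 0.3), sigma=0.1, p_stay=0.9):
    """Label each row with an HMM and return the row spans of text lines.

    means/sigma: assumed Gaussian emission parameters per state;
    p_stay: assumed self-transition probability.
    """
    x = row_ink_profile(img)
    mu = np.asarray(means)
    # Gaussian log-likelihood up to a constant (shared sigma).
    log_emit = -((x[:, None] - mu[None, :]) ** 2) / (2 * sigma ** 2)
    S = len(mu)
    log_trans = np.full((S, S), np.log((1 - p_stay) / (S - 1)))
    np.fill_diagonal(log_trans, np.log(p_stay))
    log_init = np.full(S, -np.log(S))
    path = viterbi(log_emit, log_trans, log_init)
    return segments(path, STATES.index("NL"))


# Synthetic page: two "text lines" at rows 5-9 and 18-23.
page = np.zeros((30, 50))
page[5:10, ::2] = 1
page[18:24, ::2] = 1
print(detect_lines(page))  # [(5, 10), (18, 24)]
```

In this toy setting, a more constrained "vertical layout language model" would correspond to a richer state set (for example, separate states for different vertical parts of a line) together with a transition matrix in which disallowed region orders are given a log-probability of `-np.inf`, so that Viterbi can only produce admissible vertical layouts.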