Segmentation of Historical Handwritten Documents into Text Zones and Text Lines
- 1 September 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2014 14th International Conference on Frontiers in Handwriting Recognition
- p. 464-469
- https://doi.org/10.1109/icfhr.2014.84
Abstract
To achieve accurate text recognition on historical handwritten document images, robust and efficient page segmentation is necessary. In this paper, we propose a text zone detection method followed by a text line segmentation method, both suitable for historical handwritten documents. Our aim is to handle several challenging cases, such as horizontal and vertical rule lines overlapping with the text, two-column documents, and characters of different text lines touching vertically. For text zone detection, we analyze vertical rule lines, connected components, and vertical white runs. For text line segmentation, we enhance an existing approach based on the Hough transform in order to better treat cases of vertically connected characters. Both methods proved very promising in an evaluation on a set of historical handwritten documents.
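The abstract names vertical white runs as one cue for text zone detection but gives no algorithmic detail. As a rough illustration only, and not the authors' method, the sketch below finds ink-free column bands in a binarized page, which is one simple way a gap between two text columns could be located. The function name `find_vertical_white_gaps`, its parameters, and the simplification of using full-height column projections instead of individual white runs are all assumptions made for this example.

```python
def find_vertical_white_gaps(page, min_width=3, max_ink_fraction=0.0):
    """Return (start, end) column ranges, end exclusive, that are nearly ink-free.

    `page` is a list of equal-length rows; 1 marks an ink pixel, 0 background.
    Hypothetical helper for illustration; not taken from the paper.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    # Fraction of ink pixels in each pixel column.
    ink = [sum(row[x] for row in page) / height for x in range(width)]

    gaps, start = [], None
    # Iterate one step past the last column so a trailing run gets closed.
    for x in range(width + 1):
        is_white = x < width and ink[x] <= max_ink_fraction
        if is_white and start is None:
            start = x
        elif not is_white and start is not None:
            # Keep only interior runs wide enough to separate two zones;
            # runs touching the left or right border are page margins.
            if x - start >= min_width and start > 0 and x < width:
                gaps.append((start, x))
            start = None
    return gaps


# A tiny two-column "page": ink on the left and right, a blank band between.
page = [[1, 1, 0, 0, 0, 1, 1] for _ in range(4)]
gaps = find_vertical_white_gaps(page)
```

On real scans, rule lines and noise mean a gap is rarely fully blank, so a nonzero `max_ink_fraction` would likely be needed. The paper's combination with rule lines and connected components is not reproduced here.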
This publication has 20 references indexed in Scilit:
- Text line extraction for historical document images. Pattern Recognition Letters, 2014
- ICDAR 2013 Handwriting Segmentation Contest. Institute of Electrical and Electronics Engineers (IEEE), 2013
- Text line and word segmentation of handwritten documents. Pattern Recognition, 2009
- Handwritten Chinese text line segmentation by clustering with distance metric learning. Pattern Recognition, 2009
- Layout Analysis of Handwritten Historical Documents for Searching the Archive of the Cabinet of the Dutch Queen. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 2007
- An Efficient Word Segmentation Technique for Historical and Degraded Machine-Printed Documents. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 2007
- Text line extraction from multi-skewed handwritten documents. Pattern Recognition, 2007
- Text line segmentation of historical documents: a survey. International Journal on Document Analysis and Recognition (IJDAR), 2006
- Adaptive degraded document image binarization. Pattern Recognition, 2006
- Block segmentation and text extraction in mixed text/image documents. Computer Graphics and Image Processing, 1982