Segmentation of Historical Handwritten Documents into Text Zones and Text Lines

Abstract
In order to achieve accurate text recognition performance for historical handwritten document images, robust and efficient page segmentation is necessary. In this paper, we propose a text zone detection followed by a text line segmentation method suitable for historical handwritten documents. Our aim is to handle several challenging cases such as horizontal and vertical rule lines overlapping with the text, two column documents and characters of different text lines touching vertically. For text zone detection, we analyze vertical rule lines, connected components as well as vertical white runs while for text line segmentation, we enhance an existing approach based on Hough transform in order to better treat cases of vertical connected characters. Both methods have been proved very promising after an evaluation using a set of historical handwritten documents.

This publication has 20 references indexed in Scilit: