Influence of text line segmentation in Handwritten Text Recognition

Abstract
Text line segmentation is the process by which text lines in a document image are localized and extracted. It is an important step in off-line Handwritten Text Recognition (HTR), given that the input to these systems is the line image of the text to be transcribed. A myriad of solutions to the text line segmentation problem have been proposed in the literature. Although these solutions may differ greatly in how the segmentation is actually performed, they can be classified by the level of precision and detail of the final extracted lines. In this paper we study the influence of, and the real need for, different levels of precision and detail in segmentation solutions on a real HTR task. We test three text line segmentation techniques whose outputs range from a simple rectangle for each line to a perfectly fitted polygon surrounding each detected line. Experiments have been carried out on a historical collection, and the results show that good HTR accuracy can be obtained with simple extraction algorithms.
