Segmentation methods for character recognition: from segmentation to document structure analysis

Abstract
A pattern-oriented segmentation method for optical character recognition that leads to document structure analysis is presented. As a first example, segmentation of touching handwritten numerals is treated. Connected pattern components are extracted, spatial interrelations between components are measured, and the components are grouped into meaningful character patterns. Stroke shapes are analyzed, and a method for finding touching positions is described that correctly separates about 95% of connected numerals. Ambiguities are handled by generating multiple hypotheses and verifying them by recognition. An extended form of pattern-oriented segmentation, tabular form recognition, is also considered. Images of tabular forms are analyzed, and frames in the tabular structure are extracted. By identifying semantic relationships between label frames and data frames, information on the form can be properly recognized.
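The component extraction and grouping step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the 8-connectivity flood fill, the function names `connected_components` and `group_by_horizontal_overlap`, and the rule of merging components whose column ranges overlap are all assumptions chosen for illustration, and the abstract does not specify which spatial interrelations are actually measured.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions in a binary image given as a list of rows of 0/1."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_by_horizontal_overlap(boxes):
    """Merge boxes whose column ranges overlap.

    Assumed grouping rule (hypothetical): fragments stacked above one
    another, such as a detached top bar of a numeral, belong together.
    """
    groups = []
    for box in sorted(boxes, key=lambda b: b[1]):  # sort by left edge
        if groups and box[1] <= groups[-1][3]:
            t, l, b, r = groups[-1]
            groups[-1] = (min(t, box[0]), l, max(b, box[2]), max(r, box[3]))
        else:
            groups.append(box)
    return groups
```

Calling `group_by_horizontal_overlap(connected_components(image))` on a small binary image yields one bounding box per candidate character. Splitting touching numerals and verifying hypotheses by recognition, as the abstract describes, would be separate later stages that this sketch does not cover.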
