Blur hit-miss transform and its use in document image pattern detection

Abstract
The usefulness of the hit-miss transform (HMT) and related transforms for pattern matching in document image applications is examined. Although the HMT is sensitive to the types of noise found in scanned images, including both boundary and random noise, a simple extension, the blur HMT, is relatively robust. The noise immunity of the blur HMT derives from its ability to treat both types of noise together, and to remove them by appropriate dilations. In analogy with the Hausdorff metric for the distance between two sets, metric generalizations for special cases of the blur HMT are derived. Whereas the Hausdorff metric uses both directions of the directed distances between two sets, a metric derived from a special case of the blur HMT uses just one direction of the directed distances between the foreground and background components of two sets. For both foreground and background, the template is always the first of the directed sets. A less restrictive metric generalization, where the disjoint foreground and background components of the template need not be set complements, is also derived. For images with a random component of noise, the blur HMT is sensitive only to the size of the noise, whereas Hausdorff matching is sensitive to its location. It is also shown how these metric functions can be derived from the distance functions of the foreground and background of an image, using dilation by the appropriate templates. The blur HMT is implemented efficiently with Boolean image operations. The foreground (FG) and background (BG) images are dilated with structuring elements that depend on image noise and pattern variability, and the results are then eroded with templates derived from the patterns to be matched. Subsampling the patterns on a regular grid can improve speed while maintaining match quality, and examples are given that indicate how to explore the parameter space. The blur HMT can be used as a fast heuristic to avoid more expensive integer-based matching techniques. Truncated matches give the same result as full erosions and are much faster.
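
The dilate-then-erode recipe described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: binary images are represented as Python sets of (row, column) pixels rather than packed Boolean rasters, and the names `dilate`, `erode`, `blur_hmt`, the 3x3 square pattern, and the 3x3 blur structuring element are all assumptions chosen for illustration. Pixels outside the image rectangle are treated as neither FG nor BG, except where a dilation spills over the border.

```python
from itertools import product


def dilate(pixels, se):
    """Minkowski sum of a pixel set with a structuring element (set of offsets)."""
    return {(r + dr, c + dc) for (r, c) in pixels for (dr, dc) in se}


def erode(pixels, template, domain):
    """Locations in `domain` where every template offset lands on a pixel of the set."""
    # all() stops at the first template offset that fails.
    return {(r, c) for (r, c) in domain
            if all((r + dr, c + dc) in pixels for (dr, dc) in template)}


def blur_hmt(fg, shape, hit, miss, blur_fg, blur_bg):
    """Blur hit-miss transform (illustrative sketch).

    The FG and BG images are first dilated by the blur structuring
    elements, then eroded by the hit and miss templates; a location
    matches when both erosions succeed there.
    """
    domain = set(product(range(shape[0]), range(shape[1])))
    bg = domain - fg  # BG is the complement of FG within the image rectangle
    fg_match = erode(dilate(fg, blur_fg), hit, domain)
    bg_match = erode(dilate(bg, blur_bg), miss, domain)
    return fg_match & bg_match


# Structuring elements (assumed for this example).
IDENTITY = {(0, 0)}                            # no blur: reduces to the strict HMT
SQUARE3 = set(product((-1, 0, 1), repeat=2))   # one-pixel blur in every direction

# Hypothetical pattern: a solid 3x3 square with its origin at the top-left
# corner, surrounded by a one-pixel ring that must be background.
HIT = set(product(range(3), repeat=2))
MISS = set(product(range(-1, 4), repeat=2)) - HIT

# A clean 9x9 image containing the square at (3, 3), and a noisy copy with
# one corner pixel missing (boundary noise) and one stray pixel in the ring
# (random noise).
clean = {(r, c) for r in range(3, 6) for c in range(3, 6)}
noisy = (clean - {(3, 3)}) | {(2, 4)}

strict_clean = blur_hmt(clean, (9, 9), HIT, MISS, IDENTITY, IDENTITY)
strict_noisy = blur_hmt(noisy, (9, 9), HIT, MISS, IDENTITY, IDENTITY)
blurred_noisy = blur_hmt(noisy, (9, 9), HIT, MISS, SQUARE3, SQUARE3)
```

In this example the strict transform locates the clean square only at (3, 3) and finds nothing in the noisy image, whereas the blurred version recovers (3, 3). Because a one-pixel blur also tolerates a one-pixel shift, the blurred result includes some neighbouring locations as well, which is one reason the choice of blur size has to be balanced against localization.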