A Sentence-to-Sentence Clustering Procedure for Pattern Analysis

Abstract
Cluster analysis for patterns represented by sentences is investigated. The similarity between two patterns is expressed as the distance between their corresponding sentences. A weighted distance between two strings is defined, and its probabilistic interpretation is given. The class membership of an input pattern (sentence) is determined by the nearest-neighbor or k-nearest-neighbor rule. A clustering procedure that operates on a sentence-to-sentence basis is proposed. A set of English characters is used to illustrate the proposed metric and clustering procedure.
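
The ideas summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure defined in the paper: the abstract does not give the weights or the clustering rule, so the sketch assumes a weighted edit (Levenshtein-style) distance with hypothetical constant weights `sub_w`, `ins_w`, and `del_w`, and a simple threshold-based, nearest-neighbor cluster assignment. The function names and the `threshold` parameter are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_distance(x: str, y: str,
                      sub_w: float = 1.0,
                      ins_w: float = 1.0,
                      del_w: float = 1.0) -> float:
    """Minimum total weight of edits turning x into y.

    Assumption: one constant weight per operation type. A real
    application might make the weights depend on the symbols involved.
    """
    # prev[j] = cost of turning the empty prefix of x into y[:j]
    prev = [j * ins_w for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        curr = [i * del_w]  # cost of deleting the first i symbols of x
        for j, b in enumerate(y, start=1):
            cost_sub = prev[j - 1] + (0.0 if a == b else sub_w)
            cost_del = prev[j] + del_w
            cost_ins = curr[j - 1] + ins_w
            curr.append(min(cost_sub, cost_del, cost_ins))
        prev = curr
    return prev[-1]


def nearest_neighbor(x: str, labeled: Sequence[Tuple[str, str]]) -> str:
    """Return the class label of the labeled sentence closest to x."""
    return min(labeled, key=lambda pair: weighted_distance(x, pair[0]))[1]


def cluster(sentences: Sequence[str], threshold: float) -> List[List[str]]:
    """Assumed clustering rule: a sentence joins the cluster containing its
    nearest already-seen sentence if that distance is within the threshold;
    otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for s in sentences:
        best, best_d = None, float("inf")
        for c in clusters:
            d = min(weighted_distance(s, member) for member in c)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            best.append(s)
        else:
            clusters.append([s])
    return clusters
```

For example, `cluster(["aab", "aabb", "bba", "bbaa"], threshold=1.0)` groups the first two strings together and the last two together, because each pair differs by a single insertion while strings from different pairs are at least three edits apart. The result of this kind of sequential procedure generally depends on the order of the input and on the chosen threshold.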
