A Sentence-to-Sentence Clustering Procedure for Pattern Analysis

Abstract

Cluster analysis for patterns represented by sentences is investigated. The similarity between two patterns is expressed as the distance between their corresponding sentences. A weighted distance between two strings is defined, and its probabilistic interpretation is given. The class membership of an input pattern (sentence) is determined according to the nearest-neighbor or k-nearest-neighbor rule. A clustering procedure on a sentence-to-sentence basis is proposed. A set of English characters is used to illustrate the proposed metric and clustering procedure.
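
The abstract does not state the metric or the procedure in detail, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a weighted edit distance in the style of the string-to-string correction problem, with hypothetical constant weights `w_sub`, `w_ins`, and `w_del`, and a simple nearest-neighbor clustering rule governed by a hypothetical `threshold` parameter. The function names and the sample strings are illustrative assumptions.

```python
from typing import List, Sequence


def weighted_distance(x: str, y: str, w_sub: float = 1.0,
                      w_ins: float = 1.0, w_del: float = 1.0) -> float:
    """Weighted edit distance between strings x and y.

    Assumption: one constant weight per operation type; a weighting
    that depends on the symbols involved would need a cost table.
    """
    m, n = len(x), len(y)
    # d[i][j] = minimum cost of transforming x[:i] into y[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w_del
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if x[i - 1] == y[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match or substitute
                          d[i - 1][j] + w_del,     # delete x[i-1]
                          d[i][j - 1] + w_ins)     # insert y[j-1]
    return d[m][n]


def cluster(sentences: Sequence[str], threshold: float) -> List[List[str]]:
    """Nearest-neighbor clustering on a sentence-to-sentence basis.

    Each sentence joins the cluster containing its nearest neighbor if
    that distance is at most `threshold`; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for s in sentences:
        best_idx, best_dist = -1, float("inf")
        for idx, members in enumerate(clusters):
            dist = min(weighted_distance(s, member) for member in members)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx >= 0 and best_dist <= threshold:
            clusters[best_idx].append(s)
        else:
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    # Hypothetical strings standing in for sentences that describe patterns.
    print(cluster(["aab", "aabb", "cdc", "cdcc"], threshold=1.0))
```

In this sketch the result depends on the order in which sentences are presented and on the chosen threshold; a k-nearest-neighbor variant would replace the single minimum with a vote among the k closest stored sentences.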
