HIERARCHIC AGGLOMERATIVE CLUSTERING METHODS FOR AUTOMATIC DOCUMENT CLASSIFICATION
- 1 March 1984
- journal article
- Published by Emerald in Journal of Documentation
- Vol. 40 (3), 175-205
- https://doi.org/10.1108/eb026764
Abstract
This paper considers the classifications produced by application of the single linkage, complete linkage, group average and Ward clustering methods to the Keen and Cranfield document test collections. Experiments were carried out to study the structure of the hierarchies produced by the different methods, the extent to which the methods distort the input similarity matrices during the generation of a classification, and the retrieval effectiveness obtainable in cluster based retrieval. The results would suggest that the single linkage method, which has been used extensively in previous work on document clustering, is not the most effective procedure of those tested, although it should be emphasized that the experiments have used only small document test collections.
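The four methods named in the abstract can all be written as one agglomerative loop in which only the rule for updating inter-cluster dissimilarities after a merge changes (the Lance-Williams recurrence). The Python sketch below illustrates that idea on a made-up 4 x 4 dissimilarity matrix; it is not the paper's implementation. The names `agglomerate`, `cophenetic_correlation` and `D` are hypothetical, and using the cophenetic correlation as the measure of how far a hierarchy distorts its input matrix is an assumption here, not necessarily the measure used in the paper.

```python
from itertools import combinations

METHODS = ("single", "complete", "average", "ward")


def agglomerate(dist, method="single"):
    """Agglomerative clustering of a symmetric dissimilarity matrix.

    Returns (merges, coph): merges is a list of (members_a, members_b, height)
    and coph[i][j] is the height at which items i and j first share a cluster.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    n = len(dist)
    members = {i: (i,) for i in range(n)}  # active cluster id -> items
    d = {frozenset(p): float(dist[p[0]][p[1]])
         for p in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    next_id = n
    while len(members) > 1:
        pair = min(d, key=d.get)           # closest pair of active clusters
        a, b = sorted(pair)
        h = d[pair]
        for x in members[a]:
            for y in members[b]:
                coph[x][y] = coph[y][x] = h
        merges.append((members[a], members[b], h))
        na, nb = len(members[a]), len(members[b])
        updated = {}
        for k in members:
            if k in (a, b):
                continue
            dak, dbk = d[frozenset((a, k))], d[frozenset((b, k))]
            if method == "single":
                updated[k] = min(dak, dbk)
            elif method == "complete":
                updated[k] = max(dak, dbk)
            elif method == "average":      # group average
                updated[k] = (na * dak + nb * dbk) / (na + nb)
            else:
                # Ward: this update assumes squared Euclidean distances
                nk = len(members[k])
                updated[k] = ((na + nk) * dak + (nb + nk) * dbk
                              - nk * h) / (na + nb + nk)
        # Drop entries for the two merged clusters, then add the new cluster
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        members[next_id] = members.pop(a) + members.pop(b)
        for k, v in updated.items():
            d[frozenset((next_id, k))] = v
        next_id += 1
    return merges, coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between input and cophenetic dissimilarities."""
    pairs = list(combinations(range(len(dist)), 2))
    xs = [float(dist[i][j]) for i, j in pairs]
    ys = [coph[i][j] for i, j in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Toy dissimilarities between four documents (illustrative values only)
D = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.5],
    [5.0, 6.0, 3.5, 0.0],
]

if __name__ == "__main__":
    for name in ("single", "complete", "average"):
        merges, coph = agglomerate(D, name)
        heights = [h for _, _, h in merges]
        print(name, heights, round(cophenetic_correlation(D, coph), 3))
```

On this toy matrix, single linkage chains document 2 onto the pair {0, 1} before document 3 joins, whereas complete linkage pairs documents 2 and 3 first, so the two methods already produce differently shaped hierarchies from the same input. Ward is omitted from the demonstration loop because its update rule presupposes squared Euclidean distances, which the toy values are not claimed to be.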