Avoiding Informational Distortion in Automatic Grouping Programs

1 September 1969

journal article
Published by Oxford University Press (OUP) in Systematic Zoology

Vol. 18 (3), 318-329
https://doi.org/10.2307/2412328

Abstract

Some numerical methods for group-forming are discussed in relation to the important informational requirements of classification. Departures from these requirements are termed informational distortions. Such distortions seem to be almost absent for Relative Heterogeneity and Homogeneity Functions, which are described in detail. However, they may reach serious proportions in the product-moment correlation coefficient, Jaccard's and Czekanowski's similarity coefficients, and mean Euclidean distance. Care is needed to avoid the informational distortions of scatter diagrams and models. Further distortions and losses of information are pointed out for the clustering systems using single, median and group average linkage. Such difficulties are avoided with average member linkage, which is suitable for sets derived from poor sampling (open arrays) where a ‘looser’, space-conserved classification is needed. Probably minimal space-conservation is given with group homogeneity as the test for clustering. This is suited to sets where the sampling has been effectively complete (closed arrays). A criticism of some aspects of principal components analysis, and some notes on interpreting dendrograms are included.
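
For readers unfamiliar with the coefficients named in the abstract, the following is a minimal sketch of their common textbook forms, not the article's own formulation. It assumes binary (0/1) attribute vectors for Jaccard's and Czekanowski's coefficients, and takes "mean Euclidean distance" to be the root of the mean squared attribute difference. The function names are illustrative only.

```python
import math


def _matches(x, y):
    """Return (shared presences, mismatches) for two binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    shared = sum(1 for p, q in zip(x, y) if p and q)
    mismatched = sum(1 for p, q in zip(x, y) if bool(p) != bool(q))
    return shared, mismatched


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c); joint absences are ignored."""
    a, bc = _matches(x, y)
    return a / (a + bc) if (a + bc) else 0.0


def czekanowski(x, y):
    """Czekanowski similarity (binary form): 2a / (2a + b + c)."""
    a, bc = _matches(x, y)
    return 2 * a / (2 * a + bc) if (2 * a + bc) else 0.0


def mean_euclidean_distance(x, y):
    """Assumed form: sqrt(sum((x_i - y_i)^2) / n)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / len(x))


if __name__ == "__main__":
    x = [1, 1, 0, 0, 1]
    y = [1, 0, 0, 1, 1]
    print(jaccard(x, y), czekanowski(x, y), mean_euclidean_distance(x, y))
```

Note that the two similarity coefficients discard joint absences while the distance measure counts every attribute, so the same pair of individuals can be ranked differently depending on the measure chosen.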
