Nonlinear mapping of massive data sets by fuzzy clustering and neural networks

1 March 2001

journal article
research article
Published by Wiley in Journal of Computational Chemistry

Vol. 22 (4), 373-386
https://doi.org/10.1002/1096-987x(200103)22:4<373::aid-jcc1009>3.0.co;2-8

Abstract

Producing good low-dimensional representations of high-dimensional data is a common and important task in many data mining applications. Two methods that have been particularly useful in this regard are multidimensional scaling and nonlinear mapping. These methods attempt to visualize a set of objects, described by a dissimilarity or distance matrix, on a low-dimensional display plane in a way that preserves the proximities of the objects as far as possible. Unfortunately, most known algorithms are of quadratic order, and their use has been limited to relatively small data sets. We recently demonstrated that nonlinear maps derived from a small random sample of a large data set exhibit the same structure and characteristics as those of the entire collection, and that this structure can be easily extracted by a neural network, making it possible to scale data sets orders of magnitude larger than those accessible with conventional methodologies. Here, we present a variant of this algorithm based on local learning. The method employs a fuzzy clustering methodology to partition the data space into a set of Voronoi polyhedra, and uses a separate neural network to perform the nonlinear mapping within each cell. We find that this local approach offers a number of advantages, and produces maps that are virtually indistinguishable from those derived with conventional algorithms. These advantages are discussed using examples from the fields of combinatorial chemistry and optical character recognition. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 373–386, 2001
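
The partitioning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes standard fuzzy c-means with fuzzifier 2, a farthest-first initialization chosen here only for determinism, and nearest-center assignment as the Voronoi partition. The names memberships, fuzzy_c_means, voronoi_cell, and map_point are hypothetical, and the per-cell neural networks are replaced by placeholder callables because the abstract does not specify their architecture or training.

```python
import math


def memberships(point, centers, fuzzifier=2.0):
    """Fuzzy c-means membership of one point in every cluster (sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    if min(dists) == 0.0:
        # The point sits exactly on one or more centers: share membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (fuzzifier - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_c_means(points, n_clusters, fuzzifier=2.0, n_iter=50):
    """Return cluster centers after a fixed number of fuzzy c-means updates."""
    # Assumption: farthest-first initialization, chosen here for determinism.
    centers = [tuple(points[0])]
    while len(centers) < n_clusters:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(tuple(far))
    n_dims = len(points[0])
    for _ in range(n_iter):
        u = [memberships(p, centers, fuzzifier) for p in points]
        for i in range(n_clusters):
            # Each center is the mean of all points weighted by membership^m.
            weights = [row[i] ** fuzzifier for row in u]
            total = sum(weights)
            centers[i] = tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(n_dims)
            )
    return centers


def voronoi_cell(point, centers):
    """Index of the nearest center, i.e. the Voronoi cell containing the point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def map_point(point, centers, local_models):
    """Route a point to the model that owns its cell and return its output."""
    return local_models[voronoi_cell(point, centers)](point)


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    centers = fuzzy_c_means(sample, n_clusters=2)
    # Hypothetical placeholders; in the paper each cell has a trained network.
    local_models = [lambda p: ("cell 0", p), lambda p: ("cell 1", p)]
    print(centers)
    print(map_point((0.2, 0.3), centers, local_models))
```

In a fuller version, each entry of local_models would be a regressor trained on the sampled points that fall in its cell together with their low-dimensional coordinates, so that any new point can be projected by a single call to map_point.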


Cited by 28 articles