OPTICS

1 June 1999

journal article
Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record

Vol. 28 (2), 49-60
https://doi.org/10.1145/304181.304187

Abstract

Cluster analysis is a primary method for database mining. It is used either as a stand-alone tool to gain insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all well-known clustering algorithms require input parameters that are hard to determine but have a significant influence on the clustering result. Furthermore, for many real data sets there is no global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for cluster analysis that does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrarily shaped clusters), but also the intrinsic clustering structure. For medium-sized data sets, the cluster-ordering can be represented graphically, and for very large data sets we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure, offering additional insights into the distribution and correlation of the data.
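
The abstract does not spell out how the augmented ordering is built, so the following is only a minimal sketch of the idea as it is commonly described: visit points in order of increasing reachability distance and record that distance for each point. The function name `optics_order`, the brute-force neighbor search, the lazy-deletion heap, and the convention that a point counts as its own neighbor are all assumptions made for illustration, not details taken from this record.

```python
import heapq
import math


def optics_order(points, eps, min_pts):
    """Illustrative sketch: return (order, reach) for a list of coordinate tuples.

    order -- point indices in the order they were visited (the cluster-ordering)
    reach -- reachability distance per point index (math.inf if undefined)
    """
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbors(i):
        # Brute-force eps-neighborhood, O(n) per query; includes i itself (assumption).
        out = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                out.append((d, j))
        return out

    def core_distance(nbrs):
        # Distance to the min_pts-th nearest neighbor, or None if i is not a core point.
        if len(nbrs) < min_pts:
            return None
        return sorted(d for d, _ in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # min-heap keyed by reachability distance
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale heap entry left over from an earlier update
            processed[i] = True
            order.append(i)
            nbrs = neighbors(i)
            core = core_distance(nbrs)
            if core is None:
                continue  # non-core points do not expand the ordering
            for d, j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core, d)
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return order, reach


# Two well-separated squares of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
order, reach = optics_order(points, eps=20.0, min_pts=3)
for i in order:
    print(i, points[i], reach[i])
```

Plotting `reach` in the order given by `order` yields the kind of graphical representation the abstract mentions: points inside a dense region have small values, and a large jump marks the transition to the next cluster. A single run with a generous `eps` therefore carries the information needed for a range of smaller density thresholds.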

Cited by 1449 articles