Detection of orthogonal concepts in subspaces of high dimensional data
- 2 November 2009
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 1317-1326
- https://doi.org/10.1145/1645953.1646120
Abstract
In the knowledge discovery process, clustering is an established technique for grouping objects based on mutual similarity. However, in today's applications, very many attributes are provided for each object. As multiple concepts described by different attributes are mixed in the same data set, clusters do not appear in all dimensions. In these high-dimensional data spaces, each object can be clustered in several projections of the data. However, recent clustering techniques do not succeed in detecting these orthogonal concepts hidden in the data. They either miss the multiple concepts of each object (partitioning approaches) or report redundant clusters in very similar subspaces. In this work, we propose a novel clustering method aimed solely at detecting orthogonal concepts in subspaces of the data. Unlike existing clustering approaches, OSCLU (Orthogonal Subspace CLUstering) detects, for each object, the orthogonal concepts described by differing attributes while pruning similar concepts. Thus, each cluster detected in an orthogonal subspace provides novel information about the hidden structure of the data. Thorough experiments on real and synthetic data show that OSCLU yields substantial quality improvements over existing clustering approaches.
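The distinction the abstract draws, keeping an object in several clusters when their subspaces differ but pruning clusters that repeat a concept in a similar subspace, can be illustrated with a small redundancy filter. This is a hypothetical sketch, not OSCLU's actual cluster model or optimization: the `SubspaceCluster` type, the `score` field, the greedy order, and the thresholds `alpha` (object coverage) and `beta` (subspace similarity) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubspaceCluster:
    objects: frozenset  # indices of the objects in the cluster
    dims: frozenset     # attributes spanning the cluster's subspace
    score: float        # hypothetical interestingness score (assumption)


def similar_subspace(a, b, beta):
    """True if at least a fraction beta of a's dimensions also occur in b."""
    return len(a.dims & b.dims) >= beta * len(a.dims)


def redundant(c, selected, alpha, beta):
    """c is redundant if already-selected clusters in similar subspaces
    together cover at least a fraction alpha of c's objects."""
    covered = set()
    for s in selected:
        if similar_subspace(c, s, beta):
            covered |= s.objects
    return len(c.objects & covered) >= alpha * len(c.objects)


def select_orthogonal(candidates, alpha=0.5, beta=0.5):
    """Greedy sketch: visit clusters by descending score and keep only
    those that add a new concept (i.e. are not redundant)."""
    result = []
    for c in sorted(candidates, key=lambda k: k.score, reverse=True):
        if not redundant(c, result, alpha, beta):
            result.append(c)
    return result


if __name__ == "__main__":
    c1 = SubspaceCluster(frozenset({0, 1, 2, 3}), frozenset({0, 1}), 4.0)
    # Same objects, similar subspace: repeats c1's concept, so it is pruned.
    c2 = SubspaceCluster(frozenset({0, 1, 2}), frozenset({0, 1, 2}), 3.0)
    # Same objects, disjoint subspace: an orthogonal concept, so it is kept.
    c3 = SubspaceCluster(frozenset({0, 1, 2}), frozenset({5, 6}), 2.5)
    kept = select_orthogonal([c1, c2, c3])
    print([sorted(c.dims) for c in kept])  # [[0, 1], [5, 6]]
```

Objects 0 to 2 end up in two kept clusters because those clusters live in unrelated subspaces, whereas the cluster in the overlapping subspace is discarded. A greedy pass is used here only for brevity; it does not reproduce how OSCLU selects its result set.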