Constrained data clustering by depth control and progressive constraint relaxation

11 January 2007

journal article
Published by Springer Science and Business Media LLC in The VLDB Journal

Vol. 16 (2), 201-217
https://doi.org/10.1007/s00778-005-0164-6

Abstract

Constraint-based mining, which incorporates domain knowledge or application-dependent parameters into data mining systems, has recently attracted considerable research attention. In this paper, the attributes used to model the constraints are called constraint attributes, and the attributes involved in the objective function to be optimized are called optimization attributes. The constrained clustering considered here optimizes the objective function over the optimization attributes subject to the condition that the imposed constraint is satisfied. Specifically, we address the problem of constrained clustering with numerical constraints, in which the constraint attribute values of any two data items in the same cluster are required to be within the corresponding constraint range. This numerical constrained clustering problem, however, cannot be handled by any conventional clustering algorithm. Consequently, we devise several effective and efficient algorithms to solve it. Owing to the intrinsic nature of numerical constrained clustering, the clustering process is order dependent, which in many cases degrades the clustering results. In view of this, we devise a progressive constraint relaxation technique to remedy this drawback and improve the overall quality of the clustering results. Specifically, by using a smaller (tighter) constraint range in earlier merge iterations, we leave more room to relax the constraint and seek better solutions in subsequent iterations. It is empirically shown that the progressive constraint relaxation technique improves not only the execution efficiency but also the clustering quality.
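The constraint and the relaxation idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the greedy centroid-distance merge, the function names (`satisfies_constraint`, `constrained_merge`), the one-dimensional optimization attribute, and the relaxation schedule `(0.5, 0.75, 1.0)` are all assumptions chosen for illustration. The only parts taken from the abstract are the pairwise range condition on the constraint attribute and the use of a tighter range in earlier merge iterations.

```python
from itertools import combinations


def satisfies_constraint(cluster, limit):
    """True if any two items differ by at most `limit` on the constraint
    attribute, which is equivalent to max - min <= limit."""
    values = [constraint for _, constraint in cluster]
    return max(values) - min(values) <= limit


def centroid(cluster):
    """Mean of the (one-dimensional) optimization attribute."""
    return sum(opt for opt, _ in cluster) / len(cluster)


def constrained_merge(items, limit, k, schedule=(0.5, 0.75, 1.0)):
    """Greedy agglomerative merging under a numerical constraint.

    items:    list of (optimization_value, constraint_value) pairs
    limit:    the full constraint range allowed within a cluster
    k:        target number of clusters (may be unreachable)
    schedule: hypothetical fractions of `limit` used in successive
              stages; earlier stages use a tighter range
    """
    clusters = [[item] for item in items]
    for fraction in schedule:
        current_limit = limit * fraction
        while len(clusters) > k:
            best = None
            for i, j in combinations(range(len(clusters)), 2):
                merged = clusters[i] + clusters[j]
                if not satisfies_constraint(merged, current_limit):
                    continue  # this merge would violate the constraint
                distance = abs(centroid(clusters[i]) - centroid(clusters[j]))
                if best is None or distance < best[0]:
                    best = (distance, i, j)
            if best is None:
                break  # no feasible merge at this range: relax and retry
            _, i, j = best
            clusters[i] = clusters[i] + clusters[j]
            del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    data = [(1.0, 10), (1.1, 50), (1.2, 12), (5.0, 11), (5.1, 52)]
    for cluster in constrained_merge(data, limit=5, k=2):
        print(cluster)
```

In this toy data set the items with optimization values 1.0 and 1.1 are close, but their constraint values (10 and 50) differ by more than the allowed range of 5, so they can never share a cluster. If no feasible merge remains even at the full range, the sketch returns more than `k` clusters rather than violating the constraint.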
