Distributed clustering algorithm for spatial data mining

1 July 2015

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 60-65
https://doi.org/10.1109/icsdm.2015.7298026

Abstract

Distributed data mining techniques, and distributed clustering in particular, have been widely used over the last decade because they can handle very large and heterogeneous datasets that cannot be gathered centrally. Current distributed clustering approaches normally generate global models by aggregating the local results obtained on each site. Although this approach mines the datasets at their locations, the aggregation phase is complex and may produce incorrect and ambiguous global clusters, and therefore incorrect knowledge. In this paper we propose a new clustering approach for very large spatial datasets that are heterogeneous and distributed. The approach is based on the K-means algorithm, but it generates the number of global clusters dynamically. Moreover, it uses an elaborate aggregation phase, designed so that the overall process is efficient in both time and memory allocation. Preliminary results show that the proposed approach produces high-quality results and scales up well. We also compared it with two popular clustering algorithms and found that this approach is much more efficient.
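The abstract describes a two-phase scheme. First, each site clusters its own data with K-means. Then an aggregation phase combines the local results into global clusters whose number is not fixed in advance. The abstract does not spell out the paper's aggregation procedure, so what follows is only a minimal Python sketch built on stated assumptions. Each local cluster is summarized by a hypothetical (centroid, radius, size) triple. Local clusters whose bounding circles overlap are merged with union-find, so the number of global clusters falls out of the merging. The farthest-point seeding, the circle-overlap rule, and all function names (`farthest_point_init`, `summarize`, `aggregate`, `distributed_clustering`) are illustrative assumptions, not the authors' method.

```python
import math

Point = tuple[float, float]


def farthest_point_init(points: list[Point], k: int) -> list[Point]:
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[list[Point]]:
    """Plain Lloyd's K-means on one site's 2-D points; returns non-empty clusters."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        clusters: list[list[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if new == centroids:
            break
        centroids = new
    return [c for c in clusters if c]


def summarize(cluster: list[Point]) -> tuple[Point, float, int]:
    """Hypothetical local summary: centroid, bounding radius, point count."""
    n = len(cluster)
    centre = (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)
    radius = max(math.dist(centre, p) for p in cluster)
    return centre, radius, n


def aggregate(summaries: list[tuple[Point, float, int]]) -> list[list[int]]:
    """Union-find merge of local clusters whose bounding circles overlap."""
    parent = list(range(len(summaries)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(summaries)):
        for j in range(i + 1, len(summaries)):
            (ci, ri, _), (cj, rj, _) = summaries[i], summaries[j]
            if math.dist(ci, cj) <= ri + rj:
                parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(summaries)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def distributed_clustering(sites: list[list[Point]], k_local: int) -> list[tuple[Point, int]]:
    """Local K-means per site, then a global merge of summaries only.
    Returns (global centroid, size) per global cluster."""
    summaries = [summarize(c) for site in sites for c in kmeans(site, k_local)]
    result = []
    for group in aggregate(summaries):
        n = sum(summaries[i][2] for i in group)
        # Size-weighted mean of local centroids equals the mean of the merged points.
        cx = sum(summaries[i][0][0] * summaries[i][2] for i in group) / n
        cy = sum(summaries[i][0][1] * summaries[i][2] for i in group) / n
        result.append(((cx, cy), n))
    return result
```

In this design only the small per-cluster summaries leave each site, not the raw points. The overlap rule is what decides how many global clusters emerge. If `k_local` is set higher than needed, a group is simply split locally and then stitched back together during aggregation.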

Cited by 28 articles