Clustering data streams: theory and practice

Abstract

The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.


Cited by 520 articles