A survey of Web clustering engines

30 July 2009

journal article
Published by Association for Computing Machinery (ACM) in ACM Computing Surveys

Vol. 41 (3), 1-38
https://doi.org/10.1145/1541880.1541884

Abstract

Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.

Cited by 233 articles