A survey of Web clustering engines
- 30 July 2009
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Computing Surveys
- Vol. 41 (3), 1-38
- https://doi.org/10.1145/1541880.1541884
Abstract
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.