Automatic classification of Web queries using very large unlabeled query logs
- 1 April 2007
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 25, No. 2, Article 9
- https://doi.org/10.1145/1229179.1229183
Abstract
Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose Web search systems. Such classification becomes critical if the system must route queries to a subset of topic-specific and resource-constrained back-end databases. Query classification is challenging because Web queries are short and therefore provide few features. This feature sparseness, coupled with the constantly changing distribution and vocabulary of queries, hinders traditional text classification. We attack this problem by combining multiple classifiers, including exact lookup and partial matching in databases of manually classified frequent queries, linear models trained by supervised learning, and a novel approach based on mining selectional preferences from a large unlabeled query log. Our approach classifies queries without using external sources of information, such as online Web directories or the contents of retrieved pages, making it viable for use in demanding operational environments such as large-scale Web search services. We evaluate our approach using a large sample of queries from an operational Web search engine and show that our combined method increases recall by nearly 40% over the best single method while maintaining adequate precision. Additionally, we compare our results to those from the 2005 KDD Cup and find that we perform competitively despite our operational restrictions. This suggests that a significant portion of the query stream can be topically classified without external sources of information, allowing deployment in operationally restricted environments.
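The abstract names the component classifiers but not their internals. As a rough, hypothetical illustration only, the sketch below combines three of them: exact lookup in a hand-labeled table, partial matching against that table, and a selectional-preference score mined from an unlabeled log. It then takes the union of their labels, which is one simple way a combination can raise recall. Everything concrete here is an assumption, not the authors' published method: the toy labeled table and log, the n-gram partial-matching rule, the restriction to two-word "x w" queries, the use of Resnik's selectional-association measure, and the 0.5 threshold. The supervised linear-model component is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labeled table of frequent queries -> topical categories.
LABELED = {
    "espn": {"sports"},
    "nba": {"sports"},
    "weather": {"weather"},
    "expedia": {"travel"},
    "cheap flights": {"travel"},
}

# Hypothetical unlabeled query log (a real log would hold millions of queries).
LOG = ["nba scores", "espn scores", "weather forecast",
       "expedia deals", "nba tickets", "expedia tickets"]


def exact_lookup(query):
    """Categories of a query that appears verbatim in the labeled table."""
    return set(LABELED.get(query.strip().lower(), set()))


def partial_match(query):
    """Categories of every labeled entry that occurs as a contiguous
    token n-gram of the query (an assumed partial-matching rule)."""
    tokens = query.strip().lower().split()
    labels = set()
    for n in range(1, len(tokens) + 1):
        for i in range(len(tokens) - n + 1):
            labels |= LABELED.get(" ".join(tokens[i:i + n]), set())
    return labels


def mine_preferences(log, lexicon=LABELED):
    """For each two-word query "x w" whose first word x is labeled,
    credit the second word w with x's categories (hypothetical pattern)."""
    pair_counts = defaultdict(Counter)
    for q in log:
        tokens = q.strip().lower().split()
        if len(tokens) != 2:
            continue
        x, w = tokens
        for c in lexicon.get(x, ()):
            pair_counts[w][c] += 1
    return pair_counts


def selectional_association(pair_counts):
    """Resnik-style association A(w, c) = P(c|w) log(P(c|w)/P(c)) / S(w),
    where S(w) is the KL divergence of P(C|w) from the prior P(C)."""
    total = Counter()
    for counts in pair_counts.values():
        total.update(counts)
    n = sum(total.values())
    if n == 0:
        return {}
    prior = {c: k / n for c, k in total.items()}
    assoc = {}
    for w, counts in pair_counts.items():
        m = sum(counts.values())
        post = {c: k / m for c, k in counts.items()}
        strength = sum(p * math.log(p / prior[c]) for c, p in post.items())
        if strength <= 0:
            continue  # w shows no preference beyond the prior
        assoc[w] = {c: p * math.log(p / prior[c]) / strength
                    for c, p in post.items()}
    return assoc


def classify(query, assoc, threshold=0.5):
    """Union of the component classifiers' labels (favors recall)."""
    labels = exact_lookup(query) | partial_match(query)
    tokens = query.strip().lower().split()
    if len(tokens) == 2 and tokens[1] in assoc:
        labels |= {c for c, a in assoc[tokens[1]].items() if a >= threshold}
    return labels


if __name__ == "__main__":
    assoc = selectional_association(mine_preferences(LOG))
    for q in ["espn", "lakers scores", "cheap flights to paris", "nba tickets"]:
        print(q, "->", sorted(classify(q, assoc)))
```

In this toy setup "lakers scores" is unseen by both lookup methods but still receives `sports` from the preference mined for "scores", which is the recall gain a combination aims for. Taking a plain union also costs precision: "nba tickets" receives both `sports` (partial match on "nba") and `travel` (the preference mined for "tickets"). A deployed system would need thresholds or weighting to keep precision adequate.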