Concept-Based Information Retrieval Using Explicit Semantic Analysis
- 1 April 2011
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 29 (2), 1-34
- https://doi.org/10.1145/1961209.1961211
Abstract
Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term cooccurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.
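The idea of concept-based features can be illustrated with a minimal sketch. It is not the authors' system: the three toy "concept articles" below are hypothetical stand-ins for Wikipedia articles, the TF-IDF weighting is a simplified assumption, and the feature selection step the abstract describes is omitted entirely. The sketch only shows how two texts that share no keywords can still be matched once both are mapped into a space of concepts.

```python
import math
from collections import Counter

# Hypothetical toy knowledge repository: each concept is a tiny "article".
# The abstract names Wikipedia as the kind of source used in practice.
CONCEPTS = {
    "Automobile": "car vehicle engine wheel road driver car",
    "Medicine": "doctor patient drug treatment hospital physician",
    "Computer": "software hardware program processor memory program",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_inverted_index(concepts):
    """Map each word to {concept name: TF-IDF weight of the word in it}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(word for counts in tf.values() for word in counts)
    index = {}
    for name, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n / df[word]) + 1.0  # +1 keeps the weight positive
            index.setdefault(word, {})[name] = count * idf
    return index


def concept_vector(text, index):
    """Represent a text as the sum of the concept vectors of its words."""
    vec = Counter()
    for word in tokenize(text):
        for name, weight in index.get(word, {}).items():
            vec[name] += weight
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


INDEX = build_inverted_index(CONCEPTS)
query = concept_vector("physician", INDEX)
doc = concept_vector("doctor treatment", INDEX)
unrelated = concept_vector("car", INDEX)

print(cosine(query, doc))      # high: both map to the "Medicine" concept
print(cosine(unrelated, doc))  # zero: no concept in common
```

Here the query "physician" and the document "doctor treatment" have no word in common, so a pure keyword match would miss the document, while their concept vectors both point at the same concept.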