Scoring missing terms in information retrieval tasks
- 13 November 2004
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
A usual approach to addressing the vocabulary mismatch problem is to augment the original query using dictionaries and other lexical resources and/or by looking at pseudo-relevant documents. Either way, terms are added to form a new query that will be used to score all documents in a subsequent retrieval pass, and as a consequence the original query's focus may drift because of the newly added terms. We propose a new method to address the vocabulary mismatch problem, expanding original query terms only when necessary and complementing the user query for missing terms while scoring documents. It allows related semantic aspects to be included in a conservative and selective way, thus reducing the possibility of query drift. Our results using replacements for the missing query terms in modified document and passage retrieval methods show significant improvement over the original ones.
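The idea of complementing a query only for terms a document is missing can be illustrated with a minimal sketch. This is not the authors' scoring function, which the abstract does not specify; the `RELATED` table, its weights, the raw term-frequency score, and the `score` helper are all hypothetical and chosen only to show the control flow.

```python
from collections import Counter

# Hypothetical related-term table with similarity weights in (0, 1].
# In practice such a table might come from a thesaurus or from
# co-occurrence statistics; the values here are made up.
RELATED = {
    "car": {"automobile": 0.8, "vehicle": 0.5},
    "cheap": {"inexpensive": 0.9, "affordable": 0.7},
}

def score(query_terms, doc_tokens, related=RELATED):
    """Sum raw term frequencies; substitute only for missing query terms."""
    tf = Counter(doc_tokens)
    total = 0.0
    for term in query_terms:
        if tf[term] > 0:
            # The original term is present: use it and do not expand.
            total += tf[term]
        else:
            # The term is missing: fall back to the best-scoring related
            # term found in this document, discounted by its weight.
            candidates = related.get(term, {})
            total += max((w * tf[r] for r, w in candidates.items()), default=0.0)
    return total

docs = {
    "d1": "cheap car for sale".split(),
    "d2": "affordable automobile for sale".split(),
    "d3": "weather report for today".split(),
}
query = ["cheap", "car"]
scores = {name: score(query, tokens) for name, tokens in docs.items()}
```

Because substitution happens per document and only for absent terms, a document that already matches the query literally is never credited for the related terms, which is one simple way to limit query drift.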
This publication has 19 references indexed in Scilit:
- The effect of document retrieval quality on factoid question answering performance. Published by Association for Computing Machinery (ACM), 2004
- Fast computation of lexical affinity models. Published by Association for Computational Linguistics (ACL), 2004
- Probabilistic structured query methods. Published by Association for Computing Machinery (ACM), 2003
- Frequency estimates for statistical word similarity measures. Published by Association for Computational Linguistics (ACL), 2003
- Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations. Published by Association for Computing Machinery (ACM), 2002
- Improving the retrieval effectiveness of very short queries. Information Processing & Management, 2002
- A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management, 2000
- Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems, 2000
- Corpus-based stemming using cooccurrence of word variants. ACM Transactions on Information Systems, 1998
- A vector space model for automatic indexing. Communications of the ACM, 1975