Scoring missing terms in information retrieval tasks

13 November 2004

conference paper
Published by Association for Computing Machinery (ACM)

p. 50-58
https://doi.org/10.1145/1031171.1031182

Abstract

A usual approach to the mismatching vocabulary problem is to augment the original query using dictionaries and other lexical resources and/or by looking at pseudo-relevant documents. Either way, terms are added to form a new query that is used to score all documents in a subsequent retrieval pass, and as a consequence the original query's focus may drift because of the newly added terms. We propose a new method to address the mismatching vocabulary problem that expands original query terms only when necessary, complementing the user query for missing terms while scoring documents. It allows related semantic aspects to be included in a conservative and selective way, thus reducing the possibility of query drift. Our results using replacements for the missing query terms in modified document and passage retrieval methods show significant improvement over the original ones.
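The idea of substituting for a query term only when a document lacks it can be illustrated with a minimal sketch. This is not the scoring function from the paper, which the abstract does not specify; it is a toy term-frequency scorer written under stated assumptions. The `RELATED` table, its similarity weights, the `score` function, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical related-term table with similarity weights in (0, 1].
# A real system might derive these from a lexical resource or co-occurrence data.
RELATED = {
    "car": {"automobile": 0.8, "vehicle": 0.5},
    "cheap": {"inexpensive": 0.9, "affordable": 0.7},
}


def score(query_terms, doc_tokens, related=RELATED):
    """Toy term-frequency score that substitutes only for missing query terms."""
    tf = Counter(doc_tokens)
    total = 0.0
    for term in query_terms:
        if tf[term] > 0:
            # The original term is present: no related terms are consulted,
            # so the query is not expanded for this document.
            total += tf[term]
        else:
            # The term is missing: credit the best related term, discounted
            # by its (assumed) similarity weight.
            candidates = related.get(term, {})
            total += max(
                (weight * tf[alt] for alt, weight in candidates.items()),
                default=0.0,
            )
    return total


# Hypothetical mini-collection.
docs = {
    "d1": "cheap car for sale".split(),
    "d2": "inexpensive automobile for sale".split(),
    "d3": "vehicle vehicle vehicle recall notice".split(),
}
query = ["cheap", "car"]
ranking = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
print(ranking)
```

In this sketch a document that contains the original terms is scored on those terms alone, while a document that lacks them can still earn discounted credit through a related term. Because related terms never add to the score of a term that is already matched, their influence stays limited, which is the conservative behaviour the abstract describes as reducing query drift.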
