Cumulated gain-based evaluation of IR techniques
- 1 October 2002
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 20 (4), 422-446
- https://doi.org/10.1145/582415.582418
Abstract
Modern large retrieval environments tend to overwhelm their users by their large output. Since not all documents are of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, that is, recall and precision based on binary relevance judgments, to graded relevance judgments. Alternatively, novel measures based on graded relevance judgments may be developed. This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor to the relevance scores in order to devaluate late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. These novel measures are defined and discussed and their use is demonstrated in a case study using TREC data: sample system run results for 20 queries in TREC-7. As a relevance base we used novel graded relevance judgments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, for example, from the user point of view.
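The three measures described in the abstract (cumulated gain, discounted cumulated gain, and normalized DCG) can be sketched as follows. This is a minimal illustration, not the article's reference implementation: the discount base `b = 2`, the convention that ranks below `b` are left undiscounted, and the example gain vector are assumptions consistent with the abstract's description of a four-point relevance scale.

```python
import math

def cumulated_gain(gains):
    """CG: running sum of graded relevance scores along the ranked result list."""
    cg, total = [], 0
    for g in gains:
        total += g
        cg.append(total)
    return cg

def discounted_cumulated_gain(gains, b=2):
    """DCG: the gain at rank i >= b is divided by log_b(i) to devaluate
    late-retrieved documents; earlier ranks are left undiscounted."""
    dcg, total = [], 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        dcg.append(total)
    return dcg

def normalized_dcg(gains, ideal_gains, b=2):
    """nDCG: relative-to-the-ideal performance, i.e. the DCG vector divided
    position-by-position by the DCG of an ideal (descending-gain) ranking."""
    dcg = discounted_cumulated_gain(gains, b)
    idcg = discounted_cumulated_gain(ideal_gains, b)
    return [d / i for d, i in zip(dcg, idcg)]

# Hypothetical ranked run judged on a four-point scale (0-3):
run = [3, 2, 3, 0, 1, 2]
ideal = sorted(run, reverse=True)  # ideal ranking of the same gains
print(cumulated_gain(run))         # [3, 5, 8, 8, 9, 11]
```

A run that places its highly relevant (gain 3) documents early keeps its nDCG curve near 1.0; pushing them down the list lowers DCG at every subsequent rank, which is exactly how the measures credit retrieval of highly relevant documents.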