Evaluation of information retrieval systems: A decision theory approach

1 January 1978

journal article
Published by Wiley in Journal of the American Society for Information Science

Vol. 29 (1), 31-40
https://doi.org/10.1002/asi.4630290106

Abstract

This paper discusses the Swets model of information retrieval, which is based on a decision theory approach, and reexamines its crucial element, the overall performance measure. The Neyman‐Pearson criterion from statistical decision theory, which is based on likelihood ratios, is used to determine an optimal range of Z, the variable that the retrieval system assigns to each document in an attempt to discriminate between relevant and nonrelevant documents. This criterion is shown to be directly related to both precision and recall and, under certain conditions, to be equivalent to maximizing the expected value of the retrieval decision for a specific query and a given document. A compromise can thus be reached between those who advocate precision as a measure, partly because it is easy to measure empirically, and those who advocate consideration of recall. Several cases of the normal and Poisson distributions for Z are discussed in terms of their implications for the Neyman‐Pearson decision rule. When the variances are unequal, the Swets rule of retrieving a document if its Z value is large enough is shown not to be optimal. Finally, it is shown that, under certain conditions, precision and recall need not be inversely related. The paper thereby attempts to extend the understanding of the theoretical foundations of the decision theory approach to information retrieval.
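
The claim about unequal variances can be illustrated with a minimal sketch. It is not taken from the paper: the means, standard deviations, threshold k = 1.0, and all function names below are hypothetical, chosen only to show how a likelihood-ratio rule behaves when Z is normally distributed for relevant and nonrelevant documents. With equal variances the ratio increases with Z, so retrieving documents above a cutoff matches the likelihood-ratio rule. With unequal variances the ratio is no longer monotone in Z, and the retrieved region can fall on both sides of a rejected middle band.

```python
import math

def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(z, rel, nonrel):
    """Ratio f_relevant(z) / f_nonrelevant(z); rel and nonrel are (mean, sd) pairs."""
    return normal_pdf(z, *rel) / normal_pdf(z, *nonrel)

def retrieved(z, rel, nonrel, k):
    """Likelihood-ratio rule: retrieve when the ratio reaches the threshold k."""
    return likelihood_ratio(z, rel, nonrel) >= k

# Hypothetical parameters (assumptions for illustration, not values from the paper).
equal = ((1.0, 1.0), (0.0, 1.0))    # relevant N(1, 1), nonrelevant N(0, 1)
unequal = ((1.0, 2.0), (0.0, 1.0))  # relevant N(1, 2^2), nonrelevant N(0, 1)

# In a Neyman-Pearson setting k would be fixed by a bound on one error rate;
# k = 1.0 is an arbitrary stand-in here.
k = 1.0
grid = [-6.0, -3.0, 0.0, 3.0, 6.0]
equal_decisions = [retrieved(z, *equal, k) for z in grid]
unequal_decisions = [retrieved(z, *unequal, k) for z in grid]

print(equal_decisions)    # a single cutoff: only the larger Z values are retrieved
print(unequal_decisions)  # both tails retrieved, the middle value rejected
```

Under these assumed parameters, a rule of the form "retrieve if Z is large enough" cannot reproduce the second pattern, which is the sense in which a single cutoff is not optimal when the variances differ.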

