Evaluation of information retrieval systems: A decision theory approach
- 1 January 1978
- journal article
- Published by Wiley in Journal of the American Society for Information Science
- Vol. 29 (1), 31-40
- https://doi.org/10.1002/asi.4630290106
Abstract
The Swets model of information retrieval, based on a decision theory approach, is discussed, with the overall performance measure being the crucial element reexamined in this paper. The Neyman‐Pearson criterion from statistical decision theory, and based on likelihood ratios, is used to determine an optimal range of Z, the variable assigned to each document by the retrieval system in an attempt to discriminate between relevant and nonrelevant documents. This criterion is shown to be directly related to both precision and recall, and is equivalent to the maximization of the expected value of the retrieval decision for a specific query and a given document under certain conditions. Thus, a compromise can be reached between those who advocate precision as a measure, due partially to its ability to be easily measurable empirically, and those who advocate consideration of recall. Several cases of the normal and Poisson distributions for the variable Z are discussed in terms of their implications for the Neyman‐Pearson decision rule. It is seen that when the variances are unequal, the Swets rule of retrieving a document if its Z value is large enough is not optimal. Finally, the situation of precision and recall not being inversely related is shown to be possible under certain conditions. Thus, this paper attempts to extend the understanding of the theoretical foundations of the decision theory approach to information retrieval.
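The abstract's remark about unequal variances can be illustrated with a small numerical sketch. It is not taken from the paper: the function names, the means and standard deviations, and the threshold of 1.0 are all assumptions chosen for illustration. The sketch retrieves a document when the likelihood ratio of its Z value (relevant density over nonrelevant density) meets a threshold, which is the general form of a Neyman‐Pearson rule, and compares the equal-variance and unequal-variance normal cases.

```python
import math

def normal_pdf(z, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(z, rel, nonrel):
    """Ratio of the relevant to the nonrelevant density at score z.

    rel and nonrel are (mean, std) pairs; the values used below are
    illustrative assumptions, not parameters from the paper.
    """
    return normal_pdf(z, *rel) / normal_pdf(z, *nonrel)

def retrieve(z, threshold, rel, nonrel):
    """Likelihood-ratio rule: retrieve when the ratio meets the threshold."""
    return likelihood_ratio(z, rel, nonrel) >= threshold

# Hypothetical parameters: (mean, std) of Z for each document class.
equal_var = {"rel": (1.0, 1.0), "nonrel": (0.0, 1.0)}
unequal_var = {"rel": (1.0, 2.0), "nonrel": (0.0, 1.0)}

for label, params in (("equal variances", equal_var), ("unequal variances", unequal_var)):
    decisions = {z: retrieve(z, 1.0, **params) for z in (-4.0, 0.0, 4.0)}
    print(label, decisions)
```

With equal variances the likelihood ratio increases with Z, so the rule reduces to a single cutoff of the "Z large enough" kind. With the unequal variances assumed here, the ratio is large at both extremes and smallest in between, so very low Z values are retrieved as well as very high ones, and a single cutoff on Z no longer matches the likelihood-ratio rule.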
This publication has 26 references indexed in Scilit:
- Design equations for retrieval systems based on the Swets model. Journal of the American Society for Information Science, 1974
- The anomalous behaviour of precision in the Swets model, and its resolution. Journal of Documentation, 1974
- A decision theory view of the information retrieval situation: An operations research approach. Journal of the American Society for Information Science, 1973
- Distance between sets as an objective measure of retrieval effectiveness. Information Storage and Retrieval, 1973
- The use of hierarchic clustering in information retrieval. Information Storage and Retrieval, 1971
- Mathematical and statistical methods of noise evaluation in a retrieval system. Information Storage and Retrieval, 1971
- The cost-effectiveness analysis of information retrieval and dissemination systems. Journal of the American Society for Information Science, 1971
- Distribution of indexing terms for maximum efficiency of information transmission. American Documentation, 1967
- A searching procedure for information retrieval. Information Storage and Retrieval, 1964
- Inefficiency of the use of Boolean functions for information retrieval systems. Communications of the ACM, 1961