Simulation study of a panel of reliability indicators applied to paired measurements

Abstract
Reliability is a subject of continuing discussion in biomedical specialty areas, including physical anthropology and nutritional epidemiology. The purpose of this study was to explore techniques for detecting differences between two evaluators or methods. A field study was simulated in which two independent evaluators took anthropometric dimensions on each participant in a study group. A panel of reliability indicators was applied by simulation across a broad range of parameters and then applied to field anthropometric data. The panel consisted of the intraclass correlation coefficient (ICC), the paired t-test, a simultaneous test of evaluator means and variances, the technical error of measurement (TEM), the mean absolute difference, and the mean difference. The simultaneous test for equal evaluator means and variances uses regression to model paired differences against paired sums. The simulation demonstrated general properties of the reliability indicators across many conditions of population variance, measurer bias, and measurer error variance. High ICC values often occurred even when the measurers differed. The simultaneous test was a powerful method for detecting measurer differences, especially when combined with the paired t-test. However, no single reliability indicator sufficient to detect all measurer inconsistencies was identified. Together, the field study and the simulation permitted the development of a logical approach to determining the source and magnitude of measurer differences using the panel of reliability indicators.
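The panel can be illustrated with a minimal Python sketch. This is not the study's code, and the function names are hypothetical. It rests on several assumptions. The simulation draws each participant's true value from a normal population, and each evaluator adds normal error, with a constant bias for evaluator 2. The ICC is taken to be the one-way random-effects ICC(1,1), because the abstract does not say which form was used. TEM is computed as sqrt(sum(d^2) / (2n)), which is Dahlberg's formula. The simultaneous test is assumed to be the form usually attributed to Bradley and Blackwood. That form regresses D = X1 - X2 on S = X1 + X2 and tests b0 = b1 = 0 jointly with an F statistic on (2, n - 2) degrees of freedom. The test works because Cov(D, S) = Var(X1) - Var(X2), so a zero slope corresponds to equal variances, and a zero intercept then corresponds to equal means. Only test statistics are returned. P-values would come from the t(n - 1) and F(2, n - 2) distributions.

```python
# Hypothetical sketch: simulation design, ICC form, and names are assumptions,
# not the study's own implementation.
import math
import random
import statistics


def simulate_pairs(n, pop_mean, pop_sd, bias2, err_sd1, err_sd2, seed=1):
    """True value ~ N(pop_mean, pop_sd); evaluator 1 adds N(0, err_sd1) error,
    evaluator 2 adds a constant bias plus N(0, err_sd2) error."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        true = rng.gauss(pop_mean, pop_sd)
        x1.append(true + rng.gauss(0.0, err_sd1))
        x2.append(true + bias2 + rng.gauss(0.0, err_sd2))
    return x1, x2


def icc_oneway(x1, x2):
    """One-way random-effects ICC(1,1) with k = 2 measurements per subject."""
    n, k = len(x1), 2
    grand = (sum(x1) + sum(x2)) / (n * k)
    subj_means = [(a + b) / 2 for a, b in zip(x1, x2)]
    ss_between = k * sum((m - grand) ** 2 for m in subj_means)
    ss_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x1, x2, subj_means))
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def simultaneous_f(x1, x2):
    """F statistic (df 2, n - 2) for H0: equal means AND equal variances,
    from the least-squares fit D = b0 + b1 * S of differences on sums."""
    d = [a - b for a, b in zip(x1, x2)]
    s = [a + b for a, b in zip(x1, x2)]
    n = len(d)
    s_bar, d_bar = statistics.fmean(s), statistics.fmean(d)
    sxx = sum((si - s_bar) ** 2 for si in s)
    sxy = sum((si - s_bar) * (di - d_bar) for si, di in zip(s, d))
    b1 = sxy / sxx
    b0 = d_bar - b1 * s_bar
    sse = sum((di - b0 - b1 * si) ** 2 for si, di in zip(s, d))
    # Under H0 the model is D = 0, so its residual sum of squares is sum(D^2).
    ss_h0 = sum(di * di for di in d)
    return ((ss_h0 - sse) / 2) / (sse / (n - 2))


def panel(x1, x2):
    """Compute every indicator in the panel for paired measurements x1, x2."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    mean_diff = statistics.fmean(d)
    return {
        "icc": icc_oneway(x1, x2),
        "paired_t": mean_diff / (statistics.stdev(d) / math.sqrt(n)),
        "simultaneous_f": simultaneous_f(x1, x2),
        "tem": math.sqrt(sum(di * di for di in d) / (2 * n)),
        "mean_abs_diff": statistics.fmean(abs(di) for di in d),
        "mean_diff": mean_diff,
    }


if __name__ == "__main__":
    # Evaluator 2 reads 1 unit high and is noisier than evaluator 1.
    x1, x2 = simulate_pairs(200, 170.0, 10.0, 1.0, 0.5, 1.5, seed=2)
    for name, value in panel(x1, x2).items():
        print(f"{name:>15}: {value:.4f}")
```

With a population SD (10) that is large relative to the measurement errors, one would expect the ICC in this biased scenario to remain above 0.9. The paired t and the simultaneous F statistic should still be large, which matches the pattern the abstract reports.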