Validating Diagnostic Tests, Correct and Incorrect Methods, New Developments

Abstract
Background: Clinical development of new treatments is impossible without adequate diagnostic tests. Several working parties, including the Consolidated Standards of Reporting Trials (CONSORT) movement and the Standards for Reporting of Diagnostic Accuracy (STARD) group, have launched quality criteria for diagnostic tests. In particular, assessments of accuracy, reproducibility, and precision have been recommended, but the methods of assessment have not been defined so far. Objective: To summarize correct and incorrect methods, and new developments, for that purpose. Results and conclusions: A diagnostic test can be either qualitative, like the presence of an elevated erythrocyte sedimentation rate to demonstrate pneumonia, or quantitative, like ultrasound flow velocity to estimate invasive electromagnetic flow velocity. Qualitative diagnostic tests can be assessed for (1) accuracy, using sensitivity, specificity, overall accuracy, and receiver operating characteristic (ROC) curves; (2) reproducibility, using Cohen's kappas; and (3) precision, using confidence intervals of sensitivity, specificity, and overall accuracy. Quantitative diagnostic tests can be assessed for (1) accuracy, using a linear regression line (y = a + bx) and testing whether a = 0.00 and b = 1.00; (2) reproducibility, using duplicate standard errors, repeatability coefficients, or intraclass correlations; and (3) precision, by calculating confidence intervals. Improved confidence intervals can be obtained by data modeling. A significant linear correlation between the diagnostic test and the gold standard test does not by itself indicate adequate accuracy. Likewise, neither a small mean difference between repeated measures nor a significant linear relationship between repeated measures indicates adequate reproducibility. New developments include continuous ROC curves, intraclass correlations, and Bland-Altman agreement tests for the accuracy assessment of quantitative diagnostic tests.
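
As an illustration of the measures named above, the following is a minimal Python sketch and not the authors' own procedure. All function names and all numbers are hypothetical and chosen only for the example. The confidence interval uses the simple normal (Wald) approximation, which is one common choice; the abstract does not say which interval method is intended. The Bland-Altman limits use the conventional mean difference plus or minus 1.96 standard deviations.

```python
import math


def accuracy_measures(tp, fp, fn, tn):
    """Return (sensitivity, specificity, overall accuracy) from a 2x2 table
    of diagnostic test result versus gold standard."""
    sensitivity = tp / (tp + fn)   # true positives among those with the disease
    specificity = tn / (tn + fp)   # true negatives among those without it
    overall = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, overall


def wald_ci(p, n, z=1.96):
    """Approximate 95% confidence interval for a proportion p observed in n
    subjects (normal approximation; an assumption of this sketch)."""
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


def cohen_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two repeated readings of a qualitative test."""
    n = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / n
    p1 = (both_pos + first_only) / n    # positive rate, first reading
    p2 = (both_pos + second_only) / n   # positive rate, second reading
    expected = p1 * p2 + (1.0 - p1) * (1.0 - p2)   # agreement by chance
    return (observed - expected) / (1.0 - expected)


def pearson_r(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman_limits(test, reference, z=1.96):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [t - r for t, r in zip(test, reference)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff, mean_diff - z * sd, mean_diff + z * sd


if __name__ == "__main__":
    # Hypothetical qualitative test: 100 diseased and 100 healthy subjects.
    sens, spec, acc = accuracy_measures(tp=80, fp=10, fn=20, tn=90)
    print("sensitivity", sens, "CI", wald_ci(sens, n=100))
    print("specificity", spec, "CI", wald_ci(spec, n=100))
    print("overall accuracy", acc, "CI", wald_ci(acc, n=200))
    print("kappa", cohen_kappa(40, 10, 10, 40))

    # Hypothetical quantitative test that reads exactly twice the gold standard.
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]
    test = [2.0, 4.0, 6.0, 8.0, 10.0]
    print("correlation", pearson_r(test, reference))
    print("Bland-Altman", bland_altman_limits(test, reference))
```

In the quantitative example the test values are perfectly correlated with the gold standard, yet the mean difference is 3 units and the limits of agreement are wide. This is the situation the abstract warns about: a strong linear correlation alone does not demonstrate adequate accuracy.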