Use of the receiver operating characteristic curve to evaluate sensitivity, specificity, and accuracy of methods for detection of peaks in hormone time series

1 March 1991

journal article
Published by Bioscientifica in Acta Endocrinologica

Vol. 124 (3), 295-306
https://doi.org/10.1530/acta.0.1240295

Abstract

We use the "Receiver Operating Characteristic" curve to describe the relationship between sensitivity and specificity as the threshold for peak detection is varied systematically, and thereby to provide an objective comparison of the performance of methods for detection of episodic hormonal secretion. A computer program was used to generate synthetic data containing peaks of variable duration and of constant or variable height, shape, and/or interpulse interval. This approach was used to compare the CLUSTER and DETECT programs. For both programs, the observed false positive rates estimated using signal-free data were in good agreement with the nominal rates, but in the presence of signal the observed false positive rates were systematically lower. Sensitivity increased with increasing signal/noise ratio, as expected. Program DETECT, using its standard options, provided excellent sensitivity (90-100%) with a very low false positive rate under all conditions tested. Its performance could be further improved by the use of a more stringent definition of a peak, requiring the presence of an "UP" followed by a "DOWN". The CLUSTER program was found to have very poor sensitivity when using the "local variance" option. Use of the true fixed standard deviation or percent coefficient of variation resulted in a modest improvement. Optimal performance of program CLUSTER was obtained by using the best of 3 variance models, testing 12 different cluster sizes (from 1×1 to 4×4), and selecting the best among these: under these conditions it achieved high sensitivity (90-100%) with a very low observed false positive rate, such that its performance was comparable to that of DETECT.
The methods developed and illustrated here should permit the definitive characterization and validation of the performance of any one method and the objective comparison of the relative performance of two or more methods for the analysis of episodic hormone secretion, and should lead to the improvement of algorithms for peak detection.
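
The threshold sweep behind a receiver operating characteristic curve can be illustrated with a minimal Python sketch. It is not an implementation of CLUSTER or DETECT: the detector here is a plain amplitude threshold, every sample is scored individually as peak or non-peak, and the generator `make_series`, the helper `roc_points`, and all parameter values (peak spacing, peak height, noise level) are hypothetical choices made only for illustration.

```python
import random


def make_series(n_points, peak_positions, peak_height, noise_sd, seed):
    """Hypothetical generator: flat baseline, single-point peaks, Gaussian noise."""
    rng = random.Random(seed)
    truth = [i in peak_positions for i in range(n_points)]
    values = [(peak_height if is_peak else 0.0) + rng.gauss(0.0, noise_sd)
              for is_peak in truth]
    return values, truth


def roc_points(values, truth, thresholds):
    """Return (threshold, false_positive_rate, sensitivity) for each threshold."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = []
    for t in sorted(thresholds):
        # A sample is "detected" when it exceeds the threshold.
        tp = sum(1 for v, is_peak in zip(values, truth) if is_peak and v > t)
        fp = sum(1 for v, is_peak in zip(values, truth) if not is_peak and v > t)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


# Assumed test case: 20 peaks of height 3 in 500 samples, noise SD of 1.
peaks = set(range(10, 500, 25))
values, truth = make_series(500, peaks, peak_height=3.0, noise_sd=1.0, seed=0)

# Sweep the threshold from below the smallest value to above the largest one,
# so the curve runs from (FPR = 1, sensitivity = 1) down to (0, 0).
thresholds = [min(values) - 1.0, max(values) + 1.0]
thresholds += [0.5 * k for k in range(-4, 13)]
curve = roc_points(values, truth, thresholds)

for t, fpr, sens in curve:
    print(f"threshold={t:6.2f}  false positive rate={fpr:.3f}  sensitivity={sens:.3f}")
```

Plotting sensitivity against false positive rate for each threshold gives the curve. Repeating the sweep with a different `peak_height` or `noise_sd` shows how the trade-off shifts with the signal/noise ratio.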
