Validation of Statistical Methods to Compare Cancellation Rates on the Day of Surgery

Abstract
We investigated the validity of several statistical methods to monitor the cancellation of electively scheduled cases on the day of surgery: χ² test, Fisher’s exact test, Rao and Scott test, Student’s t-test, Clopper-Pearson confidence intervals, and the Chen and Tipping modification of the Clopper-Pearson confidence intervals. Discrete-event computer simulation over many simulated years was used to represent surgical suites with an unchanging cancellation rate. Because the true cancellation rate was fixed, the accuracy of each statistical method could be determined. Cancellations caused by medical events, rare events, cases lasting longer than scheduled, and full postanesthesia or intensive care unit beds were modeled. We found that applying Student’s two-sample t-test to the transformation of the numbers of cases and canceled cases from each of six 4-week periods was valid for most conditions. We recommend that clinicians and managers use this method in their quality monitoring reports. The other methods gave inaccurate results. For example, using the χ² or Fisher’s exact test, hospitals may erroneously conclude that cancellation rates have increased when they are really unchanged. Conversely, if inappropriate statistical methods are used, administrators may claim success at reducing cancellation rates when, in fact, the problem remains unresolved and continues to affect patients and clinicians.
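As a rough illustration of the recommended approach, the sketch below applies a pooled-variance two-sample Student's t-test to transformed per-period counts. Several details are assumptions, not taken from the abstract: the transformation is taken to be the Freeman-Tukey double arcsine transform, each group is taken to contain six 4-week periods, the counts are invented, and the function names `freeman_tukey` and `two_sample_t` are hypothetical. The decision uses the two-sided 5% critical value of Student's t with 10 degrees of freedom (2.228).

```python
import math
from statistics import mean, variance


def freeman_tukey(cancelled, cases):
    """Freeman-Tukey double arcsine transform of a count out of `cases`.

    Assumption: the abstract does not say which transformation is used.
    """
    return (math.asin(math.sqrt(cancelled / (cases + 1)))
            + math.asin(math.sqrt((cancelled + 1) / (cases + 1))))


def two_sample_t(group_a, group_b):
    """Pooled-variance Student's t statistic and degrees of freedom.

    Each group is a list of (cancelled, cases) pairs, one per 4-week period.
    """
    x = [freeman_tukey(c, n) for c, n in group_a]
    y = [freeman_tukey(c, n) for c, n in group_b]
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


# Invented counts for illustration only: (cancelled cases, scheduled cases)
baseline = [(52, 800), (47, 790), (55, 810), (49, 805), (51, 795), (50, 800)]
followup = [(50, 800), (53, 790), (48, 810), (51, 805), (49, 795), (52, 800)]

T_CRIT_DF10 = 2.228  # two-sided 5% critical value, Student's t, 10 df

t_stat, df = two_sample_t(baseline, followup)
changed = abs(t_stat) > T_CRIT_DF10
print(f"t = {t_stat:.3f}, df = {df}, rate changed at 5% level: {changed}")
```

Each 4-week period contributes one transformed observation, so the test compares period-to-period variability rather than pooling all cases into a single 2×2 table, which is what the χ² and Fisher's exact tests would do.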