Comparing data mining methods on the VAERS database

Abstract
Purpose: Data mining may enhance traditional surveillance of vaccine adverse events by identifying events that are reported more commonly after one vaccine than after other vaccines. Data mining methods detect signals from the proportion of times a condition or group of conditions is reported soon after the administration of a vaccine; a signal is therefore a relative proportion compared across vaccines, not an absolute rate for the condition. The Vaccine Adverse Event Reporting System (VAERS) contains approximately 150 000 reports of adverse events that are possibly associated with vaccine administration.

Methods: We studied four data mining techniques: the empirical Bayes geometric mean (EBGM), the lower bound of the EBGM's 90% confidence interval (EB05), the proportional reporting ratio (PRR), and the screened PRR (SPRR). We applied these to the VAERS database and compared the methods on their agreement and other performance properties, focusing on the vaccine–event combinations that each method scored highest.

Results: The vaccine–event combinations with the highest numerical scores varied substantially among the methods. Not all combinations representing known associations appeared in the top 100 vaccine–event pairs for all methods.

Conclusions: The four methods differ in their ranking of vaccine–event (COSTART term) pairs. A given method may be superior in certain situations but inferior in others. This paper examines the statistical relationships among the four estimators. Determining which method is best for public health will require additional analysis of the true alarm and false alarm rates, using known vaccine–event associations. Evaluating the properties of these data mining methods will help determine their value in vaccine safety surveillance.

Copyright © 2005 John Wiley & Sons, Ltd.
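To make the idea of a "relative proportion compared across vaccines" concrete, the following is a minimal sketch of the proportional reporting ratio as it is commonly defined from a 2x2 table of report counts. The abstract does not state the formula, so the form used here, the function name `proportional_reporting_ratio`, and the example counts are all assumptions for illustration only; they are not taken from VAERS or from the paper. The EBGM, EB05, and SPRR are not sketched.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Return the PRR for one vaccine-event pair (commonly used definition, assumed here).

    a: reports mentioning the event for the vaccine of interest
    b: reports not mentioning the event for the vaccine of interest
    c: reports mentioning the event for all other vaccines
    d: reports not mentioning the event for all other vaccines
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    if a + b == 0 or c + d == 0:
        raise ValueError("each vaccine group needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when no comparison report mentions the event")

    # Proportion of this vaccine's reports that mention the event.
    proportion_vaccine = a / (a + b)
    # Proportion of all other vaccines' reports that mention the event.
    proportion_others = c / (c + d)
    return proportion_vaccine / proportion_others


# Hypothetical counts, chosen only to illustrate the arithmetic.
prr = proportional_reporting_ratio(a=20, b=980, c=100, d=48900)
print(f"PRR = {prr:.2f}")
```

With these made-up counts, the event appears in 2% of the reports for the vaccine of interest and in roughly 0.2% of the reports for other vaccines, so the ratio is close to 10. The value reflects only how reports are distributed across vaccines; it does not say how often the event occurs among vaccinated people.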