Applying Data Mining Techniques in the Development of a Diagnostics Questionnaire for GERD

Abstract
Gastroesophageal reflux disease (GERD) is a common condition, managed mostly in primary care practice. Heartburn and acid regurgitation are considered primary symptoms, and are usually highly specific. However, the symptom spectrum is much wider, and in many cases it is difficult to determine whether the patient has GERD or dyspepsia of another origin. The aim of this study was to develop a symptom score and rule for the diagnosis of GERD, using data mining techniques, to provide a clinical diagnostic tool for primary care practitioners in the evaluation and management of upper gastrointestinal symptoms. A diagnostic symptom questionnaire consisting of 15 items and based on the current literature was designed to measure the presence and severity of reflux and dyspepsia symptoms using a 5-point Likert-type scale. A total of 132 subjects with uninvestigated upper abdominal symptoms were prospectively recruited for symptom evaluation. All patients were interviewed and examined, underwent upper gastrointestinal endoscopy, and completed the questionnaire. Based on endoscopic findings as well as the medical interview, the subjects were classified as having reflux disease (GERD) or non-reflux disease (non-GERD). Data mining models and algorithms (neural networks, decision trees, and logistic regression) were used to build a short and simple new discriminative questionnaire. The most relevant variables discriminating GERD from non-GERD patients were heartburn, regurgitation, clinical response to antacids, sour taste, and aggravation of symptoms after a heavy meal. The sensitivity and specificity of the new symptom score were 70%–75% and 63%–78%, respectively. The areas under the ROC curve for logistic regression and neural networks were 0.783 and 0.787, respectively. We present a new validated discriminative GERD questionnaire developed using data mining techniques. The questionnaire is useful, user-friendly, and short, and therefore can be easily applied in clinical practice for choosing the appropriate diagnostic workup for patients with upper gastrointestinal complaints.
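
To make the reported metrics concrete, the following minimal Python sketch shows how sensitivity, specificity, and the area under the ROC curve could be computed for a summed symptom score. It is an illustration only and not the study's model: the item names, the 0 to 4 coding of the Likert answers, the unweighted sum, the cutoff, and all toy values are assumptions introduced here.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical item names, loosely following the five symptoms named in the abstract.
ITEMS = (
    "heartburn",
    "regurgitation",
    "antacid_response",
    "sour_taste",
    "worse_after_heavy_meal",
)


def symptom_score(answers: Mapping[str, int]) -> int:
    """Unweighted sum of Likert answers (assumed coded 0-4) over the selected items."""
    return sum(answers[item] for item in ITEMS)


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Classify score >= cutoff as GERD (label 1) and compare with the reference labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (GERD, non-GERD) pairs in which the GERD case
    has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration; these are not the study's patients.
toy_scores = [12, 9, 7, 10, 4, 6, 3, 8]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = GERD, 0 = non-GERD

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=8)
auc = roc_auc(toy_scores, toy_labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

The pairwise form of the AUC is used here because it needs no external library and is easy to check by hand on a small sample; a fitted logistic regression or neural network would supply predicted probabilities in place of the summed score.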