The application of naive Bayes model averaging to predict Alzheimer's disease from genome-wide data

Open Access

Abstract

Objective: Predicting patient outcomes from genome-wide measurements holds significant promise for improving clinical care. However, the large number of measurements (e.g., single nucleotide polymorphisms (SNPs)) makes this task computationally challenging. This paper evaluates the performance of an algorithm that predicts patient outcomes from genome-wide data by efficiently model averaging over an exponential number of naive Bayes (NB) models.

Design: This model-averaged naive Bayes (MANB) method was applied to predict late-onset Alzheimer's disease in 1411 individuals, each of whom had 312 318 SNP measurements available as genome-wide predictive features. Its performance was compared with that of a naive Bayes algorithm without feature selection (NB) and with feature selection (FSNB).

Measurement: Performance of each algorithm was measured in terms of area under the ROC curve (AUC), calibration, and run time.

Results: The training time of MANB (16.1 s) was comparable to that of NB (15.6 s), while FSNB (1684.2 s) was considerably slower. Each of the three algorithms required less than 0.1 s to predict the outcome of a test case. MANB had an AUC of 0.72, which is significantly better than the AUC of 0.59 by NB (p [...]).

Conclusion: MANB performed comparatively well in predicting a clinical outcome from a high-dimensional genome-wide dataset. These results provide support for including MANB in the methods used to predict outcomes from large, genome-wide datasets.
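The abstract does not spell out how averaging over an exponential number of NB models can be done efficiently, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes discrete features coded 0, 1, 2, a binary class, symmetric Dirichlet priors on all parameters, and an independent prior probability `arc_prior` that each feature depends on the class. Under those assumptions the posterior over feature subsets factorizes, so the average over all 2^n models reduces to a per-feature mixture. All function and parameter names (`fit`, `predict_factored`, `predict_bruteforce`, `arc_prior`, `alpha`) are hypothetical.

```python
import itertools
import math
from collections import Counter


def log_marginal(counts, n_values, alpha=1.0):
    """Log marginal likelihood of multinomial counts under a symmetric Dirichlet(alpha) prior."""
    counts = list(counts)
    out = math.lgamma(n_values * alpha) - math.lgamma(n_values * alpha + sum(counts))
    return out + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts)


def fit(X, y, n_values=3, n_classes=2, arc_prior=0.5, alpha=1.0):
    """Score, per feature, the local model 'depends on class' against 'independent of class'."""
    features = []
    for i in range(len(X[0])):
        joint = Counter((row[i], c) for row, c in zip(X, y))
        marginal = Counter(row[i] for row in X)
        log_dep = math.log(arc_prior) + sum(
            log_marginal([joint[(v, c)] for v in range(n_values)], n_values, alpha)
            for c in range(n_classes))
        log_ind = math.log(1.0 - arc_prior) + log_marginal(marginal.values(), n_values, alpha)
        features.append({"joint": joint, "marginal": marginal,
                         "log_dep": log_dep, "log_ind": log_ind})
    return {"n": len(y), "class_counts": Counter(y), "features": features,
            "n_values": n_values, "n_classes": n_classes, "alpha": alpha}


def class_prior(model, c):
    """Posterior-mean estimate of P(c | D)."""
    a = model["alpha"]
    return (a + model["class_counts"][c]) / (model["n_classes"] * a + model["n"])


def local_probs(model, i, v, c):
    """Posterior-mean estimates of P(x_i = v | c, D) and P(x_i = v | D)."""
    f, a, k = model["features"][i], model["alpha"], model["n_values"]
    p_dep = (a + f["joint"][(v, c)]) / (k * a + model["class_counts"][c])
    p_ind = (a + f["marginal"][v]) / (k * a + model["n"])
    return p_dep, p_ind


def normalize(log_scores):
    """Turn log scores into probabilities without overflow."""
    m = max(log_scores)
    unnorm = [math.exp(s - m) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def predict_factored(model, x):
    """Model-averaged class posterior in time linear in the number of features."""
    log_post = []
    for c in range(model["n_classes"]):
        lp = math.log(class_prior(model, c))
        for i, f in enumerate(model["features"]):
            w = normalize([f["log_dep"], f["log_ind"]])[0]  # P(feature i depends on class | D)
            p_dep, p_ind = local_probs(model, i, x[i], c)
            lp += math.log(w * p_dep + (1.0 - w) * p_ind)
        log_post.append(lp)
    return normalize(log_post)


def predict_bruteforce(model, x):
    """Same quantity by explicit enumeration of all 2^n feature subsets (tiny n only)."""
    feats = model["features"]
    masks = list(itertools.product([False, True], repeat=len(feats)))
    struct_post = normalize([
        sum(f["log_dep"] if used else f["log_ind"] for f, used in zip(feats, mask))
        for mask in masks])
    joint = [0.0] * model["n_classes"]
    for mask, p_struct in zip(masks, struct_post):
        for c in range(model["n_classes"]):
            p = class_prior(model, c)
            for i, used in enumerate(mask):
                p_dep, p_ind = local_probs(model, i, x[i], c)
                p *= p_dep if used else p_ind
            joint[c] += p_struct * p
    z = sum(joint)
    return [j / z for j in joint]
```

On a small toy dataset, `predict_factored` and `predict_bruteforce` should agree up to floating-point error, which is the point of the sketch: the linear-time per-feature mixture gives the same answer as the exponential enumeration. The brute-force version is included only as a check and is not usable at genome-wide scale.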
