Abstract
Naïve Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of Naïve Bayes in classification? In this paper, we propose a novel explanation for the good classification performance of Naïve Bayes. We show that, essentially, the dependence distribution plays a crucial role. Here, dependence distribution means how the local dependence of an attribute is distributed in each class, evenly or unevenly, and how the local dependences of all attributes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Specifically, we show that no matter how strong the dependences among attributes are, Naïve Bayes can still be optimal if the dependences are distributed evenly across the classes, or if the dependences cancel each other out. We propose and prove a necessary and sufficient condition for the optimality of Naïve Bayes. Further, we investigate the optimality of Naïve Bayes under the Gaussian distribution, and present and prove a sufficient condition for its optimality under which dependences among attributes do exist. This provides evidence that dependences may cancel each other out. Our theoretical analysis can be used in designing learning algorithms. In fact, a major class of learning algorithms for Bayesian networks is conditional independence based (CI-based), and such algorithms are essentially based on dependence. We design a dependence distribution-based algorithm by extending the Chow-Liu algorithm, a widely used CI-based algorithm. Our experiments show that the new algorithm outperforms the Chow-Liu algorithm, which also provides empirical evidence to support our new explanation.
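As a brief sketch of the optimality notion referred to above (the notation here is illustrative and is not fixed by the abstract itself), consider a two-class problem with classes + and − and an example E = (x_1, ..., x_n). The Bayes classifier and the Naïve Bayes classifier can be written as ratios of posteriors, and Naïve Bayes is optimal on E when both ratios fall on the same side of 1:

% Illustrative notation (assumed): p(c) are class priors, p(x_i | c) are
% per-attribute conditional probabilities; f_b uses the true joint posterior.
\[
  f_b(E) = \frac{p(+ \mid E)}{p(- \mid E)},
  \qquad
  f_{nb}(E) = \frac{p(+)}{p(-)} \prod_{i=1}^{n} \frac{p(x_i \mid +)}{p(x_i \mid -)} .
\]
% Naive Bayes agrees with the Bayes classifier on E exactly when
\[
  f_b(E) \ge 1 \;\Longleftrightarrow\; f_{nb}(E) \ge 1 .
\]

Under this reading, strong dependences among attributes need not push f_{nb}(E) to the opposite side of 1 from f_b(E) if their effects are spread evenly over the two classes or cancel one another out, which is the intuition behind the dependence distribution explanation.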