Abstract
Naïve Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of Naïve Bayes in classification? In this paper, we propose a novel explanation for the good classification performance of Naïve Bayes. We show that, essentially, the dependence distribution plays a crucial role. Here, dependence distribution means how the local dependence of an attribute is distributed in each class, evenly or unevenly, and how the local dependences of all attributes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Specifically, we show that no matter how strong the dependences among attributes are, Naïve Bayes can still be optimal if the dependences are distributed evenly across the classes, or if the dependences cancel each other out. We propose and prove a necessary and sufficient condition for the optimality of Naïve Bayes. Further, we investigate the optimality of Naïve Bayes under the Gaussian distribution, and present and prove a sufficient condition for its optimality under which dependences among attributes do exist. This provides evidence that dependences may cancel each other out. Our theoretical analysis can be used in designing learning algorithms. In fact, a major class of learning algorithms for Bayesian networks is conditional independence based (CI-based), and such algorithms are essentially based on dependence. We design a dependence distribution-based algorithm by extending the Chow-Liu algorithm, a widely used CI-based algorithm. Our experiments show that the new algorithm outperforms the Chow-Liu algorithm, which also provides empirical evidence to support our new explanation.
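As a brief sketch of the optimality notion referred to above (the notation here is illustrative and is not fixed by the abstract itself), consider a two-class problem with classes + and − and an example E = (x_1, ..., x_n). The Bayes classifier and the Naïve Bayes classifier can be written as ratios of posteriors, and Naïve Bayes is optimal on E when both ratios fall on the same side of 1:

% Illustrative notation (assumed): p(c) are class priors, p(x_i | c) are
% per-attribute conditional probabilities; f_b uses the true joint posterior.
\[
  f_b(E) = \frac{p(+ \mid E)}{p(- \mid E)},
  \qquad
  f_{nb}(E) = \frac{p(+)}{p(-)} \prod_{i=1}^{n} \frac{p(x_i \mid +)}{p(x_i \mid -)} .
\]
% Naive Bayes agrees with the Bayes classifier on E exactly when
\[
  f_b(E) \ge 1 \;\Longleftrightarrow\; f_{nb}(E) \ge 1 .
\]

Under this reading, strong dependences among attributes need not push f_{nb}(E) to the opposite side of 1 from f_b(E) if their effects are spread evenly over the two classes or cancel one another out, which is the intuition behind the dependence distribution explanation.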