Classification of a real live heart failure clinical dataset- Is TAN Bayes better than other Bayes?

Abstract

Real-life clinical data often present a number of familiar challenges, such as class imbalance, high dimensionality and missing data. An added complexity is that the data are skewed and non-uniformly distributed. As a result, classical classification methods tend to perform worse on this type of data than on other types. Bayesian classification is often suggested as a better approach; however, its typical assumptions, such as those concerning variable and data distributions, are not satisfied by real clinical data. This paper focuses on improving the performance of Bayesian classifiers and on how the underlying structure of the data affects that performance. It considers two Bayesian methodologies: non-parametric Kernel Density Estimation (KDE) and Tree Augmented Naïve Bayes (TAN). The aim is to measure their performance on a heart failure dataset and to examine how the data structure improves classification. Missing values in the clinical heart failure datasets are replaced using two imputation methods, and the results are compared. We also apply three classifiers to the imputed datasets: J48 (decision tree), naïve Bayesian multinomial and Bayesian network. The experiments show that KDE improves on naïve Bayes, while TAN achieves a significant improvement with the different missing-value imputation methods. TAN not only improves the performance of the classifier but also enhances prediction accuracy while maintaining efficiency and model simplicity.
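To make the KDE idea concrete, the following is a minimal sketch of a naïve Bayes classifier whose per-class, per-feature likelihoods come from a Gaussian kernel density estimate instead of a fitted parametric distribution. It is not the paper's implementation: the class name `KDENaiveBayes`, the fixed `bandwidth` value, the Gaussian kernel and the toy data are all assumptions made for illustration, and the sketch assumes missing values have already been imputed.

```python
import math
from collections import defaultdict


def gaussian_kde(samples, bandwidth):
    """Return a 1-D density estimate built from Gaussian kernels on `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


class KDENaiveBayes:
    """Hypothetical naive Bayes with KDE likelihoods (illustrative only)."""

    def __init__(self, bandwidth=0.5):
        # Assumption: one fixed bandwidth for every feature and class.
        self.bandwidth = bandwidth

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        n_rows = len(y)
        n_features = len(X[0])
        # Class priors estimated from class frequencies.
        self.log_prior = {
            c: math.log(len(rows) / n_rows) for c, rows in rows_by_class.items()
        }
        # One density estimate per (class, feature) pair.
        self.kdes = {
            c: [
                gaussian_kde([r[j] for r in rows], self.bandwidth)
                for j in range(n_features)
            ]
            for c, rows in rows_by_class.items()
        }
        return self

    def predict(self, row):
        scores = {}
        for c, log_prior in self.log_prior.items():
            score = log_prior
            for j, value in enumerate(row):
                # Small constant guards against log(0) far from all samples.
                score += math.log(self.kdes[c][j](value) + 1e-300)
            scores[c] = score
        return max(scores, key=scores.get)


# Toy data (invented): one skewed feature, two classes.
X_toy = [[1.0], [1.2], [0.8], [5.0], [5.3], [4.7]]
y_toy = [0, 0, 0, 1, 1, 1]
model = KDENaiveBayes(bandwidth=0.5).fit(X_toy, y_toy)
```

Calling `model.predict([1.1])` assigns the point to the class whose estimated densities make it most likely. The sketch keeps the naïve independence assumption between features. TAN relaxes that assumption by allowing each feature one additional feature parent in a tree structure, which is not shown here.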

