Classification of a real live heart failure clinical dataset - Is TAN Bayes better than other Bayes?
- 1 October 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
- p. 882-887
- https://doi.org/10.1109/smc.2014.6974023
Abstract
Real-life clinical data often present a number of common challenges, such as class imbalance, high dimensionality and missing data. There is the added complexity that the data are non-uniformly distributed and skewed. As a result, classical classification methods perform worse on this type of data than on other types. Classification based on Bayes is often suggested as a better method; however, the typical assumptions made for Bayes, such as those on the variables and the data distributions, are not satisfied by real clinical data. This paper focuses on improving the performance of Bayesian classifiers and on how the underlying structure of the data affects that performance. It considers two Bayesian methodologies: non-parametric Kernel Density Estimation (KDE) and Tree Augmented Naïve Bayes (TAN). The aim is to measure performance on a heart failure dataset, with a focus on how the data structure improves the classification. The missing data in the clinical heart failure dataset are replaced using two imputation methods and the results are compared. We also apply three other classifiers to the imputed datasets: J48 (decision tree), naïve Bayesian multinomial and Bayesian network. The experiments show that KDE improves on naïve Bayes, while TAN achieves a significant improvement with both missing-value imputation methods. TAN not only improves the performance of the classifier but also enhances prediction accuracy while maintaining efficiency and model simplicity.
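As an illustration of what TAN adds over naïve Bayes, the sketch below shows the structure-learning step as it is commonly described: estimate the conditional mutual information between each pair of features given the class, then keep a maximum-weight spanning tree over the features. It is not taken from the paper. It assumes discrete features with no missing values, and the function names `conditional_mutual_information` and `tan_structure` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xi, xj, y):
    """Estimate I(Xi; Xj | Y) from counts over discrete sequences."""
    n = len(y)
    n_abc = Counter(zip(xi, xj, y))
    n_ac = Counter(zip(xi, y))
    n_bc = Counter(zip(xj, y))
    n_c = Counter(y)
    cmi = 0.0
    for (a, b, c), count in n_abc.items():
        # p(a,b,c) * log( p(a,b|c) / (p(a|c) * p(b|c)) ), written with raw counts
        cmi += (count / n) * log(count * n_c[c] / (n_ac[(a, c)] * n_bc[(b, c)]))
    return cmi


def tan_structure(columns, y, root=0):
    """Return {feature index: parent feature index or None}.

    The class variable is implicitly a parent of every feature, as in
    naive Bayes; this function only chooses the extra feature-to-feature edges.
    """
    d = len(columns)
    weight = {
        (i, j): conditional_mutual_information(columns[i], columns[j], y)
        for i, j in combinations(range(d), 2)
    }
    parent = {root: None}
    # Prim's algorithm, growing a maximum-weight spanning tree from the root
    while len(parent) < d:
        i, j = max(
            ((i, j) for i in parent for j in range(d) if j not in parent),
            key=lambda e: weight[(min(e), max(e))],
        )
        parent[j] = i
    return parent


# Toy data (made up for illustration): feature 1 duplicates feature 0,
# feature 2 is independent of both within each class.
x0 = [0, 0, 1, 1, 0, 0, 1, 1]
x1 = list(x0)
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1]
tree = tan_structure([x0, x1, x2], y)
```

On the toy data, the edge between features 0 and 1 carries the largest weight, so feature 1 receives feature 0 as its parent. A full classifier would then estimate each feature's conditional probability table given its feature parent and the class, typically with smoothing.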