Comparing Data Mining Techniques in HIV Testing Prediction
Open Access
- 1 January 2015
- journal article
- research article
- Published by Scientific Research Publishing, Inc. in Intelligent Information Management
- Vol. 07 (03), 153-180
- https://doi.org/10.4236/iim.2015.73014
Abstract
Introduction: The present work compared the predictive power of different data mining techniques used to develop an HIV testing prediction model. Four popular data mining algorithms (decision tree, Naive Bayes, neural network and logistic regression) were used to build a model that predicts whether an adult in Ethiopia had been tested for HIV, using data from the 2011 Ethiopia Demographic and Health Survey (EDHS 2011). The final experimental results indicated that the decision tree (random tree algorithm) performed best, with an accuracy of 96%. The decision tree induction method J48 came second, with a classification accuracy of 79%, followed by the neural network (78%). Logistic regression achieved the lowest classification accuracy, at 74%.
Objectives: The objective of this study is to compare the predictive power of different data mining techniques used to develop an HIV testing prediction model.
Methods: The Cross-Industry Standard Process for Data Mining (CRISP-DM) was followed to build the HIV testing prediction model and to explore association rules between HIV testing and the selected attributes. During data preprocessing, missing values of each categorical variable were replaced with that variable's modal value. Different data mining techniques were then used to build the predictive model.
Results: The target dataset contained 30,625 study participants, of whom 16,515 (54%) were women and 14,110 (46%) were men. Participants' ages ranged from 15 to 59 years, and the modal age group was 15-19 years. Of the study participants, 17,719 (58%) had never been tested for HIV, while the remaining 12,906 (42%) had been tested. Residence, educational level, wealth index, HIV-related stigma, HIV-related knowledge, region, age group, risky sexual behaviour attributes, knowledge of where to get tested for HIV, and knowledge of family planning gained through mass media were found to be predictors of HIV testing.
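As a minimal illustration of the preprocessing step described under Methods, the sketch below replaces missing categorical values with each column's mode. It assumes each record is a Python dict with `None` marking a missing value; the function names and the attribute names and values in the example (`residence`, `region`) are hypothetical, and the abstract does not say which tool the authors actually used for this step.

```python
from collections import Counter
from typing import Optional

# Hypothetical record layout: attribute name -> categorical value, None = missing.
Record = dict[str, Optional[str]]


def column_mode(rows: list[Record], column: str) -> str:
    """Return the most frequent non-missing value of a categorical column."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    # Counter.most_common breaks ties by first occurrence (Python 3.7+).
    return Counter(observed).most_common(1)[0][0]


def impute_with_mode(rows: list[Record], columns: list[str]) -> list[Record]:
    """Return copies of rows in which None in the given columns becomes that column's mode."""
    # Modes are computed once, from the observed (non-missing) values only.
    modes = {col: column_mode(rows, col) for col in columns}
    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are left untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = modes[col]
        imputed.append(new_row)
    return imputed


if __name__ == "__main__":
    # Hypothetical records; attribute names and values are illustrative only.
    data = [
        {"residence": "rural", "region": "Amhara"},
        {"residence": None, "region": "Oromia"},
        {"residence": "rural", "region": None},
        {"residence": "urban", "region": "Oromia"},
    ]
    for row in impute_with_mode(data, ["residence", "region"]):
        print(row)
```

Computing every mode before filling any gaps keeps the result independent of row order: imputed values never feed back into the mode of their own column.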
Conclusion and Recommendation: The results of this research show that data mining is crucial for extracting relevant information that supports effective use of HIV testing services, which are of clinical, community and public health importance at all levels. It is vital to apply different data mining techniques in the same setting and to compare the resulting models' performance (accuracy, sensitivity and specificity) with one another. This study also invites interested researchers to further explore the application of data mining techniques in the healthcare industry and in related or similar settings.
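The recommendation above compares models on accuracy, sensitivity and specificity. A minimal sketch of those three metrics for a binary "ever tested for HIV" target follows. The class labels (`"tested"` as the positive class), the class and function names, and the counts in the usage example are hypothetical and are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, treating 'ever tested for HIV' as the positive class."""
    tp: int  # tested, predicted tested
    fn: int  # tested, predicted not tested
    fp: int  # not tested, predicted tested
    tn: int  # not tested, predicted not tested

    @property
    def accuracy(self) -> float:
        # Share of all participants classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def sensitivity(self) -> float:
        # True positive rate: share of truly tested participants the model identifies.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate: share of never-tested participants the model identifies.
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(y_true: list[str], y_pred: list[str], positive: str = "tested") -> ConfusionCounts:
    """Tally a binary confusion matrix from actual and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def rank_models(results: dict[str, ConfusionCounts]) -> list[tuple[str, float, float, float]]:
    """Sort models by accuracy (descending) as (name, accuracy, sensitivity, specificity)."""
    rows = [(name, c.accuracy, c.sensitivity, c.specificity) for name, c in results.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are NOT the study's results.
    results = {
        "model_a": ConfusionCounts(tp=90, fn=10, fp=20, tn=80),
        "model_b": ConfusionCounts(tp=70, fn=30, fp=10, tn=90),
    }
    for name, acc, sens, spec in rank_models(results):
        print(f"{name}: accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting sensitivity and specificity next to accuracy matters here because the classes are not balanced (58% never tested versus 42% tested), so accuracy alone can hide a model that performs poorly on one class.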