Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
Open Access
- 26 April 2017
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Access
- Vol. 5, 8869-8879
- https://doi.org/10.1109/access.2017.2694446
Abstract
With big data growth in biomedical and healthcare communities, accurate analysis of medical data benefits early disease detection, patient care, and community services. However, analysis accuracy is reduced when the medical data are incomplete. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. In this paper, we streamline machine learning algorithms for effective prediction of chronic disease outbreak in disease-frequent communities. We evaluate the modified prediction models on real-life hospital data collected in central China from 2013 to 2015. To overcome the difficulty of incomplete data, we use a latent factor model to reconstruct the missing data. Our experiments focus on cerebral infarction, a regional chronic disease. We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm using structured and unstructured data from the hospital. To the best of our knowledge, no existing work has addressed both data types in the area of medical big data analytics. Compared with several typical prediction algorithms, the prediction accuracy of our proposed algorithm reaches 94.8%, and it converges faster than the CNN-based unimodal disease risk prediction algorithm.
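The abstract's idea of reconstructing missing data with a latent factor model can be illustrated with a minimal sketch. It assumes one common form of the technique: a patient-by-feature matrix is approximated as the product of two low-rank factor matrices fitted by stochastic gradient descent on the observed entries only. The function name `reconstruct_missing`, the hyperparameters, and the toy matrix are hypothetical and are not taken from the paper.

```python
import random

def reconstruct_missing(R, rank=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fill None entries of R with predictions from a latent factor model.

    R is approximated as P * Q^T, where P (rows x rank) and Q (cols x rank)
    are fitted by SGD on the observed entries only. All hyperparameters are
    illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(m)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if R[i][j] is not None]

    def predict(i, j):
        return sum(P[i][k] * Q[j][k] for k in range(rank))

    for _ in range(epochs):
        for i, j in observed:
            err = R[i][j] - predict(i, j)
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)

    # Keep observed values as-is; fill only the missing ones.
    return [[R[i][j] if R[i][j] is not None else predict(i, j)
             for j in range(n)] for i in range(m)]

# Toy patient-by-feature matrix (made-up values); None marks a missing entry.
R = [
    [5.0, 3.0, None, 1.0],
    [4.0, None, 2.0, 1.0],
    [1.0, 1.0, 4.0, 5.0],
    [1.0, 2.0, 5.0, 4.0],
]
filled = reconstruct_missing(R)
```

Observed entries pass through unchanged, so only the gaps are estimated. In practice the rank and regularization strength would be chosen by validation on held-out observed entries.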
Funding Information
- National Natural Science Foundation of China (61572220, 81671904)
- International Science and Technology Corporation Program of Chinese Ministry of Science and Technology (S2014ZR0340)