A systematic literature review of machine learning in online personal health data
Open Access
- 25 March 2019
- journal article
- review article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 26 (6), 561-576
- https://doi.org/10.1093/jamia/ocz009
Abstract
User-generated content (UGC) in online environments provides opportunities to learn an individual’s health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. We searched PubMed, Web of Science, IEEE Library, ACM Library, AAAI Library, and the ACL Anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and improving model interpretability.
Funding Information
- National Science Foundation (IIS1418504)