A systematic literature review of machine learning in online personal health data

Open Access

25 March 2019

journal article
review article
Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association

Vol. 26 (6), 561-576
https://doi.org/10.1093/jamia/ocz009

Abstract

User-generated content (UGC) in online environments provides opportunities to learn an individual’s health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and improving model interpretability.
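As an illustration of one of the off-the-shelf models named above, the sketch below trains a multinomial naive Bayes classifier on bag-of-words features from free-text posts. It is a minimal, stdlib-only sketch under stated assumptions, not the method of any reviewed study: the posts, the labels `mental_health` and `cancer`, the whitespace tokenizer, and the helper names `train_nb` and `predict_nb` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real UGC usually needs
    # handling of slang, misspellings, emoji, and URLs.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes with Laplace (add-alpha) smoothing."""
    vocab = set()
    word_counts = defaultdict(Counter)  # class -> token frequency table
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = len(labels)
    # Log class priors P(c)
    priors = {c: math.log(n / total) for c, n in class_counts.items()}
    # Smoothed log likelihoods P(w | c) for every vocabulary word
    likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return priors, likelihoods, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior score."""
    priors, likelihoods, vocab = model
    scores = {}
    for c in priors:
        score = priors[c]
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += likelihoods[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy posts; not drawn from any reviewed dataset.
docs = [
    "feeling hopeless and cannot sleep again",
    "anxiety attacks every night this week",
    "started chemo today nausea is rough",
    "oncologist says the tumor is shrinking",
]
labels = ["mental_health", "mental_health", "cancer", "cancer"]

model = train_nb(docs, labels)
print(predict_nb(model, "cannot sleep and anxiety"))  # mental_health
print(predict_nb(model, "chemo nausea today"))  # cancer
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and add-one smoothing keeps a single unseen class/word pair from zeroing out a class. A practical pipeline would more likely use a library implementation and richer features, which the review groups into four feature types.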

Funding Information

National Science Foundation (IIS1418504)

Cited by 58 articles