Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

1 November 2018

journal article
research article
Published by American Medical Association (AMA) in JAMA Internal Medicine

Vol. 178 (11), 1544-1547
https://doi.org/10.1001/jamainternmed.2018.3763

Abstract

A promise of machine learning in health care is the avoidance of biases in diagnosis and treatment; a computer algorithm could objectively synthesize and interpret the data in the medical record. Integration of machine learning with clinical decision support tools, such as computerized alerts or diagnostic support, may offer physicians and others who provide health care targeted and timely information that can improve clinical decisions. Machine learning algorithms, however, may also be subject to biases. The biases include those related to missing data and patients not identified by algorithms, sample size and underestimation, and misclassification and measurement error. There is concern that biases and deficiencies in the data used by machine learning algorithms may contribute to socioeconomic disparities in health care. This Special Communication outlines the potential biases that may be introduced into machine learning–based clinical decision support tools that use electronic health record data and proposes potential solutions to the problems of overreliance on automation, algorithms based on biased data, and algorithms that do not provide information that is clinically meaningful. Existing health care disparities should not be amplified by thoughtless or excessive reliance on machines.

Cited by 711 articles