Machine learning in medicine: a practical introduction

Open Access

19 March 2019

journal article
research article
Published by Springer Science and Business Media LLC in BMC Medical Research Methodology

Vol. 19 (1), 1-18
https://doi.org/10.1186/s12874-019-0681-4

Abstract

Background: Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely available open-source software and public domain data.

Methods: We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized General Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly available dataset describing the breast mass samples (N=683) was randomly split into evaluation (n=456) and validation (n=227) samples. We trained algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions made on the validation dataset with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment.

Results: The trained algorithms were able to classify cell nuclei with high accuracy (0.94-0.96), sensitivity (0.97-0.99), and specificity (0.85-0.94). Maximum accuracy (0.96) and area under the curve (0.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = 0.97, sensitivity = 0.99, specificity = 0.95) when algorithms were arranged into a voting ensemble.

Conclusions: We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles we demonstrate here can be readily applied to other complex tasks, including natural language processing and image recognition.
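The evaluation metrics and the voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code (the paper works in R); it is a standard-library Python example, and the function names, the choice of 1 as the positive class, and the toy labels are all assumptions made for illustration only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for binary labels.

    `positive` marks the class treated as the positive outcome
    (an assumption here; the abstract does not state the coding).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives that were detected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def majority_vote(*model_predictions):
    """Voting ensemble: each case gets the label most models predicted.

    With an odd number of models and binary labels there are no ties.
    """
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*model_predictions)
    ]


# Toy data, invented for illustration (NOT the breast mass dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 1]
model_b = [1, 1, 0, 1, 0, 0, 1, 0]
model_c = [1, 0, 1, 1, 0, 1, 0, 0]

ensemble = majority_vote(model_a, model_b, model_c)

for name, preds in [("A", model_a), ("B", model_b),
                    ("C", model_c), ("ensemble", ensemble)]:
    print(name, classification_metrics(y_true, preds))
```

In this toy case each model makes its errors on different cases, so the majority vote corrects all of them. Real models tend to make correlated errors, which is consistent with the marginal gain the abstract reports for the ensemble.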

Funding Information

Research Trainees Coordinating Centre (CDF-2017-10-019)
Research Trainees Coordinating Centre (PDF-2014-07-28)

Cited by 668 articles