ENRICHing Medical Imaging Training Sets Enables More Efficient Machine Learning
Preprint
- 25 May 2021
- research article
- Published by Cold Spring Harbor Laboratory
Abstract
Objective: Deep learning (DL) has been applied in proofs of concept across biomedical imaging, spanning modalities and medical specialties [1–17]. Labeled data is critical to training and testing DL models, but human expert labelers are in limited supply. In addition, DL traditionally requires copious training data, which is computationally expensive to process and iterate over. Consequently, it is useful to prioritize the images that are most likely to improve a model's performance, a practice known as instance selection. The challenge is determining how best to prioritize. It is natural to prefer straightforward, robust, quantitative metrics as the basis for instance selection. However, in current practice such metrics are not tailored to, and almost never used for, image datasets.
Methods: To address this problem, we introduce ENRICH (Eliminate Noise and Redundancy for Imaging Challenges), a customizable method that prioritizes images based on how much diversity each image adds to the training set.
Results: First, we show that medical datasets differ from non-medical datasets in that each image generally adds less diversity. Next, we demonstrate that ENRICH achieves nearly maximal performance on classification and segmentation tasks on several medical image datasets using only a fraction of the available images, and that it outperforms random image selection, the negative control. Finally, we show that ENRICH can also be used to identify errors and outliers in imaging datasets.
Conclusion: ENRICH is a simple, computationally efficient method for prioritizing images for expert labeling and use in DL.
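The abstract does not spell out how ENRICH scores diversity, so the following is only an illustrative sketch of diversity-driven instance selection in general, not the authors' algorithm. It assumes each image has already been reduced to a numeric feature vector and uses a farthest-point greedy heuristic with Euclidean distance as a stand-in diversity measure. The function names `euclidean` and `prioritize_by_diversity` are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prioritize_by_diversity(features, budget, distance=euclidean):
    """Return indices of up to `budget` items, chosen greedily for diversity.

    Assumption: this is a farthest-point heuristic, not the ENRICH method.
    It starts from item 0, then repeatedly adds the item whose nearest
    already-selected neighbor is farthest away.
    """
    if not features or budget <= 0:
        return []
    budget = min(budget, len(features))
    selected = [0]
    # min_dist[i] = distance from item i to its closest selected item;
    # selected items are marked with -1.0 so they are never picked again.
    min_dist = [distance(f, features[0]) for f in features]
    min_dist[0] = -1.0
    while len(selected) < budget:
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], distance(f, features[nxt]))
    return selected


# Toy feature vectors: two near-duplicate pairs and one isolated point.
toy_features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (10.0, 0.0)]
ranking = prioritize_by_diversity(toy_features, budget=3)
```

With a budget of three, the sketch picks one item from each cluster and skips the near-duplicates, which is the kind of redundancy reduction the abstract describes. A random permutation of the indices would serve as the negative control mentioned in the Results.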
This publication has 27 references indexed in Scilit:
- Fast and accurate view classification of echocardiograms using deep learning. npj Digital Medicine, 2018
- Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017
- Learning how to Active Learn: A Deep Reinforcement Learning Approach. Published by Association for Computational Linguistics (ACL), 2017
- Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 2016
- Cost-Effective Active Learning for Deep Image Classification. IEEE Transactions on Circuits and Systems for Video Technology, 2016
- Robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples. Nature Communications, 2016
- The U.S. Radiologist Workforce: An Analysis of Temporal and Geographic Variation by Using Large National Datasets. Radiology, 2016
- A review of instance selection methods. Artificial Intelligence Review, 2010
- Multi-class active learning for image classification. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- Improving generalization with active learning. Machine Learning, 1994