Multicenter, Head-to-Head, Real-World Validation Study of Seven Automated Artificial Intelligence Diabetic Retinopathy Screening Systems
- 4 January 2021
- Research article
- Published by American Diabetes Association in Diabetes Care
- Vol. 44 (5), 1168-1175
- https://doi.org/10.2337/dc20-1877
Abstract
OBJECTIVE With the rising global prevalence of diabetic retinopathy (DR), automated DR screening is needed in primary care settings. Two automated artificial intelligence (AI)-based DR screening algorithms have U.S. Food and Drug Administration (FDA) approval. Several others are under consideration while in clinical use in other countries, but their real-world performance has not been evaluated systematically. We compared the performance of seven automated AI-based DR screening algorithms (including one FDA-approved algorithm) against human graders when analyzing real-world retinal imaging data.

RESEARCH DESIGN AND METHODS This was a multicenter, noninterventional device validation study evaluating a total of 311,604 retinal images from 23,724 veterans who presented for teleretinal DR screening at the Veterans Affairs (VA) Puget Sound Health Care System (HCS) or Atlanta VA HCS from 2006 to 2018. Five companies provided seven algorithms, including one with FDA approval, that independently analyzed all scans, regardless of image quality. The sensitivity and specificity of each algorithm in classifying images as referable DR or not were compared with the original VA teleretinal grades and a regraded, arbitrated data set. Value per encounter was estimated.

RESULTS Although high negative predictive values (82.72–93.69%) were observed, sensitivities varied widely (50.98–85.90%). Most algorithms performed no better than human graders against the arbitrated data set, but two achieved higher sensitivities, and one yielded comparable sensitivity (80.47%, P = 0.441) and specificity (81.28%, P = 0.195). Notably, one had lower sensitivity (74.42%) for proliferative DR (P = 9.77 × 10⁻⁴) than the VA teleretinal graders. Value per encounter ranged from $15.14 to $18.06 for ophthalmologists and from $7.74 to $9.24 for optometrists.

CONCLUSIONS The DR screening algorithms showed significant performance differences. These results argue for rigorous testing of all such algorithms on real-world data before clinical implementation.

Funding Information
- National Eye Institute (K23-EY-029246, R01-AG-060942)
- Research to Prevent Blindness
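The sensitivity, specificity, and negative predictive value figures reported in the abstract all derive from a standard binary confusion matrix (algorithm's referable/non-referable call versus the reference grade). A minimal sketch of that calculation is below; the counts used in the example are illustrative placeholders, not data from the study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, NPV) as percentages.

    tp: referable DR correctly flagged by the algorithm
    fp: non-referable images incorrectly flagged
    tn: non-referable images correctly passed
    fn: referable DR the algorithm missed
    """
    sensitivity = 100 * tp / (tp + fn)  # share of true referable DR detected
    specificity = 100 * tn / (tn + fp)  # share of non-referable correctly passed
    npv = 100 * tn / (tn + fn)          # how trustworthy a negative screen is
    return sensitivity, specificity, npv


# Illustrative counts only (hypothetical, not from the validation study)
sens, spec, npv = screening_metrics(tp=805, fp=187, tn=813, fn=195)
print(f"sensitivity={sens:.2f}%  specificity={spec:.2f}%  NPV={npv:.2f}%")
```

Note that NPV, unlike sensitivity and specificity, depends on disease prevalence in the screened population, which is one reason the study can report high NPVs (82.72–93.69%) even for algorithms with modest sensitivity.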