Multi-institutional comparative effectiveness of advanced cancer longitudinal imaging response evaluation methods: Current practice versus artificial intelligence-assisted.

Abstract
2010
Background: Current-practice methods for evaluating longitudinal tumor response in advanced cancer rely on manual measurements on digital medical images and dictated text-based reports. These methods are error-prone, inefficient, and associated with low inter-observer agreement. The purpose of this study is to compare the effectiveness of advanced cancer longitudinal imaging response evaluation using current-practice versus artificial intelligence (AI)-assisted methods.
Methods: For this multi-institutional longitudinal retrospective study, body CT images from 120 consecutive patients with advanced cancer treated with systemic therapy, each with multiple serial imaging exams, were independently evaluated by 24 radiologists using current-practice versus AI-assisted methods. For the current-practice method, radiologists dictated text-based reports and separately categorized response as complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD). For the AI-assisted method, custom software included AI algorithms for tumor measurement, target and non-target location labelling, and tumor localization at follow-up. The AI-assisted software automatically categorized tumor response per RECIST 1.1 calculations and displayed longitudinal data in the form of a graph, table, and key images. All studies were read independently in triplicate for assessment of inter-observer agreement. Comparative effectiveness metrics included major errors, time of image interpretation, and inter-observer agreement for final response category.
Results: Major errors were found in 27.5% (99/360) of reads with the current-practice method versus 0.3% (1/360) with the AI-assisted method (p < 0.001), corresponding to a 99% relative reduction in major errors. Average time of interpretation by radiologists was 18.7 min for the current-practice method versus 9.8 min for the AI-assisted method (p < 0.001), making the AI-assisted method nearly twice as fast. Total inter-observer agreement on final response categorization among radiologists was 52% (62/120) for the current-practice method versus 75% (90/120) for the AI-assisted method (p < 0.001), corresponding to a 45% relative increase in total inter-observer agreement.
Conclusion: In a large multi-institutional study, AI-assisted advanced cancer longitudinal imaging response evaluation significantly reduced major errors, was nearly twice as fast, and increased inter-observer agreement relative to the current-practice method, thereby establishing a new and improved standard of care.
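For readers unfamiliar with the RECIST 1.1 calculations mentioned in the Methods, the following is a minimal sketch of the target-lesion thresholds only. It is not the study's software, which the abstract does not describe in detail. The function name `categorize_target_response` and its inputs are hypothetical, and the sketch deliberately omits non-target lesion assessment and the lymph-node short-axis rule.

```python
from typing import Sequence


def categorize_target_response(sums_mm: Sequence[float], new_lesions: bool = False) -> str:
    """Categorize the latest time point from sums of target-lesion diameters (mm).

    sums_mm[0] is the baseline sum; sums_mm[-1] is the current exam.
    Simplified RECIST 1.1 target-lesion rules (illustrative assumption):
      PD: new lesion, or >= 20% increase from nadir AND >= 5 mm absolute increase
      CR: all target lesions have disappeared (sum of 0 mm)
      PR: >= 30% decrease from baseline
      SD: none of the above
    """
    if len(sums_mm) < 2:
        raise ValueError("need a baseline and at least one follow-up exam")

    baseline = sums_mm[0]
    current = sums_mm[-1]
    # Nadir: smallest sum recorded before the current exam, baseline included.
    nadir = min(sums_mm[:-1])

    if new_lesions:
        return "PD"
    if current == 0:
        return "CR"

    increase = current - nadir
    if increase >= 5 and increase >= 0.20 * nadir:
        return "PD"
    if baseline - current >= 0.30 * baseline:
        return "PR"
    return "SD"


if __name__ == "__main__":
    print(categorize_target_response([100, 60]))      # partial response
    print(categorize_target_response([100, 50, 62]))  # progression from nadir
    print(categorize_target_response([100, 90, 95]))  # stable disease
```

Progression is checked before partial response because a patient can remain well below baseline while growing from nadir; under RECIST 1.1 that case counts as progression.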
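The relative effect sizes quoted in the Results can be reproduced from the reported counts and times. The short sketch below shows the arithmetic; the helper name `relative_change` is an illustrative assumption and is not taken from the study.

```python
def relative_change(old: float, new: float) -> float:
    """Return (new - old) / old, i.e. the change relative to the old value."""
    return (new - old) / old


# Values as reported in the abstract.
error_rate_current = 99 / 360   # major errors, current-practice reads
error_rate_ai = 1 / 360         # major errors, AI-assisted reads
time_current_min = 18.7         # mean interpretation time, current practice
time_ai_min = 9.8               # mean interpretation time, AI-assisted
agreement_current = 62 / 120    # total inter-observer agreement, current practice
agreement_ai = 90 / 120         # total inter-observer agreement, AI-assisted

error_reduction = -relative_change(error_rate_current, error_rate_ai)
speedup = time_current_min / time_ai_min
agreement_increase = relative_change(agreement_current, agreement_ai)

if __name__ == "__main__":
    print(f"Relative reduction in major errors: {error_reduction:.1%}")   # 99.0%
    print(f"Speed-up factor: {speedup:.2f}")                              # 1.91
    print(f"Relative increase in agreement: {agreement_increase:.1%}")    # 45.2%
```

Note that the 45% figure is a relative increase (90/62 − 1); the absolute difference in agreement is about 23 percentage points.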
Funding Information
  • None