High-Risk Breast Lesions: A Machine Learning Model to Predict Pathologic Upgrade and Reduce Unnecessary Surgical Excision

Abstract
To develop a machine learning model that allows high-risk breast lesions (HRLs) diagnosed with image-guided needle biopsy that require surgical excision to be distinguished from HRLs that are at low risk for upgrade to cancer at surgery and thus could be surveilled. Consecutive patients with biopsy-proven HRLs who underwent surgery or at least 2 years of imaging follow-up from June 2006 to April 2015 were identified. A random forest machine learning model was developed to identify HRLs at low risk for upgrade to cancer. Traditional features such as age and HRL histologic results were used in the model, as were text features from the biopsy pathologic report. One thousand six HRLs were identified, with a cancer upgrade rate of 11.4% (115 of 1006). A machine learning random forest model was developed with 671 HRLs and tested with an independent set of 335 HRLs. Among the most important traditional features were age and HRL histologic results (eg, atypical ductal hyperplasia). An important text feature from the pathologic reports was “severely atypical.” Instead of surgical excision of all HRLs, if those categorized with the model to be at low risk for upgrade were surveilled and the remainder were excised, then 97.4% (37 of 38) of malignancies would have been diagnosed at surgery, and 30.6% (91 of 297) of surgeries of benign lesions could have been avoided. This study provides proof of concept that a machine learning model can be applied to predict the risk of upgrade of HRLs to cancer. Use of this model could decrease unnecessary surgery by nearly one-third and could help guide clinical decision making with regard to surveillance versus surgical excision of HRLs. © RSNA, 2017