Radiomics analysis combining unsupervised learning and handcrafted features: A multiple‐disease study

Abstract
Purpose To investigate the synergistic benefit of combining conventional handcrafted features and learning-based features for disease identification across a wide range of clinical setups.

Methods and Materials In this retrospective study, we collected data from 170, 150, 209, and 137 patients in four studies with the following respective identification objectives: lymph node metastasis status of gastric cancer (GC), 5-year survival status of patients with high-grade osteosarcoma (HOS), early recurrence status of intrahepatic cholangiocarcinoma (ICC), and pathological grade of pancreatic neuroendocrine tumors (pNETs). Image features were derived from CT for GC, HOS, and pNETs, and from MR for ICC. In each study, 67 universal handcrafted features and study-specific features based on the sparse autoencoder (SAE) method were extracted and fed into the subsequent feature selection and learning model to predict the corresponding identification objective. Models using handcrafted features alone, SAE features alone, and hybrid features were optimized and their performance was compared. Prominent features were analyzed both qualitatively and quantitatively to generate study-specific and cross-study insight. In addition to the direct assessment of performance gain, correlation analysis was performed to assess the complementarity between handcrafted features and SAE features.

Results On the independent hold-out test set, prediction based on handcrafted, SAE, and hybrid features yielded AUCs of 0.761 vs 0.769 vs 0.829 for GC, 0.629 vs 0.740 vs 0.709 for HOS, 0.717 vs 0.718 vs 0.758 for ICC, and 0.739 vs 0.715 vs 0.771 for pNETs, respectively. In three of the four studies, prediction using the hybrid features yielded the best performance, demonstrating the general benefit of using hybrid features. Prediction with SAE features alone had the best performance in the HOS study, which may be explained by the complexity of HOS prognosis and the possibility of slight overfitting due to the higher correlation between handcrafted and SAE features.

Conclusion This study demonstrated the general benefit of combining handcrafted and learning-based features in radiomics modelling. It also clearly illustrates that the performance gain is task-specific and data-specific, and suggests that while the common methodology of feature combination may be applied across various studies and tasks, study-specific feature selection and model optimization are still necessary to achieve high accuracy and robustness.
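The two quantitative checks named in the abstract, AUC on a hold-out set and correlation between handcrafted and SAE features, can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state which correlation measure was used, so Pearson correlation is an assumption here, and the function names (`cross_correlation`, `auc`) and the synthetic data are hypothetical.

```python
import numpy as np


def cross_correlation(handcrafted, learned):
    """Absolute Pearson correlation for every handcrafted/learned feature pair.

    Both inputs have shape (n_patients, n_features). The result has shape
    (n_handcrafted, n_learned). Pearson correlation is an assumption; the
    abstract does not name the measure used in the study.
    """
    h = (handcrafted - handcrafted.mean(axis=0)) / handcrafted.std(axis=0)
    s = (learned - learned.mean(axis=0)) / learned.std(axis=0)
    return np.abs(h.T @ s) / handcrafted.shape[0]


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)


# Synthetic stand-ins for the two feature families (not study data).
rng = np.random.default_rng(0)
n_patients = 200
handcrafted = rng.normal(size=(n_patients, 5))
learned = np.column_stack([
    handcrafted[:, 0] + 0.1 * rng.normal(size=n_patients),  # redundant feature
    rng.normal(size=n_patients),                            # independent feature
])

corr = cross_correlation(handcrafted, learned)
# For each learned feature, the strongest correlation with any handcrafted one.
# Low values suggest the learned feature carries complementary information.
max_corr_per_learned = corr.max(axis=0)
```

In this toy setup the first learned feature is nearly a copy of a handcrafted feature and the second is independent, so `max_corr_per_learned` is high for the first and low for the second. Under the abstract's reasoning, a hybrid model would be expected to gain more from features of the second kind.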
Funding Information
  • National Natural Science Foundation of China (81871351, 81950410632)