Use of radiomics based on 18F-FDG PET/CT and machine learning methods to aid clinical decision-making in the classification of solitary pulmonary lesions: an innovative approach

Abstract
Purpose: This study assessed the ability of 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) and computed tomography (CT) radiomics features, combined with machine learning methods, to differentiate between primary and metastatic lung lesions and to classify histological subtypes. We also sought to identify the optimal machine learning method.
Methods: A total of 769 patients pathologically diagnosed with primary or metastatic lung cancers were enrolled. We used the LIFEx package to extract radiomics features from semiautomatically segmented PET and CT images within the same volume of interest. Patients were randomly assigned to training and validation sets. Discriminant models were built by pairing each of five feature selection methods with each of nine classification methods. The robustness of the procedure was assessed by tenfold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).
Results: Based on the radiomics features extracted from PET and CT images, forty-five discriminative models were established. Combined with appropriate feature selection methods, most classifiers showed excellent discriminative ability (AUC > 0.75). In the differentiation between primary and metastatic lung lesions, gradient boosting decision tree (GBDT) feature selection combined with the GBDT classifier achieved the highest AUC in the PET dataset (0.983), whereas eXtreme gradient boosting feature selection combined with the random forest (RF) classifier achieved the highest AUC in the CT dataset (0.828). In the discrimination between squamous cell carcinoma and adenocarcinoma, GBDT feature selection combined with the GBDT classifier had the highest AUC in the PET dataset (0.897), whereas GBDT feature selection combined with the RF classifier had the highest AUC in the CT dataset (0.839). Most of the decision tree (DT)-based models were overfitted, suggesting that this classification method is not appropriate for practical application.
Conclusion: 18F-FDG PET/CT radiomics features combined with machine learning methods can distinguish between primary and metastatic lung lesions and identify histological subtypes in lung cancer. GBDT and RF were considered the optimal classification methods for the PET and CT datasets, respectively, and GBDT was considered the optimal feature selection method in our analysis.
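The evaluation scheme named in the abstract (tenfold cross-validation scored by AUC) can be illustrated with a minimal, standard-library-only sketch. It is not the study's actual pipeline: the function names `auc` and `kfold_indices`, the random seed, and the choice to split by simple shuffling are assumptions made for illustration. The AUC is computed from its pairwise definition, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def auc(labels, scores):
    """AUC from its pairwise definition (ties count as 0.5).

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds.

    Hypothetical helper: the seed and the unstratified shuffle are
    assumptions, not details reported in the abstract.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy usage: four cases, two per class.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Ten folds over a hypothetical set of 769 samples; each fold serves
# once as the held-out part while the other nine are used for fitting.
folds = kfold_indices(769, k=10)
```

In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace the quadratic-time loop above; the explicit version is shown only because it makes the definition of the metric visible.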
Funding Information
  • National Natural Science Foundation of China (81971653)
  • Ministry of Science and Technology (2019YFS0373)