Predicting biomarkers from classifier for liver metastasis of colorectal adenocarcinomas using machine learning models
Open Access
- 24 July 2020
- journal article
- research article
- Published by Wiley in Cancer Medicine
- Vol. 9 (18), 6667-6678
- https://doi.org/10.1002/cam4.3289
Abstract
Background Early diagnosis of liver metastasis is of great importance for improving the survival of colorectal adenocarcinoma (CAD) patients, and combining individual biomarkers in a classifier model has greatly improved metastasis prediction for several types of cancer. However, this approach has rarely been reported for CAD. This study therefore aimed to screen for an optimal classifier model of CAD with liver metastasis and to explore the metastatic mechanisms of the genes used by this classifier model.
Methods The differentially expressed genes between primary CAD samples and CAD with metastasis samples were screened from the Moffitt Cancer Center (MCC) dataset GSE131418. The performance of six selected algorithms, namely, LR, RF, SVM, GBDT, NN, and CatBoost, in classifying CAD with liver metastasis samples was compared on the MCC dataset GSE131418 by test accuracy. In addition, the consortium datasets of GSE131418 and GSE81558 were used as internal and external validation sets to screen for the optimal method. Subsequently, functional analyses and construction of a drug‐target network were conducted for the feature genes used by the optimal method.
Results CatBoost was identified as the optimal model, with the highest accuracy (99%) and an area under the curve of 1; it consisted of 33 feature genes. A functional analysis showed that the feature genes were closely associated with “steroid metabolic process” and “lipoprotein particle receptor binding” (eg APOB and APOC3). In addition, the feature genes were significantly enriched in the “complement and coagulation cascade” pathways (eg FGA, F2, and F9). In a drug‐target interaction network, F2 and F9 were predicted as targets of menadione.
Conclusion The CatBoost model constructed using 33 feature genes showed the optimal classification performance for identifying CAD with liver metastasis.
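The abstract compares classifiers by test accuracy and area under the ROC curve (AUC). The sketch below illustrates how those two metrics can be computed from held-out labels and predicted scores, and how candidate models can then be ranked. It is a minimal illustration, not the authors' pipeline: the function names, the 0.5 decision threshold, the rule of ranking by accuracy and then AUC, and the toy scores for `model_a` and `model_b` are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    positives = [s for s, t in zip(scores, y_true) if t == 1]
    negatives = [s for s, t in zip(scores, y_true) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_models(y_true: Sequence[int],
                model_scores: Dict[str, Sequence[float]]
                ) -> List[Tuple[str, float, float]]:
    """Return (name, accuracy, auc) tuples, best model first.

    Sorting by accuracy and then AUC is an assumed selection rule.
    """
    results = [(name, accuracy(y_true, s), roc_auc(y_true, s))
               for name, s in model_scores.items()]
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


# Hypothetical held-out labels (1 = liver metastasis, 0 = primary tumor)
# and hypothetical predicted probabilities from two candidate models.
y_test = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "model_b": [0.9, 0.4, 0.7, 0.6, 0.2, 0.1],
}

ranking = rank_models(y_test, toy_scores)
for name, acc, auc in ranking:
    print(f"{name}: accuracy={acc:.3f}, auc={auc:.3f}")
```

An AUC of 1, as reported for CatBoost, means every metastasis sample received a higher score than every primary sample in the evaluated set. `model_a` in the toy data shows the same pattern.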