Application of ensemble clustering and survival tree analysis for identifying prognostic clinicogenomic features in patients with colorectal cancer from the 100,000 Genomes Project
Open Access
- 2 October 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Research Notes
- Vol. 14 (1), 1-7
- https://doi.org/10.1186/s13104-021-05789-0
Abstract
The objective of this study was to employ ensemble clustering and tree-based risk model approaches to identify interactions between clinicogenomic features for colorectal cancer using the 100,000 Genomes Project. Among the 2211 patients with colorectal cancer (mean age at diagnosis: 67.7; 59.7% male), 16.3%, 36.3%, 39.0% and 8.4% had stage 1, 2, 3 and 4 cancers, respectively. Almost every patient had surgery (99.7%), 47.4% had chemotherapy, 7.6% had radiotherapy and 1.4% had immunotherapy. On average, tumour mutational burden (TMB) was 18 mutations/Mb, and 34.4%, 31.3% and 25.7% of patients had structural or copy number mutations in KRAS, BRAF and NRAS, respectively. In the fully adjusted Cox model, patients with advanced cancer [stage 3 hazard ratio (HR) = 3.2; p < 0.001; stage 4 HR = 10.2; p < 0.001] and those who had immunotherapy (HR = 1.8; p < 0.04) or radiotherapy (HR = 1.5; p < 0.02) treatment had a higher risk of dying. The ensemble clustering approach generated four distinct clusters: patients in cluster 2 had the best survival outcomes (1-year: 98.7%; 2-year: 96.7%; 3-year: 93.0%), while patients in cluster 3 had the worst (1-year: 87.9%; 2-year: 70.0%; 3-year: 53.1%). Kaplan–Meier analysis and the log-rank test showed that the clusters separated into distinct prognostic groups (p < 0.0001). Survival tree (recursive partitioning) analyses were performed to further explore risk groups within each cluster. Among patients in cluster 2, for example, interactions between cancer stage, grade, radiotherapy, TMB and BRAF mutation status were identified. Patients with stage 4 cancer and TMB ≥ 1.6 mutations/Mb had a 4 times higher risk of dying relative to the baseline hazard in that cluster.
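The 1-, 2- and 3-year survival percentages reported per cluster are the kind of quantity a Kaplan–Meier estimator produces. The following is a minimal sketch of that estimator in plain Python, not the authors' code; the function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up time per patient (e.g. years since diagnosis)
    events : 1 if death was observed at that time, 0 if censored
    """
    # Number of observed deaths at each distinct time point
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Product-limit step: multiply by the conditional survival at t
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy cohort (not data from the study)
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Running the estimator separately on each cluster's patients and comparing the resulting curves mirrors, in spirit, the between-cluster comparison described in the abstract; a log-rank test would then be needed to assess whether the curves differ.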
Funding Information
- Wellcome Trust (204841/Z/16/Z)