Application of ensemble clustering and survival tree analysis for identifying prognostic clinicogenomic features in patients with colorectal cancer from the 100,000 Genomes Project

Abstract
The objective of this study was to employ ensemble clustering and tree-based risk model approaches to identify interactions between clinicogenomic features for colorectal cancer using the 100,000 Genomes Project. Among the 2211 patients with colorectal cancer (mean age at diagnosis: 67.7; 59.7% male), 16.3%, 36.3%, 39.0% and 8.4% had stage 1, 2, 3 and 4 cancers, respectively. Nearly all patients had surgery (99.7%), 47.4% had chemotherapy, 7.6% had radiotherapy and 1.4% had immunotherapy. Mean tumour mutational burden (TMB) was 18 mutations/Mb, and 34.4%, 31.3% and 25.7% of patients had structural or copy number mutations in KRAS, BRAF and NRAS, respectively. In the fully adjusted Cox model, patients with advanced cancer [stage 3 hazard ratio (HR) = 3.2; p < 0.001; stage 4 HR = 10.2; p < 0.001] and those who had immunotherapy (HR = 1.8; p < 0.04) or radiotherapy (HR = 1.5; p < 0.02) had a higher risk of dying. The ensemble clustering approach generated four distinct clusters: patients in cluster 2 had the best survival outcomes (1-year: 98.7%; 2-year: 96.7%; 3-year: 93.0%), while patients in cluster 3 had the worst (1-year: 87.9%; 2-year: 70.0%; 3-year: 53.1%). Kaplan–Meier analysis and the log-rank test showed that the clusters formed distinct prognostic groups (p < 0.0001). Survival tree (recursive partitioning) analyses were performed to further explore risk groups within each cluster. Among patients in cluster 2, for example, interactions between cancer stage, grade, radiotherapy, TMB and BRAF mutation status were identified. Patients with stage 4 cancer and TMB ≥ 1.6 mutations/Mb had a risk of dying 4 times higher than the baseline hazard in that cluster.
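To illustrate how cluster-level survival percentages such as those above can be obtained, the following is a minimal sketch of the Kaplan–Meier product-limit estimator. It is not the code used in this study: the function name `kaplan_meier` and the toy follow-up data are assumptions made purely for illustration, and a real analysis would normally rely on an established survival analysis package.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times  -- follow-up duration for each patient (any consistent unit)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (NOT from the 100,000 Genomes Project):
# five patients, follow-up in years, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print([(t, round(s, 3)) for t, s in toy_curve])
# prints [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Applying the same function separately to the patients in each cluster would yield one curve per cluster, from which 1-, 2- and 3-year survival can be read off and compared.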
Funding Information
  • Wellcome Trust (204841/Z/16/Z)