Deep learning to identify a gene signature associated with molecular subtypes that predicts prognosis in colorectal cancer.

Abstract

Background: Identifying robust prognostic risk groups in colorectal cancer (CRC) could significantly improve patient outcomes. However, CRC is molecularly heterogeneous, which complicates clinical decision-making. Recently, a comprehensive study proposed four consensus molecular subtypes (CMSs) of CRC with detailed biological and clinical characterization, but a cost-effective clinical assay for prognosis is still lacking. To fill this gap, we present a supervised deep learning (DL) framework to identify a CMS-associated gene signature for prognosis.

Methods: A total of 1,729 CRC patients with complete follow-up data were included in this study. We first applied a supervised DL-based framework to the training cohort (n = 624) to extract CMS-associated deep features and then identified a gene panel highly correlated with these deep features. Subsequently, the prognostic power of this gene signature was evaluated on six independent CRC datasets.

Results: We identified a 21-gene signature associated with the CMSs, and a risk model trained on this signature significantly predicted patients’ disease-free survival (DFS) across six independent CRC datasets (n = 1,729): the training cohort (n = 624, HR = 2.53, 95% CI: 1.53-4.18, P < 0.001), the Validation 1 cohort (n = 557, HR = 1.77, 95% CI: 1.27-2.47, P < 0.001), and the Validation 2 cohort, merged from the other four datasets (n = 548, HR = 2.10, 95% CI: 1.50-2.93, P < 0.001). Notably, the 21-gene signature also stratified stage 2 and 3 patients into distinct survival groups: the training cohort (n = 338, HR = 2.14, 95% CI: 1.18-3.85, P < 0.01), the Validation 1 cohort (n = 457, HR = 1.63, 95% CI: 1.12-2.37, P < 0.01), and the Validation 2 cohort (n = 437, HR = 1.73, 95% CI: 1.37-2.82, P < 0.001), outperforming Oncotype DX on the same cohorts.

Conclusions: Using our DL-based framework, we developed a CMS-associated gene signature for robust prognostic prediction in CRC. Compared with the CMS classification system, which is based on genome-wide expression profiles, the 21-gene panel can be easily deployed in clinical practice to facilitate decision-making.
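
The abstract does not state how genes were matched to the deep features or how patients were split into risk groups. As a minimal illustrative sketch only, the following assumes Pearson correlation for ranking genes against a single deep feature and a median cut-off for risk stratification; the gene names, expression values, and risk scores are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_panel(expression, deep_feature, k):
    """Return the k genes whose expression has the largest |r| with the deep feature.

    expression: dict mapping gene name -> per-patient expression values.
    Assumption: absolute Pearson correlation is the ranking criterion.
    """
    ranked = sorted(
        expression,
        key=lambda gene: (-abs(pearson(expression[gene], deep_feature)), gene),
    )
    return ranked[:k]


def risk_groups(scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median (assumed cut-off)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical toy data: six patients, three genes, one deep feature.
deep_feature = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
expression = {
    "GENE_A": [1.0, 1.9, 2.1, 3.0, 3.8, 4.6],  # rises with the deep feature
    "GENE_B": [5.0, 4.1, 4.0, 2.9, 2.2, 1.4],  # falls with the deep feature
    "GENE_C": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],  # essentially unrelated
}

panel = select_panel(expression, deep_feature, k=2)
groups = risk_groups([0.2, 0.8, 0.5, 0.9])
```

In a real analysis, the resulting high- and low-risk groups would then be compared with a survival model (for example, a Cox proportional hazards model) to obtain hazard ratios like those reported above; that step is omitted here.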

Publication
Journal of Clinical Oncology