Deep learning to identify a gene signature associated with molecular subtypes that predicts prognosis in colorectal cancer.

Abstract

Background: Identifying robust prognostic risk groups in colorectal cancer (CRC) could significantly improve patient outcomes. However, CRC is molecularly heterogeneous, which complicates clinical decision-making. Recently, a comprehensive study proposed four consensus molecular subtypes (CMSs) of CRC with detailed biological and clinical characterization, but a cost-effective clinical assay for prognosis is still lacking. To fill this gap, we present a supervised deep learning framework to identify a CMS-associated gene signature for prognosis.

Methods: A total of 1,729 CRC patients with complete follow-up data were included in this study. We first applied a supervised deep learning-based framework in the training cohort (n = 624) to extract CMS-associated deep features, and then identified a gene panel highly correlated with these deep features. Subsequently, the prognostic power of this gene signature was evaluated on six independent CRC datasets.

Results: We identified a 21-gene signature associated with the CMSs, and the trained risk model significantly predicted patients' disease-free survival (DFS) across six independent CRC datasets (n = 1,729): the training cohort (n = 624, HR = 2.53, 95% CI: 1.53–4.18, P < 0.001), the Validation 1 cohort (n = 557, HR = 1.77, 95% CI: 1.27–2.47, P < 0.001), and the Validation 2 cohort, merged from the other four datasets (n = 548, HR = 2.10, 95% CI: 1.50–2.93, P < 0.001). Notably, the 21-gene signature also stratified stage II and III patients into distinct survival groups: the training cohort (n = 338, HR = 2.14, 95% CI: 1.18–3.85, P < 0.01), the Validation 1 cohort (n = 457, HR = 1.63, 95% CI: 1.12–2.37, P < 0.01), and the Validation 2 cohort (n = 437, HR = 1.73, 95% CI: 1.37–2.82, P < 0.001), outperforming Oncotype DX on the same cohorts.

Conclusions: Using our deep learning-based framework, we developed a CMS-associated gene signature for robust prognostic prediction in CRC. Compared with the genome-wide expression profile-based CMS classification system, the 21-gene panel can be easily deployed in clinical practice to facilitate decision-making.
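
The gene-selection step described in the Methods (finding genes highly correlated with the deep features) can be illustrated with a minimal sketch. The abstract does not specify the correlation measure or the selection rule, so the absolute Pearson correlation, the top-k cutoff, the function names, and the toy gene names below are all assumptions made for illustration, not the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_genes_by_feature_correlation(expression, deep_features, top_k):
    """Rank genes by their strongest absolute correlation with any deep feature.

    expression:    dict mapping gene name -> per-patient expression values
    deep_features: list of per-patient value lists, one list per deep feature
    top_k:         hypothetical panel size (an assumed selection rule)
    """
    scores = {
        gene: max(abs(pearson(values, feature)) for feature in deep_features)
        for gene, values in expression.items()
    }
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return ranked[:top_k]


# Toy data: three hypothetical genes, five patients, one deep feature.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 3.0, 4.0, 1.0, 2.0],
    "GENE_C": [2.0, 2.0, 2.0, 2.0, 2.0],
}
deep_features = [[0.1, 0.2, 0.3, 0.4, 0.5]]

panel = rank_genes_by_feature_correlation(expression, deep_features, top_k=2)
print(panel)
```

In this toy case the constant gene is ranked last because it cannot correlate with anything. A real analysis would work on normalized expression matrices and would likely add multiple-testing control or a model-based selection step.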

Publication
Journal of Clinical Oncology
Du CAI
Postdoc

I focus on leveraging explainable AI and large foundation models to advance medical imaging and digital pathology in colorectal cancer research.

De-Jun FAN
Associate Professor

My research explores the intersection of gastrointestinal endoscopy (GIE) and artificial intelligence (AI), along with the biological mechanisms of colorectal cancer development.

Cheng-Hang LI
Research Assistant
Ze-Ping HUANG
Medical Student
Qi-Qi ZHU
Surgeon
Min-Yi LV
PhD Student
Chu-Ling HU
PhD Student
Xin DUAN
Postdoc
Feng GAO
Professor

My research leverages AI and big data to improve diagnostics, prognostics, and ultimately, outcomes in cancer and other biomedical fields.