Integrating diffusion components of multi-omics datasets with application to cancer molecular subtyping

Abstract

Cancer is a heterogeneous disease comprising multiple molecular subtypes that underlie its diverse clinical outcomes. Most strategies for cancer molecular subtyping rely on unsupervised classification of a single data type, usually transcriptomic gene expression profiles. However, molecular heterogeneity also exists at other genetic and epigenetic levels, so integrating multi-omics data offers a more comprehensive way to analyze cancer heterogeneity. Here, we propose DMCI, which integrates the first diffusion component of each multi-omics dataset into a joint variable and combines it with K-means clustering to dissect cancer heterogeneity. The diffusion map is a spectral, non-linear dimensionality reduction method in which the first diffusion component is the most important dimension. The joint variable learned by DMCI not only captures complementary information from the different data sources but is also computationally efficient. To demonstrate its effectiveness, we applied DMCI to colorectal cancer and ovarian cancer subtyping. Compared with other data integration methods, our approach showed much better performance and identified molecular subtypes that are much more clinically relevant.
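The workflow described in the abstract (one first diffusion component per omics dataset, stacked into a joint variable, then clustered with K-means) can be illustrated with a minimal NumPy sketch. It is not the authors' DMCI implementation: the Gaussian kernel, the median-distance bandwidth, the z-scoring of each component, the farthest-first K-means initialization, and the function names `first_diffusion_component`, `joint_variable`, and `kmeans` are all assumptions made for illustration, and the two omics layers are synthetic.

```python
import numpy as np


def first_diffusion_component(X, sigma=None):
    """First non-trivial diffusion coordinate of each sample (row of X)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances between samples.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if sigma is None:
        # Assumed heuristic: bandwidth from the median off-diagonal distance.
        sigma = np.sqrt(np.median(d2[~np.eye(n, dtype=bool)]))
    K = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian affinity (assumption)
    deg = K.sum(axis=1)
    # Symmetric form S = D^-1/2 K D^-1/2 has the same eigenvalues as the
    # Markov matrix P = D^-1 K, so a symmetric eigensolver can be used.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    # The largest eigenvalue (1) is trivial; take the second largest.
    psi = vecs[:, -2] / np.sqrt(deg)          # right eigenvector of P
    return vals[-2] * psi


def joint_variable(omics_layers):
    """Stack one first diffusion component per omics layer, z-scored."""
    Z = np.column_stack([first_diffusion_component(X) for X in omics_layers])
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)


def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-ins for two omics layers measured on the same 40 samples.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 20)
expr = rng.normal(size=(40, 50)) + 3.0 * groups[:, None]   # "expression"
meth = rng.normal(size=(40, 30)) + 3.0 * groups[:, None]   # "methylation"
labels = kmeans(joint_variable([expr, meth]), k=2)
```

Because each omics layer contributes a single column, the joint variable has only as many dimensions as there are data sources, which is one plausible reading of the efficiency claim in the abstract. The sign of each eigenvector is arbitrary, but K-means is unaffected by flipping a column.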

Publication
Intelligent Systems for Molecular Biology
Xin DUAN
Postdoc
Du CAI
Postdoc

I focus on leveraging explainable AI and large foundation models to advance medical imaging and digital pathology in colorectal cancer research.

Qi-Qi ZHU
Surgeon
Ze-Ping HUANG
Medical Student
Cheng-Hang LI
Research Assistant
Feng GAO
Professor

My research leverages AI and big data to improve diagnostics, prognostics, and ultimately, outcomes in cancer and other biomedical fields.