Integrating diffusion components of multi-omics datasets with application to cancer molecular subtyping

Abstract

Cancer is a heterogeneous disease comprising multiple molecular subtypes that underlie its diverse clinical outcomes. Most strategies for cancer molecular subtyping rely on unsupervised classification of a single data type, typically gene expression profiles from the transcriptome. However, molecular heterogeneity also exists at other genetic and epigenetic levels, so multi-omics data integration offers a more comprehensive way to analyze cancer heterogeneity. Here, we propose DMCI, which integrates the first diffusion component of each multi-omics dataset into a joint variable and then applies K-means clustering to dissect cancer heterogeneity. The diffusion map is a spectral, non-linear dimensionality reduction method in which the first diffusion component is the most important dimension. The joint variable learned by DMCI not only captures complementary information from the different data sources but is also computationally efficient. To demonstrate its effectiveness, we applied DMCI to colorectal cancer and ovarian cancer subtyping. Compared with other data integration methods, our approach performed much better and identified molecular subtypes that are much more clinically relevant.
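
The pipeline described in the abstract (one diffusion component per omics layer, a joint variable, then K-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the median-distance bandwidth, the column-wise stacking and standardization of the components, and the function names `first_diffusion_component`, `joint_variable`, and `kmeans` are all assumptions made for illustration. The small K-means routine uses a deterministic farthest-point initialization for reproducibility.

```python
import numpy as np


def first_diffusion_component(X, sigma=None):
    """First non-trivial diffusion component of X (samples x features)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances between samples.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    if sigma is None:
        # Assumed heuristic: bandwidth is roughly the median pairwise distance.
        off_diag = ~np.eye(len(X), dtype=bool)
        sigma = np.sqrt(np.median(d2[off_diag]))
    K = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian affinity (assumption)
    deg = K.sum(axis=1)
    # Symmetric form S = D^-1/2 K D^-1/2 has the same eigenvalues as the
    # Markov matrix P = D^-1 K and can be decomposed with eigh.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    # The largest eigenvalue (1) is trivial; use the second largest.
    psi = vecs[:, -2] / np.sqrt(deg)  # right eigenvector of P
    return vals[-2] * psi


def joint_variable(omics_layers):
    """Stack the first diffusion component of each layer (assumed scheme)."""
    Z = np.column_stack([first_diffusion_component(X) for X in omics_layers])
    # Standardize columns so that no single layer dominates the distances.
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)


def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's K-means with deterministic farthest-point seeding."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Given a list of sample-aligned matrices (for example, hypothetical `expression` and `methylation` arrays with the same rows), `kmeans(joint_variable([expression, methylation]), k)` returns one subtype label per sample. Because each layer contributes a single column, the clustering step runs on a very low-dimensional matrix.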

Publication
Intelligent Systems for Molecular Biology