Integrating multi-omics data has emerged as a promising approach for gaining comprehensive insight into complex diseases such as cancer. This paper proposes a novel method for identifying cancer subtypes by clustering integrated multi-omics data. The proposed method, Linear Inter and Intra Dataset Affinity Fusion (LIDAF), utilises affinity matrices based on linear relationships between and within different omics datasets. Canonical Correlation Analysis is employed to create distance matrices based on Euclidean distances between canonical variates. The distance matrices are converted to affinity matrices, which are then fused in a three-step process. LIDAF addresses the limitations of the existing method, improving clustering performance as measured by the Adjusted Rand Index and the Normalized Mutual Information score. Moreover, LIDAF notably improves 50% of the log10 rank p-values obtained from Cox survival analysis relative to the best reported method, highlighting its potential for identifying distinct cancer subtypes.
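The pipeline described above (canonical variates, Euclidean distances, affinities, fusion) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: the function names are hypothetical, CCA is computed with a QR and SVD decomposition that requires more samples than features, the distance-to-affinity conversion uses a Gaussian kernel with a median bandwidth, and the fusion is a plain average standing in for the paper's three-step process, which the abstract does not specify.

```python
import numpy as np


def canonical_variates(X, Y, n_components=2):
    """CCA via QR + SVD. Assumes n_samples > n_features and full column rank."""
    Xc = X - X.mean(axis=0)  # centre each feature
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis of each centred dataset
    Qy, _ = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)  # singular values = canonical correlations
    Zx = Qx @ U[:, :n_components]  # canonical variates of X (samples x components)
    Zy = Qy @ Vt.T[:, :n_components]  # canonical variates of Y
    return Zx, Zy, s[:n_components]


def pairwise_euclidean(Z):
    """Sample-by-sample Euclidean distance matrix."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gaussian_affinity(D):
    """Assumed conversion: Gaussian kernel with median-distance bandwidth."""
    sigma = np.median(D[D > 0])
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))


def fuse_affinities(affinities):
    """Placeholder fusion (simple mean), NOT the paper's three-step process."""
    return np.mean(affinities, axis=0)


# Toy data: two "omics" views of 40 samples sharing a 2-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(40, 5))
Y = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(40, 4))

Zx, Zy, corrs = canonical_variates(X, Y, n_components=2)
A_x = gaussian_affinity(pairwise_euclidean(Zx))
A_y = gaussian_affinity(pairwise_euclidean(Zy))
fused = fuse_affinities([A_x, A_y])
```

The fused matrix could then be passed to any clustering algorithm that accepts a precomputed affinity, such as spectral clustering. Real omics datasets usually have far more features than samples, so a regularised or dimension-reduced variant of CCA would likely be needed in place of this toy version.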