With rapid advances in high-throughput sequencing technologies, the focus of survival analysis has shifted from examining clinical indicators to combining genomic profiles with pathological images. However, existing methods either fuse pathological features and genomic profiles directly for survival prediction, or use genomic profiles as guidance to integrate the features of pathological images. The former overlooks intrinsic cross-modal correlations; the latter discards pathological information that is irrelevant to gene expression. To address these issues, we present a Cross-Modal Translation and Alignment (CMTA) framework that explores the intrinsic cross-modal correlations and transfers potentially complementary information. Specifically, we construct two parallel encoder-decoder structures for the multi-modal data, which integrate intra-modal information and generate cross-modal representations. Using the generated cross-modal representation to enhance and recalibrate the intra-modal representation can significantly improve its discriminative power for comprehensive survival analysis. To explore the intrinsic cross-modal correlations, we further design a cross-modal attention module that serves as an information bridge between modalities, performing cross-modal interactions and transferring complementary information. Extensive experiments on five public TCGA datasets demonstrate that the proposed framework outperforms state-of-the-art methods.
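To make the idea of a cross-modal attention bridge concrete, the following is a minimal sketch of scaled dot-product attention applied across two modalities. It is not the CMTA implementation: the abstract does not specify the module's internals, so the absence of learned projections, the function name `cross_attention`, the embedding dimension, and the toy pathology and genomic embeddings are all assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows.

    Hypothetical helper for illustration; a real module would add learned
    query/key/value projections and multiple heads.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value rows, one output channel at a time.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Made-up toy inputs: 3 pathology patch embeddings and 2 genomic embeddings, dimension 4.
patho = [[1.0, 0.0, 0.5, 0.2], [0.1, 0.9, 0.3, 0.0], [0.4, 0.4, 0.4, 0.4]]
geno = [[0.2, 0.8, 0.1, 0.0], [0.9, 0.1, 0.0, 0.3]]

# Genomic embeddings query the pathology features, and vice versa (assumed directions).
geno_from_patho, w_g2p = cross_attention(geno, patho, patho)
patho_from_geno, w_p2g = cross_attention(patho, geno, geno)
```

In this sketch each modality's embeddings are re-expressed as a weighted mixture of the other modality's features, which is one plausible way information could flow between the two encoder-decoder branches.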