Integrating the different data modalities available for cancer patients can significantly improve survival prediction. However, most existing methods do not jointly exploit the rich semantic features present at different scales in pathology images. In addition, intra-modality data are often missing when multimodal data are collected and features are extracted, which introduces noise into the multimodal data. To address these challenges, this paper proposes FORESEE, a new end-to-end framework that robustly predicts patient survival by mining multimodal information. First, the cross-fusion transformer uses a cross-scale feature cross-fusion method to relate features at the cellular level, tissue level, and tumor heterogeneity level to prognosis, which strengthens the feature representation of pathology images. Second, the hybrid attention encoder (HAE) uses a denoising contextual attention module to capture the contextual relationship features and local detail features of the molecular data, while its channel attention module captures the global features of the molecular data. Third, to address missing information within modalities, we propose an asymmetrically masked triplet masked autoencoder that reconstructs the lost information. Extensive experiments on four benchmark datasets demonstrate the superiority of our method over state-of-the-art methods in both the complete and missing settings.
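To make the idea of asymmetric masking for reconstruction more concrete, the following is a minimal sketch and not the FORESEE implementation. It assumes that each of three inputs is a list of feature tokens, that each input is masked at a different ratio, that hidden tokens are zero-filled, and that the reconstruction error is measured only on the hidden tokens. The input names, masking ratios, and helper functions `mask_tokens` and `masked_mse` are hypothetical and chosen purely for illustration.

```python
import random


def mask_tokens(tokens, ratio, rng):
    """Hide a fraction `ratio` of the tokens.

    Returns a zero-filled copy and the sorted indices of the hidden tokens.
    """
    n_hidden = round(len(tokens) * ratio)
    hidden = sorted(rng.sample(range(len(tokens)), n_hidden))
    hidden_set = set(hidden)
    masked = [
        [0.0] * len(tok) if i in hidden_set else list(tok)
        for i, tok in enumerate(tokens)
    ]
    return masked, hidden


def masked_mse(original, reconstructed, hidden):
    """Mean squared error computed over the hidden tokens only."""
    total, count = 0.0, 0
    for i in hidden:
        for a, b in zip(original[i], reconstructed[i]):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0


rng = random.Random(0)

# Three hypothetical inputs (the "triplet"), each with 8 tokens of dimension 4.
names = ("pathology", "molecular_a", "molecular_b")
inputs = {
    name: [[rng.random() for _ in range(4)] for _ in range(8)]
    for name in names
}

# Assumed asymmetric masking ratios: a different ratio for each input.
ratios = {"pathology": 0.75, "molecular_a": 0.5, "molecular_b": 0.25}

for name in names:
    masked, hidden = mask_tokens(inputs[name], ratios[name], rng)
    # Error of the zero-filled input itself; a trained decoder
    # would be expected to reduce this value.
    baseline = masked_mse(inputs[name], masked, hidden)
    print(name, "hidden tokens:", len(hidden), "baseline error:", round(baseline, 4))
```

In a real model, the zero-filled tokens would pass through an encoder and decoder, and `masked_mse` would compare the decoder output with the original tokens. Restricting the loss to hidden positions is a common design choice for masked autoencoders, because the visible tokens are already known to the model.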