Developing a central nervous system (CNS) tumor classifier by integrating DNA methylation data with whole slide images (WSIs) offers significant potential for enhancing diagnostic precision in neuropathology. Existing approaches typically integrate encoded omic data with histology only once, at either an early or a late fusion stage, while reintroducing encoded omic data to create a dual fusion variant remains unexplored. Yet reintroducing omic embeddings at both the early and late fusion stages captures complementary information from localized patch-level and holistic slide-level interactions, boosting performance through richer multimodal integration. To achieve this, we propose a dual fusion framework that integrates omic data at both early and late stages, fully leveraging its diagnostic strength. In the early fusion stage, omic embeddings are projected into a patch-wise latent space, generating omic-WSI embeddings that encapsulate per-patch molecular and morphological insights, effectively incorporating this information into the spatial representation of histology. These embeddings are refined with a multiple instance learning gated attention mechanism that attends to critical patches. In the late fusion stage, we reintroduce the omic data by fusing it with slide-level omic-WSI embeddings using a Multimodal Outer Arithmetic Block (MOAB), which richly intermingles features from both modalities, capturing their global correlations and complementarity. We demonstrate accurate CNS tumor subtyping across 20 fine-grained subtypes and validate our approach on benchmark datasets, achieving improved survival prediction on TCGA-BLCA and competitive performance on TCGA-BRCA compared to state-of-the-art methods. This dual fusion strategy enhances interpretability and classification performance, highlighting its potential for clinical diagnostics.
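The dual fusion pipeline described above can be sketched end to end. This is a minimal, illustrative NumPy mock-up, not the authors' implementation: all dimensions, projection matrices, and random features are hypothetical, the gated attention follows the standard MIL formulation (tanh/sigmoid gating with a softmax over patches), and `moab` approximates the outer arithmetic fusion (outer product, sum, difference, and division of the two appended-1 embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_attention_pool(H, V, U, w):
    """Gated-attention MIL pooling: a_k ∝ w^T (tanh(V h_k) * sigmoid(U h_k)),
    normalized with a softmax over patches."""
    gate = np.tanh(H @ V.T) * (1.0 / (1.0 + np.exp(-(H @ U.T))))
    scores = gate @ w                       # one score per patch
    a = np.exp(scores - scores.max())
    a /= a.sum()                            # softmax attention weights
    return a @ H, a                         # slide-level embedding, weights

def moab(x, y, eps=1e-6):
    """Outer arithmetic fusion: outer product, sum, difference, and
    division of two (appended-1) embeddings, stacked as channels."""
    x1, y1 = np.append(x, 1.0), np.append(y, 1.0)
    ops = [np.outer(x1, y1),
           x1[:, None] + y1[None, :],
           x1[:, None] - y1[None, :],
           x1[:, None] / (y1[None, :] + eps)]
    return np.stack(ops)                    # shape (4, d+1, d+1)

# --- toy dimensions and random stand-ins for encoded features ---
num_patches, d_wsi, d_omic, d = 50, 32, 16, 24
patches = rng.standard_normal((num_patches, d_wsi))  # WSI patch features
omic = rng.standard_normal(d_omic)                   # omic embedding

# Early fusion: project the omic embedding into the patch latent space
# and combine it with every patch representation.
P_wsi = rng.standard_normal((d, d_wsi)) * 0.1
P_omic = rng.standard_normal((d, d_omic)) * 0.1
H = patches @ P_wsi.T + omic @ P_omic.T              # omic-WSI embeddings

# Gated-attention MIL pooling over the omic-aware patches.
V, U = rng.standard_normal((8, d)), rng.standard_normal((8, d))
w = rng.standard_normal(8)
slide_emb, attn = gated_attention_pool(H, V, U, w)

# Late fusion: reintroduce the omic data via MOAB-style outer arithmetic.
omic_proj = omic @ (rng.standard_normal((d, d_omic)) * 0.1).T
fused = moab(slide_emb, omic_proj)
print(fused.shape)  # (4, 25, 25)
```

In a trained model the fused `(4, d+1, d+1)` tensor would be flattened or convolved into a classification head; here the sketch only shows how the omic embedding enters the model twice, once per patch and once against the attention-pooled slide embedding.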