Optimal treatment of uveitis demands a precise diagnosis of anterior chamber inflammation (ACI). However, current diagnostic methods rely only on a limited, single-modal view of the disease, which leads to poor performance. In this paper, we investigate a promising yet challenging approach: fusing multimodal data for ACI diagnosis. Notably, existing fusion paradigms focus on strengthening implicit modality interactions (i.e., self-attention and its variants) but neglect to inject explicit modality interactions, especially those derived from clinical knowledge and imaging properties. To this end, we propose a jointly Explicit and implicit Cross-Modal Interaction Network (EiCI-Net) for ACI diagnosis that jointly uses anterior segment optical coherence tomography (AS-OCT) images, slit-lamp images, and clinical data. Specifically, we first develop CNN-Based Encoders and a Tabular Processing Module (TPM) to extract efficient feature representations from the different modalities. Then, we devise an Explicit Cross-Modal Interaction Module (ECIM) that generates attention maps from the tabular feature maps as a form of explicit clinical knowledge and integrates them into the slit-lamp feature maps, allowing the CNN-Based Encoder to focus on the more informative regions of the slit-lamp images. After that, the Implicit Cross-Modal Interaction Module (ICIM), a transformer-based network, further enhances modality interactions implicitly. Finally, we construct a sizable real-world dataset from our collaborating hospital and conduct extensive experiments, which demonstrate that the proposed EiCI-Net outperforms state-of-the-art classification methods across various metrics.
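To make the explicit-then-implicit interaction described in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the explicit step can be approximated by a sigmoid gate computed from the tabular features and applied channel-wise to the slit-lamp feature map, and that the implicit step can be approximated by a single-head self-attention layer over one pooled token per modality. All function names, shapes, and random weights (`explicit_interaction`, `implicit_interaction`, `run_sketch`, `w_gate`, `w_tab`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def explicit_interaction(slit_feat, tab_feat, w_gate):
    """Assumed ECIM-like step: tabular features gate slit-lamp channels.

    slit_feat: (C, H, W) feature map, tab_feat: (D,), w_gate: (D, C).
    """
    gate = sigmoid(tab_feat @ w_gate)            # (C,), each value in (0, 1)
    return slit_feat * gate[:, None, None], gate


def implicit_interaction(tokens, w_q, w_k, w_v):
    """Assumed ICIM-like step: single-head self-attention over modality tokens.

    tokens: (N, E), one row per modality.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (N, N), rows sum to 1
    return tokens + attn @ v, attn                  # residual connection


def run_sketch(seed=0):
    rng = np.random.default_rng(seed)
    C, H, W, D = 8, 4, 4, 5                      # hypothetical sizes
    oct_feat = rng.normal(size=(C, H, W))        # stand-in for AS-OCT encoder output
    slit_feat = rng.normal(size=(C, H, W))       # stand-in for slit-lamp encoder output
    tab_feat = rng.normal(size=(D,))             # stand-in for TPM output
    w_gate = rng.normal(size=(D, C))
    w_tab = rng.normal(size=(D, C))

    gated, gate = explicit_interaction(slit_feat, tab_feat, w_gate)

    # Global average pooling turns each image feature map into one token.
    tokens = np.stack([
        oct_feat.mean(axis=(1, 2)),
        gated.mean(axis=(1, 2)),
        tab_feat @ w_tab,
    ])                                           # (3, C)
    w_q, w_k, w_v = (rng.normal(size=(C, C)) * 0.1 for _ in range(3))
    fused, attn = implicit_interaction(tokens, w_q, w_k, w_v)
    return gated, gate, fused, attn
```

Calling `run_sketch()` returns the gated slit-lamp feature map, the gate itself, the fused modality tokens, and the attention matrix. In a full model the fused tokens would feed a classification head, and the gate could instead be a spatial attention map; the abstract does not specify either detail, so both are left open here.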