Multi-modal fusion is crucial in medical data research: combining diverse modalities enables a comprehensive understanding of diseases and improves diagnostic performance. However, multi-modal fusion faces several challenges, including capturing interactions between modalities, handling missing modalities, coping with erroneous modal information, and ensuring interpretability. Existing work tends to design a separate solution for each of these problems, overlooking their commonalities. This paper proposes a novel multi-modal fusion framework that adaptively adjusts the weight of each modality by introducing Modal-Domain Attention (MDA). The framework facilitates the fusion of multi-modal information while tolerating missing modalities and intrinsic noise, thereby strengthening the representation of multi-modal data. We visualize accuracy changes and MDA weights throughout the fusion process, providing a comprehensive analysis of the framework's interpretability. Extensive experiments on several gastrointestinal disease benchmarks show that the proposed MDA maintains high accuracy even in the presence of missing modalities and intrinsic noise. Notably, the MDA visualizations are highly consistent with the conclusions of existing clinical studies on how strongly different diseases depend on each modality. Code and dataset will be made available.
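To make the core idea concrete, the following is a minimal illustrative sketch of attention-based modality weighting with missing-modality masking. It is not the paper's actual MDA module: the function name, the norm-based scoring (standing in for a learned scoring head), and the input shapes are all assumptions introduced here for illustration.

```python
import numpy as np

def modal_domain_attention(features, mask):
    """Hypothetical sketch of adaptive modality weighting.

    features: (M, D) array, one embedding per modality.
    mask: (M,) binary array; 0 marks a missing modality.
    Returns the fused (D,) representation and per-modality weights.
    """
    # Score each modality; a real model would use a learned scoring head,
    # here the embedding norm is a simple stand-in.
    scores = np.linalg.norm(features, axis=1)
    # Mask missing modalities before the softmax so they receive zero weight,
    # letting the remaining modalities absorb the probability mass.
    scores = np.where(mask.astype(bool), scores, -np.inf)
    exp = np.exp(scores - scores[np.isfinite(scores)].max())
    weights = exp / exp.sum()
    # Fuse as a weighted sum of modality embeddings.
    fused = weights @ features
    return fused, weights

# Usage: three modalities, the third one missing.
feats = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
fused, w = modal_domain_attention(feats, np.array([1, 1, 0]))
```

The masked softmax is what lets the fused representation degrade gracefully when a modality is absent or noisy: its weight is driven to zero rather than contaminating the sum.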