AdaFuse: Adaptive Medical Image Fusion Based on Spatial-Frequential Cross Attention

Multi-modal medical image fusion is essential for precise clinical diagnosis and surgical navigation because it merges the complementary information from multiple modalities into a single image. The quality of the fused image depends on the features extracted from each modality as well as on the fusion rules for multi-modal information. Although existing deep learning-based fusion methods can fully exploit the semantic features of each modality, they cannot distinguish the effective low- and high-frequency information of each modality and fuse it adaptively. To address this issue, we propose AdaFuse, in which multi-modal image information is fused adaptively through a frequency-guided attention mechanism based on the Fourier transform. Specifically, we propose the cross-attention fusion (CAF) block, which adaptively fuses the features of two modalities in the spatial and frequency domains by exchanging key and query values, and then calculates the cross-attention scores between the spatial and frequency features to further guide the spatial-frequential information fusion. The CAF block enhances the high-frequency features of the different modalities so that details are retained in the fused images. Moreover, we design a novel loss function composed of a structure loss and a content loss to preserve both low- and high-frequency information. Extensive comparison experiments on several datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. The ablation experiments also validate the effectiveness of the proposed loss and fusion strategy. Our code is publicly available at https://github.com/xianming-gu/AdaFuse.
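
The CAF idea described in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation (see the repository linked above for that), and the abstract does not specify the details, so everything here is an assumption made for illustration: the function names (`cross_attention`, `frequency_features`, `toy_fuse`), the omission of learned query/key/value projections, the use of the FFT amplitude spectrum as the frequency feature, and the summation of the two attention directions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q_src, kv_src):
    """Queries from q_src attend to keys/values taken from kv_src.

    Both inputs have shape (tokens, channels). Learned projections are
    omitted (assumption): keys and values are kv_src itself.
    """
    d = q_src.shape[-1]
    scores = softmax(q_src @ kv_src.T / np.sqrt(d))  # (tokens_q, tokens_kv)
    return scores @ kv_src


def frequency_features(feat):
    """Per-channel 2D FFT amplitude spectrum of an (H, W, C) feature map.

    Using only the amplitude is an illustrative assumption.
    """
    return np.abs(np.fft.fft2(feat, axes=(0, 1), norm="ortho"))


def toy_fuse(feat_a, feat_b):
    """Hypothetical fusion of two (H, W, C) feature maps."""
    h, w, c = feat_a.shape

    def tokens(f):
        return f.reshape(h * w, c)

    # Spatial domain: each modality queries the other (exchanged roles).
    spa = (cross_attention(tokens(feat_a), tokens(feat_b))
           + cross_attention(tokens(feat_b), tokens(feat_a)))

    # Frequency domain: same exchange on the FFT amplitude features.
    fa, fb = frequency_features(feat_a), frequency_features(feat_b)
    frq = (cross_attention(tokens(fa), tokens(fb))
           + cross_attention(tokens(fb), tokens(fa)))

    # Spatial features query the frequency features, so that the
    # spatial-frequency attention scores guide the final fusion.
    fused = cross_attention(spa, frq)
    return fused.reshape(h, w, c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 4, 8))  # stand-in for modality A features
    b = rng.standard_normal((4, 4, 8))  # stand-in for modality B features
    print(toy_fuse(a, b).shape)
```

In this sketch the two modalities swap roles as query source and key/value source, which is one plausible reading of "exchanging key and query values"; a trained model would additionally learn the projections and the way the two attention outputs are combined.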
