Multi-Modal Image Fusion (MMIF) aims to combine images from different modalities into a single fused image that retains texture details and preserves salient information. Recently, some MMIF methods have incorporated frequency-domain information to enhance spatial features. However, these methods typically rely on simple serial or parallel spatial-frequency fusion without interaction between the two domains. In this paper, we propose a novel Interactive Spatial-Frequency Fusion Mamba (ISFM) framework for MMIF. Specifically, we begin with a Modality-Specific Extractor (MSE) that extracts features from each modality and models long-range dependencies across the image with linear computational complexity. To effectively leverage frequency information, we then propose a Multi-scale Frequency Fusion (MFF) module, which adaptively integrates low-frequency and high-frequency components across multiple scales, enabling robust representations of frequency features. More importantly, we further propose an Interactive Spatial-Frequency Fusion (ISF) module, which uses frequency features to guide spatial features across modalities, enhancing complementary representations. Extensive experiments on six MMIF datasets demonstrate that ISFM outperforms other state-of-the-art methods. The source code is available at https://github.com/Namn23/ISFM.