Image fusion aims to generate a high-quality image from multiple images captured under varying conditions. The key problem in this task is to preserve complementary information while filtering out information that is irrelevant to the fused result. Existing methods address this problem with static convolutional neural networks (CNNs), which suffer from two inherent limitations during feature extraction: they cannot handle spatially variant content, and they lack guidance from multiple inputs. In this paper, we propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs. Specifically, we design a mutual-guided dynamic filter (MGDF) for adaptive feature extraction. It is composed of a mutual-guided cross-attention (MGCA) module, which incorporates additional guidance from different inputs, and a dynamic filter predictor, which generates spatially variant kernels for different locations. In addition, we introduce a parallel feature fusion (PFF) module to effectively fuse local and global information of the extracted features. To further reduce the redundancy among the extracted features while preserving their shared structural information, we devise a novel loss function that combines the minimization of normalized mutual information (NMI) with an estimated gradient mask. Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks. The code and model are publicly available at: https://github.com/Guanys-dar/MGDN.
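To make the idea of spatially variant kernels concrete, the following is a minimal sketch of per-pixel dynamic filtering on a single-channel feature map. It is not the MGDF from the paper: the function names (`pad_zero`, `toy_kernel_predictor`, `dynamic_filter`) are hypothetical, and the hand-written rule that picks a kernel from a guidance image only stands in for a learned predictor. The point it illustrates is that each location is filtered with its own kernel, whereas a static convolution shares one kernel across all locations.

```python
def pad_zero(img, r):
    """Zero-pad a 2-D list-of-lists image by r pixels on every side."""
    w = len(img[0])
    blank = [0.0] * (w + 2 * r)
    rows = [[0.0] * r + list(row) + [0.0] * r for row in img]
    return [blank[:] for _ in range(r)] + rows + [blank[:] for _ in range(r)]


def toy_kernel_predictor(guide, k=3, thresh=0.5):
    """Hypothetical stand-in for a learned predictor: one k x k kernel per pixel.

    Assumption for illustration only: where the guidance image changes
    strongly in the horizontal direction, emit an identity kernel (keep
    detail); elsewhere emit a box kernel (average the neighborhood).
    """
    h, w = len(guide), len(guide[0])
    c = k // 2
    identity = [[1.0 if (i == c and j == c) else 0.0 for j in range(k)]
                for i in range(k)]
    box = [[1.0 / (k * k)] * k for _ in range(k)]
    kernels = []
    for y in range(h):
        row = []
        for x in range(w):
            left = guide[y][max(x - 1, 0)]
            right = guide[y][min(x + 1, w - 1)]
            row.append(identity if abs(right - left) > thresh else box)
        kernels.append(row)
    return kernels


def dynamic_filter(feat, kernels):
    """Filter feat so that output pixel (y, x) uses its own kernel kernels[y][x]."""
    k = len(kernels[0][0])
    r = k // 2
    padded = pad_zero(feat, r)
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ker = kernels[y][x]
            out[y][x] = sum(ker[i][j] * padded[y + i][x + j]
                            for i in range(k) for j in range(k))
    return out
```

In this sketch the guidance image comes from a second input, which loosely mirrors the mutual-guidance idea: the kernels applied to one source are chosen by looking at the other. A practical implementation would predict the kernels with a network and run the filtering on tensors rather than nested lists.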
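As an illustration of the quantity that the loss minimizes, the sketch below estimates normalized mutual information between two flattened feature maps from a joint histogram. Several normalizations of NMI are in common use; the form 2·I(A;B) / (H(A) + H(B)) used here is an assumption, as are the function names and the bin count. The estimated gradient mask is omitted, and a histogram estimate of this kind is not differentiable, so this shows what is being measured rather than how the training loss is implemented.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (in nats) of a histogram given bin counts and total n."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def nmi(a, b, bins=8):
    """Histogram estimate of NMI for two equal-length sequences with values in [0, 1].

    Uses the normalization 2 * I(A;B) / (H(A) + H(B)), which is an assumed
    choice: 0 for independent inputs, 1 for identical ones.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Quantize each value into one of `bins` equal-width bins.
    qa = [min(int(v * bins), bins - 1) for v in a]
    qb = [min(int(v * bins), bins - 1) for v in b]
    n = len(a)
    h_a = entropy(Counter(qa).values(), n)
    h_b = entropy(Counter(qb).values(), n)
    h_ab = entropy(Counter(zip(qa, qb)).values(), n)
    if h_a + h_b == 0.0:
        return 0.0  # both inputs are constant, so no shared information to measure
    mutual_info = h_a + h_b - h_ab
    return 2.0 * mutual_info / (h_a + h_b)
```

Two identical feature maps give a value close to 1 (fully redundant), while statistically independent ones give a value close to 0, which is why driving NMI down encourages the extracted features to carry different information.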