A Multi-scale Information Integration Framework for Infrared and Visible Image Fusion

Infrared and visible image fusion aims to generate a fused image that contains the intensity and detail information of the source images. The key issue is to effectively measure and integrate the complementary information of multi-modality images of the same scene. Existing methods mostly adopt a simple weight in the loss function to decide how much information from each modality is retained, rather than adaptively measuring complementary information for different image pairs. In this study, we propose a multi-scale dual attention (MDA) framework for infrared and visible image fusion, which is designed to measure and integrate complementary information in both the network structure and the loss function at the image and patch levels. In our method, a residual downsample block first decomposes the source images into three scales. Then, a dual attention fusion block integrates complementary information and generates spatial and channel attention maps at each scale for feature fusion. Finally, the output image is reconstructed by a residual reconstruction block. The loss function consists of three parts (image-level, feature-level, and patch-level), of which the image-level and patch-level parts are computed using weights generated by the complementary information measurement. In addition, a style loss is added to constrain the pixel intensity distribution between the output and the infrared image. Our fusion results are robust and informative across different scenarios. Qualitative and quantitative results on two datasets show that our method preserves both the thermal radiation and the detail information of the two modalities and achieves results comparable to those of other state-of-the-art methods. Ablation experiments demonstrate the effectiveness of our information integration architecture and of adaptively measuring complementary information retention in the loss function.
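
To make the idea of an adaptively weighted image-level loss concrete, the following is a minimal sketch in plain Python on nested lists. It is not the method of this paper: the abstract does not state how complementary information is measured, so the mean absolute gradient used here, the softmax normalization, the mean squared error term, and all function names (`mean_gradient`, `adaptive_weights`, `image_level_loss`) are assumptions chosen only for illustration.

```python
import math


def mean_gradient(img):
    """Mean absolute finite difference between neighboring pixels.

    ASSUMPTION: used here as a stand-in information measure; the
    abstract does not specify the actual measurement.
    """
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbor
                total += abs(img[y][x + 1] - img[y][x])
                count += 1
            if y + 1 < h:  # vertical neighbor
                total += abs(img[y + 1][x] - img[y][x])
                count += 1
    return total / count if count else 0.0


def adaptive_weights(ir, vis, temperature=1.0):
    """Per-pair weights (w_ir, w_vis) that sum to 1.

    ASSUMPTION: a softmax over the two information measures.
    """
    g_ir, g_vis = mean_gradient(ir), mean_gradient(vis)
    m = max(g_ir, g_vis)  # subtract the max for numerical stability
    e_ir = math.exp((g_ir - m) / temperature)
    e_vis = math.exp((g_vis - m) / temperature)
    s = e_ir + e_vis
    return e_ir / s, e_vis / s


def mse(a, b):
    """Mean squared error between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)) / n


def image_level_loss(fused, ir, vis):
    """Weighted sum of the distances from the fused image to each source."""
    w_ir, w_vis = adaptive_weights(ir, vis)
    return w_ir * mse(fused, ir) + w_vis * mse(fused, vis)


if __name__ == "__main__":
    ir = [[0.5, 0.5], [0.5, 0.5]]    # flat image: no gradient content
    vis = [[0.0, 1.0], [1.0, 0.0]]   # checkerboard: strong gradients
    print(adaptive_weights(ir, vis))
    print(image_level_loss(vis, ir, vis))
```

With a flat infrared image and a checkerboard visible image, the visible weight comes out larger than the infrared weight, so the loss pulls the fused image more strongly toward the visible source for this pair. A patch-level variant under the same assumptions would apply `adaptive_weights` to each pair of corresponding patches instead of to the whole images.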
