The fusion of visible-light and infrared images has attracted significant attention in imaging because of its role in applications such as surveillance, remote sensing, and medical imaging. This paper introduces a novel fusion framework built on the Res2Net architecture, which captures features across multiple receptive fields and scales and thereby extracts both global and local features effectively. The framework has three components: a Res2Net-based encoder, a fusion layer, and a decoder. The encoder extracts multi-scale features from the input image. We also introduce a training strategy for the Res2Net-based encoder that takes a single image as input. In addition, we propose a novel attention-based fusion strategy so that the decoder can reconstruct the fused image precisely. Experimental results show that our method achieves superior fusion performance and outperforms existing techniques in both subjective and objective evaluations.
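To make the encoder and fusion stages more concrete, the following is a minimal NumPy sketch under stated assumptions, not the implementation used in this work. The function names (`box_filter3`, `res2net_style_features`, `attention_fuse`) are hypothetical, a fixed 3x3 mean filter stands in for the learned convolutions of a Res2Net block, and the attention weights are assumed to be a per-pixel softmax over channel-wise L1 activity, which is one common choice and may differ from the strategy proposed here. The learned decoder is omitted.

```python
import numpy as np


def box_filter3(x):
    """3x3 mean filter per channel; a stand-in for a learned 3x3 convolution."""
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    h, w = x.shape[1:]
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0


def res2net_style_features(x, scales=4):
    """Toy Res2Net-style hierarchy over a (C, H, W) feature map.

    Channels are split into `scales` groups. The first group passes through
    unchanged; each later group is filtered after adding the previous group's
    output, so later groups see progressively larger receptive fields.
    """
    groups = np.split(x, scales, axis=0)  # requires C divisible by `scales`
    outputs = [groups[0].astype(float)]
    prev = None
    for group in groups[1:]:
        inp = group if prev is None else group + prev
        prev = box_filter3(inp)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)


def attention_fuse(feat_a, feat_b):
    """Fuse two (C, H, W) feature maps with per-pixel softmax weights.

    Assumption: the activity of each source is the channel-wise L1 norm.
    """
    act_a = np.abs(feat_a).sum(axis=0)  # (H, W)
    act_b = np.abs(feat_b).sum(axis=0)
    peak = np.maximum(act_a, act_b)     # subtract the max for numerical stability
    exp_a = np.exp(act_a - peak)
    exp_b = np.exp(act_b - peak)
    w_a = exp_a / (exp_a + exp_b)
    w_b = 1.0 - w_a
    # The (H, W) weights broadcast over the channel axis.
    return w_a * feat_a + w_b * feat_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visible = rng.random((8, 16, 16))   # placeholder visible-light features
    infrared = rng.random((8, 16, 16))  # placeholder infrared features
    fused = attention_fuse(
        res2net_style_features(visible),
        res2net_style_features(infrared),
    )
    print(fused.shape)
```

In a full pipeline, the fused feature map would be passed to the decoder to reconstruct the fused image; that step depends on trained weights and is therefore left out of this sketch.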