Multi-modal image fusion (MMIF) integrates valuable information from images of different modalities into a single fused image. However, fusing multiple visible images with different focal regions together with infrared images is an unprecedented challenge in real MMIF applications. The difficulty stems from the limited depth of field of visible-light optical lenses, which prevents all focal information in a scene from being captured at the same time. To address this issue, we propose an MMIF framework for joint focus integration and modality information extraction. Specifically, a semi-sparsity-based smoothing filter is introduced to decompose the images into structure and texture components. A novel multi-scale operator is then proposed to fuse the texture components; it detects salient information by considering both the focus attributes of each pixel and the relevant data from the different modal images. Additionally, to capture scene luminance effectively and maintain reasonable contrast, we consider the distribution of energy information in the structure components, measured in terms of multi-directional frequency variance and information entropy. Extensive experiments on existing MMIF datasets, as well as on object detection and depth estimation tasks, consistently demonstrate that the proposed algorithm surpasses state-of-the-art methods in both visual perception and quantitative evaluation. The code is available at https://github.com/ixilai/MFIF-MMIF.
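The general pipeline described in the abstract (decompose every source image into structure and texture layers, fuse each layer with its own rule, then recombine) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which lives in the linked repository, and several pieces are swapped-in assumptions: a box filter replaces the semi-sparsity smoothing filter, multi-scale local texture energy with a choose-max rule stands in for the proposed multi-scale focus operator, and the structure weights use local variance times histogram entropy instead of multi-directional frequency variance. All function names (`box_blur`, `decompose`, `global_entropy`, `fuse`) are hypothetical.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2r+1)x(2r+1) window with edge padding, via an integral image."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    # Leading zero row/column so that integ[a, b] == padded[:a, :b].sum().
    integ = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    window = (integ[k:k + h, k:k + w] - integ[:h, k:k + w]
              - integ[k:k + h, :w] + integ[:h, :w])
    return window / (k * k)


def decompose(img, radius=2):
    """Split an image into a smooth structure layer and a residual texture layer.
    ASSUMPTION: a box filter stands in for the paper's semi-sparsity smoothing filter."""
    structure = box_blur(img, radius)
    return structure, img - structure


def global_entropy(img, bins=256):
    """Shannon entropy (bits) of the grey-level histogram of an image in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def fuse(images, radius=2, scales=(1, 2, 4), eps=1e-6):
    """Fuse co-registered greyscale images in [0, 1] (multi-focus visible and/or infrared)."""
    structures, textures = zip(*(decompose(im, radius) for im in images))
    S, T = np.stack(structures), np.stack(textures)

    # Texture rule (hypothetical): local texture energy summed over several window
    # sizes acts as a focus/salience measure; each pixel keeps the texture of the
    # most active source ("choose-max").
    activity = sum(np.stack([box_blur(t ** 2, r) for t in T]) for r in scales)
    idx = activity.argmax(axis=0)
    fused_texture = np.take_along_axis(T, idx[None], axis=0)[0]

    # Structure rule (hypothetical): weight each structure layer by its local variance
    # times its global histogram entropy, then take the normalised weighted average.
    weights = []
    for s in S:
        local_var = np.clip(box_blur(s ** 2, radius) - box_blur(s, radius) ** 2, 0.0, None)
        weights.append((local_var + eps) * (global_entropy(s) + eps))
    W = np.stack(weights)
    W /= W.sum(axis=0, keepdims=True)
    fused_structure = (W * S).sum(axis=0)

    return fused_structure + fused_texture
```

For example, with a near-focus visible image `vis_near`, a far-focus visible image `vis_far`, and an infrared image `ir` (hypothetical float arrays of equal shape in [0, 1]), `fuse([vis_near, vis_far, ir])` returns one fused array. Splitting the layers lets the texture rule make hard per-pixel focus decisions while the structure rule blends luminance smoothly, which is the motivation for the decomposition in the abstract.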