Multi-modal image fusion (MMIF) integrates valuable information from images of different modalities into a single fused image. However, fusing multiple visible images that have different focal regions with infrared images remains an unprecedented challenge in real MMIF applications. The difficulty arises from the limited depth of field of visible-light optical lenses, which prevents all regions of a scene from being captured in focus simultaneously. To address this issue, we propose an MMIF framework for joint focus integration and cross-modality information extraction. Specifically, a semi-sparsity-based smoothing filter is introduced to decompose the images into structure and texture components. A novel multi-scale operator is then proposed to fuse the texture components; it detects significant information by considering pixel focus attributes together with relevant data from the different modal images. Additionally, to capture scene luminance effectively and maintain reasonable contrast, we consider the distribution of energy information in the structure components in terms of multi-directional frequency variance and information entropy. Extensive experiments on existing MMIF datasets, as well as on object detection and depth estimation tasks, consistently demonstrate that the proposed algorithm surpasses state-of-the-art methods in both visual perception and quantitative evaluation. The code is available at https://github.com/ixilai/MFIF-MMIF.
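To make the decompose-then-fuse pipeline described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It rests on several loudly stated assumptions: a plain box filter stands in for the semi-sparsity-based smoothing filter, local texture energy stands in for the multi-scale focus operator, and local variance stands in for the multi-directional frequency variance and information entropy measures. The function names `box_blur`, `decompose`, and `fuse` and the `radius` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def box_blur(img, radius=2):
    """Mean filter with edge padding (assumed stand-in for the smoothing filter)."""
    img = np.asarray(img, dtype=np.float64)
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    # Sum the k*k shifted copies of the padded image, then normalise.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def decompose(img, radius=2):
    """Split an image into a smooth structure part and a residual texture part."""
    img = np.asarray(img, dtype=np.float64)
    structure = box_blur(img, radius)
    texture = img - structure
    return structure, texture


def fuse(images, radius=2):
    """Fuse same-sized grayscale images (e.g. multi-focus visible plus infrared)."""
    parts = [decompose(img, radius) for img in images]
    structures = np.stack([s for s, _ in parts])
    textures = np.stack([t for _, t in parts])

    # Texture rule (assumption): per pixel, keep the source whose local
    # texture energy is largest, a crude proxy for "in focus / salient".
    activity = np.stack([box_blur(np.abs(t), radius) for t in textures])
    choice = np.argmax(activity, axis=0)
    fused_texture = np.take_along_axis(textures, choice[None], axis=0)[0]

    # Structure rule (assumption): weight each source by its local variance.
    local_var = np.stack([
        box_blur(s ** 2, radius) - box_blur(s, radius) ** 2 for s in structures
    ])
    weights = np.clip(local_var, 0.0, None) + 1e-8
    weights /= weights.sum(axis=0, keepdims=True)
    fused_structure = (weights * structures).sum(axis=0)

    return fused_structure + fused_texture
```

Calling `fuse([near_focus, far_focus, infrared])` on three registered grayscale arrays of equal shape returns one fused array of that shape. The released code at the URL above should be consulted for the actual filter and fusion rules.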