Multimodal image fusion (MMIF) aims to integrate information from different modalities into a comprehensive image that aids downstream tasks. However, existing methods tend to prioritize natural image fusion, focusing on information complementarity and network training strategies. They ignore the essential distinction between natural and medical image fusion and the influence of underlying components. This paper dissects the significant differences between the two tasks in terms of fusion goals, statistical properties, and data distribution. Based on this, we rethink the suitability of the normalization strategy and convolutional kernels for end-to-end MMIF. Specifically, this paper proposes a mixture of instance normalization and group normalization to preserve sample independence and reinforce intrinsic feature correlation. This strategy enriches the feature maps, thus boosting fusion performance. Furthermore, we introduce large kernel convolutions, effectively expanding receptive fields and enhancing the preservation of image detail. Moreover, the proposed multipath adaptive fusion module recalibrates the decoder input with features of various scales and receptive fields, ensuring the transmission of crucial information. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple fusion tasks and significantly improves downstream applications. The code is available at https://github.com/HeDan-11/LKC-FUNet.
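To make the normalization idea concrete, the following is a minimal NumPy sketch of mixing instance normalization (per-sample, per-channel statistics, preserving sample independence) with group normalization (per-sample, per-channel-group statistics, capturing cross-channel correlation). The convex blend via `alpha` and the function names are illustrative assumptions, not the paper's actual implementation; consult the released code at the repository above for the authors' design.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x: (N, C, H, W); normalize each (sample, channel) map independently,
    # so statistics never mix across samples or channels.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def group_norm(x, groups, eps=1e-5):
    # x: (N, C, H, W); normalize over groups of channels within each sample,
    # letting correlated channels share statistics.
    n, c, h, w = x.shape
    g = x.reshape(n, groups, c // groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

def mixed_norm(x, groups=4, alpha=0.5):
    # Hypothetical mixture: a convex combination of the two normalized
    # outputs (alpha is an assumed blending weight).
    return alpha * instance_norm(x) + (1 - alpha) * group_norm(x, groups)
```

In this sketch, the instance-norm branch keeps each sample's per-channel statistics isolated, while the group-norm branch ties channels within a group together; the blend exposes both kinds of statistics to subsequent layers.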