Infrared and visible image fusion can compensate for the incompleteness of single-modality imaging and provide a more comprehensive scene description by exploiting cross-modal complementarity. Most existing methods learn overall cross-modal features through high- and low-frequency constraints at the image level alone, overlooking the fact that cross-modal instance-level features often carry more valuable information. To fill this gap, we are the first to model cross-modal instance-level features by embedding instance information into a set of Mixture-of-Experts (MoEs), which prompts the image fusion network to learn instance-level information specifically. We propose a novel framework with instance-embedded Mixture-of-Experts for infrared and visible image fusion, termed MoE-Fusion, which contains an instance-embedded MoE group (IE-MoE), an MoE-Decoder, two encoders, and two auxiliary detection networks. By embedding the instance-level information learned in the auxiliary networks, IE-MoE achieves specialized learning of cross-modal foreground and background features. The MoE-Decoder adaptively selects suitable experts for cross-modal feature decoding and produces fusion results dynamically. Extensive experiments show that MoE-Fusion outperforms state-of-the-art methods in preserving contrast and texture details by learning instance-level information in cross-modal images.
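The adaptive expert selection described for the MoE-Decoder can be illustrated with a generic sparse gating sketch. This is not the authors' implementation: the abstract does not specify the gating rule, so the top-k softmax gate, the linear `gate_weights`, the toy experts, and every function name below are assumptions chosen for illustration. Features are plain Python lists rather than image feature maps, and each expert is assumed to return a vector of the same length as its input.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical expert type: maps a feature vector to a vector of equal length.
Expert = Callable[[Sequence[float]], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_top_k(
    features: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every expert with a linear gate, keep the k best, renormalize."""
    # One logit per expert: dot product of that expert's gate row and the features.
    logits = [sum(w * x for w, x in zip(row, features)) for row in gate_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(
    features: Sequence[float],
    experts: Sequence[Expert],
    gate_weights: Sequence[Sequence[float]],
    k: int = 2,
) -> Vector:
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(features)
    for idx, weight in gate_top_k(features, gate_weights, k):
        expert_out = experts[idx](features)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy setup (all values are made up for the example).
experts: List[Expert] = [
    lambda x: [2.0 * v for v in x],   # expert 0: doubles the features
    lambda x: [-v for v in x],        # expert 1: negates the features
    lambda x: [v + 1.0 for v in x],   # expert 2: shifts the features
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = [1.0, 3.0]

selected = gate_top_k(features, gate_weights, k=2)
fused = moe_forward(features, experts, gate_weights, k=2)
print(selected, fused)
```

Because only the selected experts are evaluated, the combination changes with the input features, which is the sense in which the output is obtained dynamically. In a real fusion network the gate and the experts would be learned layers operating on feature maps.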