Existing multi-modal image fusion methods fail to address the compound degradations present in source images, resulting in fused images plagued by noise, color bias, improper exposure, \textit{etc}. Additionally, these methods often overlook the specificity of foreground objects, weakening the salience of the objects of interest within the fused images. To address these challenges, this study proposes a novel interactive multi-modal image fusion framework based on a text-modulated diffusion model, called Text-DiFuse. First, the framework embeds feature-level information integration into the diffusion process, allowing adaptive degradation removal and multi-modal information fusion. This is the first attempt to deeply and explicitly embed information fusion within the diffusion process, effectively addressing compound degradation in image fusion. Second, by embedding the combination of text and a zero-shot location model into the diffusion fusion process, a text-controlled fusion re-modulation strategy is developed. This enables user-customized text control to improve fusion performance and highlight foreground objects in the fused images. Extensive experiments on diverse public datasets show that Text-DiFuse achieves state-of-the-art fusion performance across various scenarios with complex degradation. Moreover, a semantic segmentation experiment validates the significant enhancement in semantic performance achieved by the text-controlled fusion re-modulation strategy. The code is publicly available at https://github.com/Leiii-Cao/Text-DiFuse.