Current image fusion methods struggle to address the composite degradations encountered in real-world imaging scenarios and lack the flexibility to accommodate user-specific requirements. In response to these challenges, we propose a controllable image fusion framework with language-vision prompts, termed ControlFusion, which adaptively neutralizes composite degradations. On the one hand, we develop a degraded imaging model that integrates physical imaging mechanisms, including Retinex theory and the atmospheric scattering principle, to simulate composite degradations, thereby providing a data-level basis for handling complex real-world degradations. On the other hand, we devise a prompt-modulated restoration and fusion network that dynamically enhances features with degradation prompts, enabling our method to accommodate composite degradations of varying severity. Specifically, considering individual variations in users' quality perception, we incorporate a text encoder that embeds user-specified degradation types and severity levels as degradation prompts. We also design a spatial-frequency collaborative visual adapter that autonomously perceives degradations in source images, thus eliminating complete dependence on user instructions. Extensive experiments demonstrate that ControlFusion outperforms SOTA fusion methods in fusion quality and degradation handling, particularly in countering real-world and compound degradations of various severities. The source code is publicly available at https://github.com/Linfeng-Tang/ControlFusion.
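To make the degraded imaging model concrete, the sketch below shows what a composite degradation simulator built from the two physical mechanisms named above might look like: a Retinex-style illumination attenuation for low light, followed by the atmospheric scattering model I = J·t + A·(1 − t) for haze. The function name, parameter values, and the gamma-curve surrogate for the illumination map are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def simulate_composite_degradation(img, illum_gamma=2.2, haze_beta=1.0,
                                   airlight=0.9, depth=None):
    """Apply a low-light degradation followed by haze to a clean image.

    img: float array in [0, 1], shape (H, W, 3).
    All parameter defaults are illustrative, not the paper's settings.
    """
    # Low light (Retinex-inspired): attenuate the illumination component.
    # A gamma curve on intensity is a common surrogate for a darker
    # illumination map L in I = R * L.
    low_light = np.power(img, illum_gamma)

    # Haze (atmospheric scattering): transmission t = exp(-beta * d).
    # Fall back to a uniform unit depth map when none is provided.
    if depth is None:
        depth = np.ones(img.shape[:2])
    t = np.exp(-haze_beta * depth)[..., None]

    # Scattering model: scene radiance decays with t, airlight fills in.
    degraded = low_light * t + airlight * (1.0 - t)
    return np.clip(degraded, 0.0, 1.0)
```

Varying `illum_gamma` and `haze_beta` would correspond to the "varying severity levels" the abstract refers to, letting a single clean image yield training pairs across a range of composite degradation strengths.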