Image fusion plays a key role in a variety of multi-sensor vision systems, especially for enhancing visual quality and/or extracting aggregated features for perception. However, most existing methods treat image fusion as a standalone task, ignoring its underlying relationship with downstream vision problems. Furthermore, designing proper fusion architectures often requires substantial engineering effort, and current fusion approaches lack mechanisms to improve their flexibility and generalization ability. To mitigate these issues, we establish a Task-guided, Implicit-searched and Meta-initialized (TIM) deep model to address the image fusion problem in challenging real-world scenarios. Specifically, we first propose a constrained strategy that incorporates information from downstream tasks to guide the unsupervised learning process of image fusion. Within this framework, we then design an implicit search scheme to automatically and efficiently discover compact architectures for our fusion model. In addition, a pretext meta-initialization technique is introduced to leverage diverse fusion data and support fast adaptation to different kinds of image fusion tasks. Qualitative and quantitative experiments on different categories of image fusion problems and related downstream tasks (e.g., visual enhancement and semantic understanding) substantiate the flexibility and effectiveness of TIM. The source code will be available at https://github.com/LiuZhu-CV/TIMFusion.
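To give a concrete feel for how a meta-learned initialization supports fast adaptation across fusion tasks, the following is a minimal sketch using Reptile (Nichol et al., 2018), a generic first-order meta-learning method, as a stand-in. It is not TIM's procedure. The two toy source signals, the weighted-average fusion rule, and the treatment of each `alpha` as a separate "fusion task" are all hypothetical simplifications chosen so that the gradient has a closed form.

```python
# Hedged sketch: Reptile-style meta-initialization (Nichol et al., 2018) used as a
# stand-in for "pretext meta initialization"; NOT TIM's actual procedure.
# All data, names and the fusion rule below are hypothetical.

# Two toy "source images", flattened to 1-D signals (hypothetical data).
SRC_A = [1.0, 0.0, 2.0, 1.0]
SRC_B = [0.0, 1.0, 0.0, 2.0]

# Each alpha mimics a different fusion task's preference for source A vs. B.
TASKS = [0.2, 0.5, 0.8]


def fuse(w, a=SRC_A, b=SRC_B):
    """Pixel-wise weighted fusion: F = w * A + (1 - w) * B."""
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


def fusion_loss(w, alpha, a=SRC_A, b=SRC_B):
    """Unsupervised fidelity loss: alpha * MSE(F, A) + (1 - alpha) * MSE(F, B)."""
    f = fuse(w, a, b)
    n = len(f)
    mse_a = sum((fi - x) ** 2 for fi, x in zip(f, a)) / n
    mse_b = sum((fi - y) ** 2 for fi, y in zip(f, b)) / n
    return alpha * mse_a + (1.0 - alpha) * mse_b


def fusion_grad(w, alpha, a=SRC_A, b=SRC_B):
    """Closed-form dL/dw = 2 * mean((A - B)^2) * (w - alpha)."""
    m = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 2.0 * m * (w - alpha)


def adapt(w, alpha, steps=3, lr=0.1):
    """Inner loop: a few gradient-descent steps on a single task."""
    for _ in range(steps):
        w -= lr * fusion_grad(w, alpha)
    return w


def reptile(tasks, w0=0.0, outer_steps=150, eps=0.1):
    """Outer loop: nudge the shared initialization toward each adapted weight."""
    w = w0
    for t in range(outer_steps):
        alpha = tasks[t % len(tasks)]  # cycle through tasks for reproducibility
        w += eps * (adapt(w, alpha) - w)
    return w


def mean_adapted_loss(w0, tasks=TASKS):
    """Average loss over tasks after the same short adaptation from w0."""
    return sum(fusion_loss(adapt(w0, a), a) for a in tasks) / len(tasks)


if __name__ == "__main__":
    w_meta = reptile(TASKS)
    print(f"meta-initialized weight: {w_meta:.3f}")
    print(mean_adapted_loss(w_meta) < mean_adapted_loss(0.0))  # True
```

With these settings the meta-learned weight settles near the average task preference (about 0.5), so a three-step adaptation from it gives a lower mean loss across tasks than adapting from `w0 = 0.0`. In a real fusion network, `w` would be the full parameter vector and `adapt` would run on batches drawn from each fusion dataset.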