Semantic analysis of visible (RGB) and infrared (IR) images has gained attention for its improved accuracy and robustness under low-illumination and complex weather conditions. Because foundation models pre-trained on large-scale infrared image datasets are lacking, existing methods tend to design task-specific frameworks and directly fine-tune them from pre-trained foundation models on their RGB-IR semantic relevance datasets, which results in poor scalability and limited generalization. In this work, we propose a scalable and efficient framework called UniRGB-IR to unify RGB-IR downstream tasks, in which a novel adapter is developed to efficiently introduce richer RGB-IR features into the pre-trained RGB-based foundation model. Specifically, our framework consists of a vision transformer (ViT) foundation model, a Multi-modal Feature Pool (MFP) module and a Supplementary Feature Injector (SFI) module. The MFP and SFI modules cooperate as an adapter that complements the ViT features with contextual multi-scale features. During training, we freeze the entire foundation model to inherit its prior knowledge and optimize only the MFP and SFI modules. To verify the effectiveness of our framework, we use ViT-Base as the pre-trained foundation model and perform extensive experiments. Experimental results on various RGB-IR downstream tasks demonstrate that our method achieves state-of-the-art performance. The source code and results are available at https://github.com/PoTsui99/UniRGB-IR.git.
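The frozen-backbone-plus-adapter idea described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the UniRGB-IR implementation: the element-wise "backbone", the average-pooling "feature pool", the gated residual "injector", the choice to feed only RGB tokens to the backbone, and all names (`Param`, `frozen_backbone`, `feature_pool`, `inject`, `sgd_step`) are assumptions made purely for illustration. It only shows the two properties the abstract states: multi-scale RGB-IR features supplement the backbone features, and training updates the adapter while the backbone stays fixed.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class Param:
    """A scalar parameter plus a flag saying whether training may change it."""
    value: float
    trainable: bool = True


def frozen_backbone(tokens: Vector, weight: Param) -> Vector:
    """Hypothetical stand-in for a pre-trained ViT block: fixed element-wise scaling."""
    return [weight.value * t for t in tokens]


def feature_pool(rgb: Vector, ir: Vector, scales: Sequence[int] = (1, 2)) -> List[Vector]:
    """Toy multi-modal feature pool: fuse RGB and IR, then average-pool at several scales."""
    fused = [(r + i) / 2.0 for r, i in zip(rgb, ir)]
    pooled = []
    for s in scales:
        level = []
        for start in range(0, len(fused), s):
            window = fused[start:start + s]
            # Repeat the window mean so every scale keeps the token-sequence length.
            level.extend([sum(window) / len(window)] * len(window))
        pooled.append(level)
    return pooled


def inject(backbone_feats: Vector, pooled: List[Vector], gate: Param) -> Vector:
    """Toy supplementary feature injector: gated residual addition (gate is an assumption)."""
    supplement = [
        sum(level[k] for level in pooled) / len(pooled)
        for k in range(len(backbone_feats))
    ]
    return [f + gate.value * s for f, s in zip(backbone_feats, supplement)]


def forward(rgb: Vector, ir: Vector, weight: Param, gate: Param) -> Vector:
    """Backbone sees RGB tokens only (assumption); the adapter adds RGB-IR features."""
    return inject(frozen_backbone(rgb, weight), feature_pool(rgb, ir), gate)


def sgd_step(params: List[Param], grads: List[float], lr: float = 0.1) -> None:
    """Apply a gradient step only to parameters marked trainable."""
    for p, g in zip(params, grads):
        if p.trainable:
            p.value -= lr * g


# Usage: the backbone weight is frozen, the adapter gate is trainable.
backbone_weight = Param(2.0, trainable=False)
adapter_gate = Param(0.0)
rgb_tokens = [1.0, 2.0, 3.0, 4.0]
ir_tokens = [3.0, 2.0, 1.0, 0.0]

print(forward(rgb_tokens, ir_tokens, backbone_weight, adapter_gate))
sgd_step([backbone_weight, adapter_gate], grads=[1.0, 1.0])
print(backbone_weight.value, adapter_gate.value)
```

With the gate at zero the output equals the frozen backbone features, so the adapter starts as a no-op and only departs from the pre-trained behaviour as its own parameters are trained; this zero-initialised gate is a common adapter convention assumed here, not a detail taken from the abstract.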