Single-domain generalization aims to train a model on data from a single source domain so that it generalizes to unseen target domains. Existing work focuses primarily on improving the generalization ability of static networks. However, static networks cannot adapt dynamically to the diverse variations across image scenes, which limits their generalization capability. Scenes differ in complexity, and image complexity varies even more in cross-domain scenarios. In this paper, we propose a dynamic object-centric perception network based on prompt learning that adapts to variations in image complexity. Specifically, we propose an object-centric gating module based on prompt learning that focuses attention on object-centric features under the guidance of various scene prompts. Then, using the object-centric gating masks, a dynamic selective module selects highly correlated feature regions in both the spatial and channel dimensions, enabling the model to adaptively perceive object-centric features and thereby enhancing its generalization capability. We conduct extensive experiments on single-domain generalization tasks in image classification and object detection. The results show that our approach outperforms state-of-the-art methods, which validates the effectiveness and generality of the proposed method.
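The two-stage idea described above (prompt-guided gating followed by spatial and channel selection) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how the gate is computed or how regions are selected. The sketch assumes cosine similarity between each spatial feature vector and a set of prompt vectors, a maximum over prompts as the gate, and hard top-k selection. All function names and the `spatial_ratio` and `channel_ratio` parameters are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def object_centric_gate(features, prompts):
    """Assumed gate: best cosine match to any prompt, clamped at zero.

    features: list of N spatial positions, each a vector of C channels.
    prompts:  list of K prompt vectors, each of length C.
    """
    return [max(0.0, max(cosine(f, p) for p in prompts)) for f in features]


def top_k_mask(scores, keep_ratio):
    """Binary mask that keeps the highest-scoring fraction of entries."""
    k = max(1, int(keep_ratio * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]


def dynamic_select(features, prompts, spatial_ratio=0.5, channel_ratio=0.5):
    """Mask features in the spatial and channel dimensions using the gate."""
    gate = object_centric_gate(features, prompts)
    spatial_mask = top_k_mask(gate, spatial_ratio)

    # Assumed channel score: gate-weighted mean absolute activation.
    num_channels = len(features[0])
    channel_scores = [
        sum(g * abs(f[c]) for g, f in zip(gate, features)) / len(features)
        for c in range(num_channels)
    ]
    channel_mask = top_k_mask(channel_scores, channel_ratio)

    selected = [
        [f[c] * s * channel_mask[c] for c in range(num_channels)]
        for f, s in zip(features, spatial_mask)
    ]
    return selected, spatial_mask, channel_mask


# Toy input: 4 spatial positions, 2 channels, one prompt aligned with channel 0.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_prompts = [[1.0, 0.0]]
selected, spatial_mask, channel_mask = dynamic_select(toy_features, toy_prompts)
```

In the toy input, the two positions closest to the prompt direction survive the spatial mask, and only the channel they activate survives the channel mask. A learned model would more plausibly use soft, differentiable gates rather than the hard top-k used here for readability.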