Object detectors often degrade when the target domain's distribution differs from that of the source domain. The problem is especially pronounced in the single-source setting with limited data, where models tend to rely on source-domain confounders (e.g., illumination, co-occurrence, and style), yielding spurious correlations that hinder generalization. To address this, this paper proposes \textbf{\textit{Bridge}}, a novel basis-driven framework for domain generalization that incorporates causal inference into object detection. By learning low-rank bases for front-door adjustment, \textbf{\textit{Bridge}} blocks the effects of confounders to mitigate spurious correlations, while also refining representations by filtering out redundant and task-irrelevant components. \textbf{\textit{Bridge}} can be seamlessly integrated with both discriminative (e.g., DINOv2/3, SAM) and generative (e.g., Stable Diffusion) Vision Foundation Models (VFMs). Extensive experiments on multiple domain generalization object detection benchmarks, namely Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather Datasets, and Diverse Weather DroneVehicle (our newly augmented real-world UAV-based benchmark), show that our method outperforms previous state-of-the-art approaches. The project page is available at: https://mingbohong.github.io/Bridge/.
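
For reference, the front-door adjustment mentioned above can be written in its standard textbook form. This is only a sketch under assumed notation: $X$ denotes the input image or its features, $M$ a mediator lying on the path from $X$ to the prediction $Y$, and $x'$ ranges over source-domain inputs. These symbols are illustrative and are not taken from the abstract, which does not state how \textbf{\textit{Bridge}} parameterizes the mediator with its low-rank bases.
\[
P\bigl(Y \mid \mathrm{do}(X = x)\bigr) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid m, x')\, P(x')
\]
The outer sum passes the input through the mediator, while the inner sum averages over inputs $x'$, which is what removes the influence of an unobserved confounder that affects both $X$ and $Y$.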