Single-source Domain Generalized Object Detection (SDGOD), a cutting-edge research topic in computer vision, aims to enhance a model's generalization to unseen target domains by training on a single source domain. Current mainstream approaches attempt to mitigate domain discrepancies via data augmentation. However, owing to domain shift and limited domain-specific knowledge, models tend to fall back on spurious correlations: they over-rely on superficial classification cues (e.g., color) rather than essential domain-invariant representations such as object contours. To address this challenge, we propose Cauvis (Causal Visual Prompts). First, we introduce a Cross-Attention Prompts module that mitigates the bias induced by spurious features by integrating visual prompts with cross-attention. Second, to address visual prompts' inadequate coverage of domain knowledge and their entanglement with spurious features in single-domain generalization, we propose a dual-branch adapter that disentangles causal from spurious features while achieving domain adaptation via high-frequency feature extraction. Cauvis achieves state-of-the-art performance, with gains of 15.9–31.4% over existing domain generalization methods on SDGOD datasets, and exhibits markedly stronger robustness in complex interference environments.
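Since the abstract only names the two components, the following is a minimal PyTorch sketch of how they could be realized. The class names, tensor shapes, prompt-as-query attention layout, and the FFT-based high-pass filter are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch only: a cross-attention prompt module and a dual-branch
# adapter split by frequency. All design details here are assumptions.
import torch
import torch.nn as nn


class CrossAttentionPrompts(nn.Module):
    """Learnable visual prompts that query image tokens via cross-attention."""

    def __init__(self, dim: int = 256, num_prompts: int = 8, num_heads: int = 8):
        super().__init__()
        # Learnable prompt tokens (assumed shape: [1, num_prompts, dim]).
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: [B, N, dim] image tokens from the backbone.
        q = self.prompts.expand(tokens.size(0), -1, -1)
        # Prompts attend to image tokens (queries = prompts, keys/values = tokens),
        # so the prompts aggregate context instead of memorizing one domain's cues.
        out, _ = self.attn(q, tokens, tokens)
        # Prepend the attended prompts to the token sequence.
        return torch.cat([self.norm(out), tokens], dim=1)


class DualBranchAdapter(nn.Module):
    """Two branches: a high-frequency (assumed causal) path and a low-frequency
    (assumed spurious, e.g. color/style) path."""

    def __init__(self, dim: int = 256, cutoff: float = 0.25):
        super().__init__()
        self.cutoff = cutoff  # fraction of the spectrum treated as "low frequency"
        self.causal_proj = nn.Linear(dim, dim)
        self.spurious_proj = nn.Linear(dim, dim)

    def high_pass(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W]; suppress low-frequency components with a centered mask.
        freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        _, _, H, W = x.shape
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij",
        )
        mask = ((yy ** 2 + xx ** 2).sqrt() > self.cutoff).float()
        freq = freq * mask
        return torch.fft.ifft2(
            torch.fft.ifftshift(freq, dim=(-2, -1)), norm="ortho"
        ).real

    def forward(self, feat: torch.Tensor):
        # feat: [B, C, H, W] backbone feature map, with C == dim.
        hf_map = self.high_pass(feat)       # shape/contour-like texture cues
        lf_map = feat - hf_map              # smooth color/style-like residual
        hf = hf_map.flatten(2).transpose(1, 2)  # [B, HW, C]
        lf = lf_map.flatten(2).transpose(1, 2)
        causal = self.causal_proj(hf)       # assumed domain-invariant branch
        spurious = self.spurious_proj(lf)   # assumed domain-specific branch
        return causal, spurious


# Example usage with made-up shapes:
#   feats = torch.randn(2, 256, 32, 32)
#   causal, spurious = DualBranchAdapter(256)(feats)
#   tokens = CrossAttentionPrompts(256)(causal)
```

Under these assumptions, the frequency split operationalizes the abstract's claim: high-frequency content carries contour-like, domain-invariant structure, while the low-frequency residual carries the color/style cues that spurious correlations latch onto.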