Single-Domain Generalized Object Detection~(S-DGOD) aims to train an object detector on a single source domain so that it performs robustly across a variety of unseen target domains. Existing S-DGOD approaches often rely on data augmentation strategies, such as compositions of visual transformations, to improve the detector's generalization ability. However, the absence of real-world prior knowledge limits how much such augmentation can contribute to the diversity of the training data distribution. To address this issue, we propose PhysAug, a novel physical model-based data augmentation method that simulates non-ideal imaging conditions to improve adaptability in S-DGOD tasks. Drawing on the principles of atmospheric optics, we develop a universal perturbation model that serves as the foundation of PhysAug. Because visual perturbations typically arise from the interaction of light with atmospheric particles, we use the image frequency spectrum to simulate real-world variations during training. This encourages the detector to learn domain-invariant representations, thereby improving its ability to generalize across various settings. Without altering the network architecture or loss function, our approach significantly outperforms the state of the art on various S-DGOD datasets. In particular, it improves over the baseline by $7.3\%$ and $7.2\%$ on DWD and Cityscape-C, respectively, highlighting its enhanced generalizability in real-world settings.
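The abstract does not give the form of the perturbation model, so the following is only a minimal sketch of what frequency-spectrum augmentation can look like in general, not the PhysAug formulation. The function name `perturb_spectrum`, the `strength` parameter, and the uniform multiplicative noise are all assumptions introduced for illustration.

```python
import numpy as np


def perturb_spectrum(image, strength=0.2, rng=None):
    """Randomly rescale the Fourier coefficients of an image.

    Hypothetical illustration of frequency-domain augmentation; this is
    NOT the perturbation model used by PhysAug.

    image: float array of shape (H, W) or (H, W, C), values in [0, 1].
    strength: half-width of the uniform multiplicative noise (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=np.float64)

    # 2-D FFT over the spatial axes; channels (if any) are handled independently.
    spectrum = np.fft.fft2(img, axes=(0, 1))

    # Real-valued gain in [1 - strength, 1 + strength] per frequency bin.
    # A real gain scales the amplitude and leaves the phase unchanged.
    gain = 1.0 + strength * rng.uniform(-1.0, 1.0, size=spectrum.shape)

    # The gain is not conjugate-symmetric, so the inverse transform is
    # complex in general; keeping the real part yields a real image.
    out = np.fft.ifft2(spectrum * gain, axes=(0, 1)).real
    return np.clip(out, 0.0, 1.0)
```

In a training pipeline, a function like this would typically be applied to each source-domain image before it is passed to the detector, leaving the bounding-box labels unchanged because the perturbation does not move objects spatially.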