Object detection is a critical task in computer vision, with applications in domains such as autonomous driving and urban scene monitoring. However, deep learning-based approaches often demand large volumes of annotated data, which are costly and difficult to acquire, particularly in complex and unpredictable real-world environments. This dependency significantly hampers the generalization capability of existing object detection techniques. To address this issue, we introduce GoDiff, a novel single-domain generalization method for object detection that leverages a pre-trained diffusion model to enhance generalization in unseen domains. Central to our approach is the Pseudo Target Data Generation (PTDG) module, which employs a latent diffusion model to generate pseudo-target-domain data that preserves source-domain characteristics while introducing stylistic variations. By integrating this pseudo data with source-domain data, we diversify the training dataset. Furthermore, we introduce a cross-style instance normalization technique that blends style features from the different domains generated by the PTDG module, thereby increasing the detector's robustness. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods, achieving state-of-the-art performance in autonomous driving scenarios.
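To make the cross-style idea concrete, the following is a minimal NumPy sketch of one common way to blend channel-wise style statistics between a source feature map and a pseudo-target feature map: normalize the source content per channel, then re-style it with a convex mixture of the two domains' means and standard deviations. The function name, the mixing coefficient `lam`, and the exact mixing rule are illustrative assumptions in the spirit of MixStyle-like normalization, not the paper's precise formulation.

```python
import numpy as np

def cross_style_instance_norm(feat_src, feat_pseudo, lam=0.5, eps=1e-5):
    """Blend channel-wise style statistics of two feature maps.

    feat_src, feat_pseudo: arrays of shape (C, H, W) from the source and
    pseudo-target domains. lam: mixing coefficient between the two styles.
    Illustrative sketch only; the paper's formulation may differ.
    """
    # Per-channel style statistics (mean and std over spatial dims).
    mu_s = feat_src.mean(axis=(1, 2), keepdims=True)
    sig_s = feat_src.std(axis=(1, 2), keepdims=True)
    mu_p = feat_pseudo.mean(axis=(1, 2), keepdims=True)
    sig_p = feat_pseudo.std(axis=(1, 2), keepdims=True)

    # Strip the source style, keeping only content.
    normalized = (feat_src - mu_s) / (sig_s + eps)

    # Re-style the content with mixed cross-domain statistics.
    mu_mix = lam * mu_s + (1 - lam) * mu_p
    sig_mix = lam * sig_s + (1 - lam) * sig_p
    return normalized * sig_mix + mu_mix
```

Because the normalized content has zero mean per channel, the output's per-channel mean equals the mixed mean exactly, so the detector sees features whose style interpolates between the source and pseudo-target domains.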