We address domain generalization for object detection in the setting where only a single source domain is available. Our approach has two steps: diversifying the source domain, and aligning detections based on class prediction confidence and localization. First, we show that a base detector trained with a carefully selected set of augmentations outperforms existing single domain generalization methods by a good margin, which highlights the importance of domain diversification for object detectors. Second, we introduce a method that aligns detections from multiple views using both classification and localization outputs. This alignment yields detectors that generalize better and are better calibrated, which matters for decision-making in safety-critical applications. The approach is detector-agnostic and applies to both single-stage and two-stage detectors. We validate it through extensive experiments and ablations on challenging domain-shift scenarios, where it consistently outperforms existing methods. Our code and models are available at: https://github.com/msohaildanish/DivAlign
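To make the alignment step concrete, the following is a minimal sketch of a consistency loss between detections from an original and an augmented view. It is not taken from the released code: the abstract does not state the form of the loss, so the use of KL divergence for the classification term and mean squared error for the localization term is an assumption. The function names and the `box_weight` parameter are hypothetical, and detections are assumed to be matched one-to-one and expressed in the same coordinate frame.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(logits_orig, logits_aug, boxes_orig, boxes_aug, box_weight=1.0):
    """Hypothetical consistency loss between matched detections from two views.

    logits_*: list of per-detection class logits.
    boxes_*:  list of per-detection boxes (x1, y1, x2, y2), assumed to be
              matched one-to-one and mapped to the same coordinate frame.
    box_weight: assumed trade-off between the two terms.
    """
    n = len(logits_orig)
    if n == 0:
        return 0.0
    cls_term = 0.0
    loc_term = 0.0
    for lo, la, bo, ba in zip(logits_orig, logits_aug, boxes_orig, boxes_aug):
        # Classification alignment: class distributions should agree across views.
        cls_term += kl_divergence(softmax(lo), softmax(la))
        # Localization alignment: box coordinates should agree across views.
        loc_term += sum((a - b) ** 2 for a, b in zip(bo, ba)) / len(bo)
    return cls_term / n + box_weight * loc_term / n
```

In this sketch the loss is zero when both views produce identical predictions and grows as either the class distributions or the boxes diverge. In a real training loop such a term would be added to the detector's standard detection loss.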