In visual recognition, scale bias is a key challenge because the distributions of object and image sizes in real-world scene datasets are inherently imbalanced. Conventional solutions inject scale-invariance priors, oversample the dataset at different scales during training, or adjust scale at inference. While these strategies mitigate scale bias to some extent, their ability to adapt across diverse datasets is limited. They also increase computational load during training and latency during inference. In this work, we use adaptive attentional processing: oversampling salient object regions by warping images in place during training. Having found that shifting the source scale distribution improves backbone features, we develop an instance-level warping guidance that targets object regions during sampling to mitigate source scale bias in domain adaptation. Our approach improves adaptation across geographies, lighting, and weather conditions, and is agnostic to the task, domain adaptation algorithm, saliency guidance, and underlying model architecture. Highlights include +6.1 mAP50 for BDD100K Clear $\rightarrow$ DENSE Foggy, +3.7 mAP50 for BDD100K Day $\rightarrow$ Night, +3.0 mAP50 for BDD100K Clear $\rightarrow$ Rainy, and +6.3 mIoU for Cityscapes $\rightarrow$ ACDC. Our approach adds minimal memory overhead during training and no additional latency at inference time. See the Appendix for more results and analysis.
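To make the idea of oversampling object regions by warping an image in place more concrete, the following is a minimal sketch and not the instance-level guidance described above. It assumes a simple separable scheme: each image axis gets a 1D saliency profile that is higher inside bounding boxes, and output pixels are placed by inverting the cumulative saliency, so salient regions receive more output pixels while the image size stays fixed. The names `axis_saliency`, `inverse_cdf_grid`, `warp_image`, and the `floor` and `boost` parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def axis_saliency(length, intervals, floor=1.0, boost=4.0):
    """1D saliency: a uniform floor plus a boost inside each [lo, hi) interval."""
    s = np.full(length, floor, dtype=np.float64)
    for lo, hi in intervals:
        s[lo:hi] += boost
    return s

def inverse_cdf_grid(saliency):
    """Map each output pixel to a source pixel by inverting the saliency CDF."""
    n = saliency.size
    # CDF evaluated on pixel edges 0..n, normalized to [0, 1].
    cdf = np.concatenate([[0.0], np.cumsum(saliency)])
    cdf /= cdf[-1]
    # Invert the CDF at the centre of every output pixel.
    targets = (np.arange(n) + 0.5) / n
    src = np.interp(targets, cdf, np.arange(n + 1))
    return np.clip(np.floor(src).astype(int), 0, n - 1)

def warp_image(image, boxes, floor=1.0, boost=4.0):
    """Warp an image (H x W [x C]) so that boxes (x0, y0, x1, y1) are magnified.

    Uses nearest-neighbour sampling and keeps the output size equal to the input.
    """
    h, w = image.shape[:2]
    xs = inverse_cdf_grid(
        axis_saliency(w, [(x0, x1) for x0, _, x1, _ in boxes], floor, boost))
    ys = inverse_cdf_grid(
        axis_saliency(h, [(y0, y1) for _, y0, _, y1 in boxes], floor, boost))
    return image[np.ix_(ys, xs)], xs, ys

if __name__ == "__main__":
    img = np.arange(64 * 64).reshape(64, 64)
    warped, xs, ys = warp_image(img, [(16, 16, 32, 32)])
    print(warped.shape, int(((xs >= 16) & (xs < 32)).sum()))
```

With these defaults, a 16-pixel-wide box in a 64-pixel-wide image carries five times the per-pixel weight of the background, so it receives 40 of the 64 output columns. With no boxes the warp reduces to the identity. A training pipeline would typically also need to map labels through the same grid or unwarp features, which this sketch leaves out.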