Domain adaptation is crucial in aerial imagery, as the visual appearance of these images can vary significantly with factors such as geographic location, time, and weather conditions. Additionally, high-resolution aerial images often require substantial storage space and may not be readily accessible to the public. To address these challenges, we propose a novel Source-Free Object Detection (SFOD) method. Specifically, our approach begins with a self-training framework, which significantly enhances the performance of baseline methods. To alleviate noisy labels in self-training, we utilize Contrastive Language-Image Pre-training (CLIP) to guide pseudo-label generation, a technique we term CLIP-guided Aggregation (CGA). By leveraging CLIP's zero-shot classification capability, we aggregate its scores with the confidence scores of the originally predicted bounding boxes, yielding refined scores for the pseudo-labels. To validate the effectiveness of our method, we construct two new datasets from different domains based on the DIOR dataset, named DIOR-C and DIOR-Cloudy. Experimental results demonstrate that our method outperforms other comparative algorithms. The code is available at https://github.com/Lans1ng/SFOD-RS.
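The score-aggregation step described above could be sketched as follows. This is a minimal illustration assuming a simple weighted average of detector confidences and CLIP zero-shot scores followed by a confidence threshold; the paper's exact fusion rule may differ, and all function names, the `alpha` weight, and the threshold value are illustrative, not taken from the released code.

```python
import numpy as np

def aggregate_scores(det_scores, clip_scores, alpha=0.5):
    """Fuse detector class confidences with CLIP zero-shot scores.

    det_scores:  (N, C) per-box class confidences from the detector.
    clip_scores: (N, C) per-box zero-shot scores from CLIP on box crops.
    alpha:       weight on the detector; (1 - alpha) goes to CLIP.
    """
    det_scores = np.asarray(det_scores, dtype=float)
    clip_scores = np.asarray(clip_scores, dtype=float)
    return alpha * det_scores + (1.0 - alpha) * clip_scores

def refine_pseudo_labels(boxes, det_scores, clip_scores,
                         thresh=0.6, alpha=0.5):
    """Keep boxes whose fused top-class score clears the threshold."""
    fused = aggregate_scores(det_scores, clip_scores, alpha)
    labels = fused.argmax(axis=1)          # fused class decision per box
    conf = fused.max(axis=1)               # refined pseudo-label score
    keep = conf >= thresh                  # drop low-confidence pseudo-labels
    return boxes[keep], labels[keep], conf[keep]
```

Under this sketch, a box the detector is unsure about can still survive (or be discarded) based on CLIP's zero-shot judgment of its crop, which is the intuition behind using CLIP to denoise self-training pseudo-labels.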