Domain adaptation is crucial for aerial imagery because the visual appearance of aerial images varies significantly with geographic location, time, and weather conditions. In addition, high-resolution aerial images require substantial storage space and may not be publicly accessible, which makes access to source-domain data during adaptation impractical. To address these challenges, we propose a novel Source-Free Object Detection (SFOD) method. Our approach builds on a self-training framework; however, without labeled training data, self-training can reinforce inaccurate pseudo-labels. To mitigate this, we integrate Contrastive Language-Image Pre-training (CLIP) to guide pseudo-label generation, a step we term CLIP-guided Aggregation. Leveraging CLIP's zero-shot classification capability, we aggregate its classification scores with the scores of the originally predicted bounding boxes to obtain refined scores for the pseudo-labels. To validate the effectiveness of our method, we construct two new datasets from different domains based on the DIOR dataset, named DIOR-C and DIOR-Cloudy. Experiments demonstrate that our method outperforms the compared methods.
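The score aggregation described above can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so the weighted average below, the weight `alpha`, the confidence `threshold`, and the names `aggregate_scores`, `make_pseudo_labels`, and `PseudoLabel` are all assumptions made for illustration, not the paper's implementation. The CLIP zero-shot probabilities are passed in as plain lists; computing them from image crops is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PseudoLabel:
    box: tuple      # (x1, y1, x2, y2) in pixels
    label: int      # index of the class with the highest fused score
    score: float    # fused (refined) confidence


def aggregate_scores(det_scores, clip_probs, alpha=0.5):
    """Fuse per-class detector scores with CLIP zero-shot probabilities.

    ASSUMPTION: a weighted average with weight `alpha` on the detector.
    The source only says the scores are "aggregated".
    """
    if len(det_scores) != len(clip_probs):
        raise ValueError("detector and CLIP scores must cover the same classes")
    return [alpha * d + (1.0 - alpha) * c for d, c in zip(det_scores, clip_probs)]


def make_pseudo_labels(boxes, det_scores, clip_probs, alpha=0.5, threshold=0.6):
    """Keep boxes whose best fused class score reaches `threshold` (hypothetical)."""
    labels = []
    for box, d, c in zip(boxes, det_scores, clip_probs):
        fused = aggregate_scores(d, c, alpha)
        best = max(range(len(fused)), key=fused.__getitem__)
        if fused[best] >= threshold:
            labels.append(PseudoLabel(box, best, fused[best]))
    return labels


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (5, 5, 20, 20)]
    det_scores = [[0.9, 0.1], [0.55, 0.45]]   # per-class detector scores
    clip_probs = [[0.8, 0.2], [0.3, 0.7]]     # per-class CLIP probabilities
    for pl in make_pseudo_labels(boxes, det_scores, clip_probs):
        print(pl)
```

In this toy input, the first box is kept because the detector and CLIP agree, while the second is dropped: CLIP disagrees with the detector's top class, and no fused class score reaches the threshold. Other fusion rules (for example a geometric mean) would fit the same interface.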