In recent years, aerial object detection has become increasingly pivotal in earth observation applications. However, current algorithms can detect only a set of pre-defined object categories, require sufficient annotated training samples, and fail on novel object categories. In this paper, we put forth a new formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD), in which objects beyond the training categories are detected without the cost of collecting new labeled data. We propose CastDet, a CLIP-activated student-teacher detection framework that serves as the first OVAD detector specifically designed for the challenging aerial scenario, where objects often exhibit weak appearance features and arbitrary orientations. Our framework integrates a robust localization teacher with several box selection strategies to generate high-quality proposals for novel objects. Additionally, the RemoteCLIP model is adopted as an omniscient teacher, providing rich knowledge that enhances classification of novel categories. A dynamic label queue is devised to maintain high-quality pseudo-labels during training. As a result, CastDet improves both the proposals and the classification of novel objects. Furthermore, we extend our approach from horizontal OVAD to oriented OVAD, with tailored algorithm designs for bounding box representation and pseudo-label generation. Extensive experiments on both tasks across multiple existing aerial object detection datasets demonstrate the effectiveness of our approach. The code is available at https://github.com/VisionXLab/CastDet.
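To make the idea of a dynamic label queue more concrete, the following is a minimal illustrative sketch and not the CastDet implementation. It assumes that pseudo-labels are stored per category together with a confidence score, and that only the highest-scoring entries are retained. The class name `PseudoLabelQueue`, the parameters `capacity` and `min_score`, and the box format are all hypothetical choices made for this example.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical horizontal box format: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


@dataclass
class PseudoLabelQueue:
    """Illustrative queue keeping the top-`capacity` pseudo-labels per category.

    This is an assumption-laden sketch, not the paper's actual data structure.
    """
    capacity: int = 3        # hypothetical per-category queue size
    min_score: float = 0.5   # hypothetical confidence threshold
    _heaps: Dict[str, List[Tuple[float, int, Box]]] = field(default_factory=dict)
    _counter: int = 0        # tie-breaker so equal scores never compare boxes

    def push(self, category: str, score: float, box: Box) -> bool:
        """Try to add a pseudo-label; return True if it was stored."""
        if score < self.min_score:
            return False
        heap = self._heaps.setdefault(category, [])
        self._counter += 1
        entry = (score, self._counter, box)
        if len(heap) < self.capacity:
            heapq.heappush(heap, entry)
            return True
        # Queue is full: replace the weakest entry only if the new one is better.
        if score > heap[0][0]:
            heapq.heapreplace(heap, entry)
            return True
        return False

    def labels(self, category: str) -> List[Tuple[float, Box]]:
        """Return stored (score, box) pairs for a category, best first."""
        heap = self._heaps.get(category, [])
        return sorted(((s, b) for s, _, b in heap), reverse=True)


if __name__ == "__main__":
    queue = PseudoLabelQueue(capacity=2, min_score=0.5)
    queue.push("airport", 0.6, (0.0, 0.0, 10.0, 10.0))
    queue.push("airport", 0.9, (5.0, 5.0, 20.0, 20.0))
    queue.push("airport", 0.7, (1.0, 1.0, 8.0, 8.0))  # evicts the 0.6 entry
    print(queue.labels("airport"))
```

A min-heap is used here so that the weakest stored pseudo-label can be found and replaced cheaply; how the actual framework scores, filters, and refreshes its queue is described in the paper and the released code, not by this sketch.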