The rapidly growing volume of remote-sensing imagery motivates extensible object detectors that can recognize objects beyond their training categories without the cost of collecting new labeled data. In this paper, we develop an open-vocabulary object detection (OVD) technique for aerial images that scales the object vocabulary beyond the categories present in the training data. Two fundamental challenges limit OVD performance: the quality of the class-agnostic region proposals and the quality of the pseudo-labels, both of which must generalize well to novel object categories. To generate high-quality proposals and pseudo-labels simultaneously, we propose CastDet, a CLIP-activated student-teacher open-vocabulary object detection framework. Our end-to-end framework follows the student-teacher self-learning mechanism and employs the RemoteCLIP model as an extra omniscient teacher with rich knowledge. This design improves not only the proposals for novel objects but also their classification. Furthermore, we devise a dynamic label queue strategy to maintain high-quality pseudo-labels during batch training. We conduct extensive experiments on multiple existing aerial object detection datasets, which we set up for the OVD task. Experimental results show that CastDet achieves superior open-vocabulary detection performance, e.g., reaching 40.5% mAP on the VisDroneZSD dataset, which outperforms the previous methods Detic/ViLD by 23.7%/14.9%. To the best of our knowledge, this is the first work to apply and develop the open-vocabulary object detection technique for aerial images.
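The abstract does not say how the dynamic label queue is implemented. As a rough illustration only, the general idea of keeping a bounded, continually refreshed pool of confident pseudo-labels per category could be sketched as follows. The class name `PseudoLabelQueue`, the parameters `max_per_class` and `min_score`, the eviction rule, and the category strings are all assumptions made for this sketch, not details taken from CastDet.

```python
from collections import deque
import random


class PseudoLabelQueue:
    """Hypothetical bounded per-category store of confident pseudo-labels.

    This is an illustrative assumption, not the CastDet implementation.
    """

    def __init__(self, max_per_class=4, min_score=0.8):
        self.max_per_class = max_per_class  # assumed capacity per category
        self.min_score = min_score          # assumed confidence threshold
        self.queues = {}                    # category name -> deque of (box, score)

    def push(self, category, box, score):
        """Store a pseudo-label only if its score passes the threshold."""
        if score < self.min_score:
            return False
        queue = self.queues.setdefault(category, deque(maxlen=self.max_per_class))
        queue.append((box, score))  # the oldest entry is evicted when the queue is full
        return True

    def sample(self, category, k, rng=random):
        """Draw up to k stored pseudo-labels for one category."""
        items = list(self.queues.get(category, ()))
        return rng.sample(items, min(k, len(items)))
```

In a training loop, one might push teacher predictions for novel categories after each batch and sample from the queue when building the next batch, so that low-confidence or stale pseudo-labels are gradually replaced.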