RefAerial: A Benchmark and Approach for Referring Detection in Aerial Images

Referring detection aims to locate the target described by a natural-language expression and has recently attracted growing research interest. However, existing datasets are limited to ground-level images in which large objects are centered in relatively small scenes. This paper introduces a large-scale, challenging dataset for referring detection in aerial images, termed RefAerial. It differs from conventional ground referring detection datasets in four characteristics: (1) low but diverse object-to-scene ratios, (2) numerous targets and distractors, (3) complex and fine-grained referring descriptions, and (4) diverse and broad scenes in the aerial view. We also develop a human-in-the-loop referring expansion and annotation engine (REA-Engine) for efficient semi-automated annotation of referring pairs. In addition, we observe that existing ground referring detection approaches suffer serious performance degradation on our aerial dataset because of the intrinsic scale variation within and across aerial images. We therefore propose a novel scale-comprehensive and sensitive (SCS) framework for referring detection in aerial images. It consists of a mixture-of-granularity (MoG) attention and a two-stage comprehensive-to-sensitive (CtS) decoding strategy. Specifically, the MoG attention is developed for scale-comprehensive target understanding, while the two-stage CtS decoding strategy is designed for coarse-to-fine decoding of the referred target. The proposed SCS framework achieves strong performance on our aerial referring detection dataset and also yields a promising performance boost on conventional ground referring detection datasets.
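
To make the first characteristic concrete, the following is a minimal sketch of one way an object-to-scene ratio could be computed, assuming it is defined as bounding-box area divided by image area. The abstract does not give the exact definition used for RefAerial, so the function name `object_to_scene_ratio` and the example boxes and image sizes are hypothetical.

```python
def object_to_scene_ratio(box, image_size):
    """Return box area divided by image area.

    Assumed definition, not taken from the paper.
    box        -- (x_min, y_min, x_max, y_max) in pixels
    image_size -- (width, height) in pixels
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    # Clamp to zero so a degenerate or inverted box yields a ratio of 0.
    box_area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    return box_area / (width * height)


# Hypothetical ground-level example: a large, centered object.
ground_ratio = object_to_scene_ratio((100, 80, 420, 400), (640, 480))

# Hypothetical aerial example: a small object in a wide scene.
aerial_ratio = object_to_scene_ratio((1900, 1000, 1940, 1030), (3840, 2160))

print(f"ground: {ground_ratio:.4f}, aerial: {aerial_ratio:.6f}")
```

Under this assumed definition, the hypothetical ground-level target covers about a third of its image, while the aerial target covers well under a thousandth of its image. That gap is the kind of scale difference the abstract points to.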

