Detecting objects in UAV-captured images is challenging because the objects are small. In this work, we explore a simple and efficient adaptive zoom-in framework for object detection on UAV images. The main motivation is that foreground objects in UAV images are generally smaller and sparser than those in common scene images, which hinders the optimization of effective object detectors. We therefore aim to zoom in adaptively on the objects to better capture object features for the detection task. Achieving this goal requires answering two core design questions: i) how to conduct non-uniform zooming on each image efficiently, and ii) how to enable object detection training and inference in the zoomed image space. Correspondingly, we introduce a lightweight offset prediction scheme, coupled with a novel box-based zooming objective, to learn non-uniform zooming on the input image. Based on the learned zooming transformation, we propose a corner-aligned bounding box transformation method. The method warps the ground-truth bounding boxes to the zoomed space for training, and warps the predicted bounding boxes back to the original space during inference. We conduct extensive experiments on three representative UAV object detection datasets: VisDrone, UAVDT, and SeaDronesSee. The proposed ZoomDet is architecture-independent and can be applied to an arbitrary object detection architecture. Remarkably, on the SeaDronesSee dataset, ZoomDet yields an absolute mAP gain of more than 8.4 with a Faster R-CNN model, at only about 3 ms of additional latency. The code is available at https://github.com/twangnh/zoomdet_code.
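To make the box-warping idea concrete, the following is a minimal sketch, not the ZoomDet implementation. It assumes a simplified zoom that is separable along the x and y axes and piecewise linear, whereas the abstract describes a learned offset-based transformation. The helper names `build_axis_map` and `warp_boxes`, and the hand-picked weights, are hypothetical and chosen only for illustration. Under these assumptions, mapping the two corners of each box through the monotone axis maps carries ground-truth boxes into the zoomed space, and applying the inverse maps carries predictions back.

```python
import numpy as np

def build_axis_map(weights, size):
    """Build knots (src, dst) of a monotone piecewise-linear 1D map.

    Assumption for this sketch: the axis is split into equal bins, and a bin
    with a larger weight receives proportionally more pixels in the zoomed image.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights > 0), "positive weights keep the map invertible"
    src = np.linspace(0.0, size, len(weights) + 1)
    dst = np.concatenate([[0.0], np.cumsum(weights)]) / weights.sum() * size
    return src, dst

def warp_boxes(boxes, x_map, y_map, inverse=False):
    """Warp (x1, y1, x2, y2) boxes by mapping their corners through the axis maps."""
    boxes = np.asarray(boxes, dtype=float)
    (xs, xd), (ys, yd) = x_map, y_map
    if inverse:
        # Swapping the knot arrays inverts a strictly increasing map.
        xs, xd, ys, yd = xd, xs, yd, ys
    out = np.empty_like(boxes)
    for col, (s, d) in zip(range(4), [(xs, xd), (ys, yd), (xs, xd), (ys, yd)]):
        out[:, col] = np.interp(boxes[:, col], s, d)
    return out

# Hypothetical 100x100 image: magnify the middle fifth along x, leave y unchanged.
x_map = build_axis_map([1, 1, 4, 1, 1], size=100)
y_map = build_axis_map([1, 1], size=100)

boxes = np.array([[45.0, 10.0, 55.0, 30.0]])
zoomed = warp_boxes(boxes, x_map, y_map)                    # original -> zoomed space
restored = warp_boxes(zoomed, x_map, y_map, inverse=True)   # zoomed -> original space
```

In this toy setting the box grows from 10 to 25 pixels wide in the zoomed space, and the inverse warp recovers the original coordinates. A learned, non-separable zoom would need a different inverse, so this sketch only illustrates the round trip between the two spaces.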