Adaptive Image Zoom-in with Bounding Box Transformation for UAV Object Detection

Detecting objects in UAV-captured images is challenging because the objects are small. In this work, we explore a simple and efficient adaptive zoom-in framework for object detection on UAV images. The main motivation is that foreground objects in UAV images are generally smaller and sparser than those in common scene images, which hinders the optimization of effective object detectors. We therefore aim to zoom in adaptively on the objects to better capture their features for the detection task. Achieving this goal requires answering two core design questions: i) how to conduct non-uniform zooming on each image efficiently, and ii) how to enable object detection training and inference in the zoomed image space. Correspondingly, we introduce a lightweight offset prediction scheme, coupled with a novel box-based zooming objective, to learn non-uniform zooming on the input image. Based on the learned zooming transformation, we propose a corner-aligned bounding box transformation method. The method warps the ground-truth bounding boxes to the zoomed space for training, and warps the predicted bounding boxes back to the original space during inference. We conduct extensive experiments on three representative UAV object detection datasets: VisDrone, UAVDT, and SeaDronesSee. The proposed ZoomDet is architecture-independent and can be applied to an arbitrary object detection architecture. Remarkably, on the SeaDronesSee dataset, ZoomDet improves mAP by more than 8.4 absolute points with a Faster R-CNN model, at a cost of only about 3 ms additional latency. The code is available at https://github.com/twangnh/zoomdet_code.
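
The box transformation described above, which maps box corners into the zoomed space for training and back to the original space at inference, can be illustrated with a minimal sketch. This is not the ZoomDet implementation: it assumes a separable, piecewise-linear zoom built from hypothetical per-cell weights, whereas the paper learns the zoom from predicted offsets. The helper names `make_axis_warp` and `warp_boxes` are illustrative only.

```python
import numpy as np

def make_axis_warp(density, size):
    """Build a monotone 1D map from original to zoomed coordinates.

    `density` holds hypothetical, strictly positive per-cell zoom weights
    (a larger weight means that cell is magnified more). Returns the knot
    positions (src, dst) on the original and zoomed axes.
    """
    density = np.asarray(density, dtype=float)
    src = np.linspace(0.0, size, len(density) + 1)
    cum = np.concatenate([[0.0], np.cumsum(density)])
    dst = cum / cum[-1] * size  # strictly increasing, so the map is invertible
    return src, dst

def warp_boxes(boxes, warp_x, warp_y, inverse=False):
    """Map (x1, y1, x2, y2) boxes by transforming their two corners.

    inverse=False: original -> zoomed space (for ground-truth boxes).
    inverse=True:  zoomed -> original space (for predicted boxes).
    """
    boxes = np.asarray(boxes, dtype=float)
    (sx, dx), (sy, dy) = warp_x, warp_y
    if inverse:
        # Swapping the knots inverts a monotone piecewise-linear map.
        sx, dx, sy, dy = dx, sx, dy, sy
    out = np.empty_like(boxes)
    out[:, 0] = np.interp(boxes[:, 0], sx, dx)
    out[:, 2] = np.interp(boxes[:, 2], sx, dx)
    out[:, 1] = np.interp(boxes[:, 1], sy, dy)
    out[:, 3] = np.interp(boxes[:, 3], sy, dy)
    return out

# Usage: magnify the second of four horizontal cells in a 100 x 100 image.
warp_x = make_axis_warp([1, 3, 1, 1], size=100)
warp_y = make_axis_warp([1, 1, 1, 1], size=100)
gt_boxes = [[30, 30, 45, 45]]
zoomed = warp_boxes(gt_boxes, warp_x, warp_y)               # training target
restored = warp_boxes(zoomed, warp_x, warp_y, inverse=True)  # inference step
```

Because each axis map is monotone, transforming only the top-left and bottom-right corners keeps the box axis-aligned, and the inverse map recovers the original coordinates exactly.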
