Object detection in Unmanned Aerial Vehicle (UAV) images has emerged as a focal area of research, but it presents two significant challenges: i) objects are typically small and densely packed within large images; ii) computational resource constraints render most models unsuitable for real-time deployment. Current real-time object detectors are not optimized for UAV images, and complex methods designed for small object detection often lack real-time capability. To address these challenges, we propose a novel detector, RemDet (Reparameter efficient multiplication Detector). Our contributions are as follows: 1) We rethink the challenges that existing detectors face on small and dense objects in UAV images, and propose information loss as a design guideline for efficient models. 2) We introduce the ChannelC2f module to enhance small object detection performance, demonstrating that high-dimensional representations can effectively mitigate information loss. 3) We design the GatedFFN module to deliver both strong performance and low latency, addressing the challenges of real-time detection. Our research reveals that GatedFFN, through its use of multiplication, is more cost-effective than feed-forward networks for high-dimensional representation. 4) We propose the CED module, which combines the advantages of ViT and CNN downsampling to reduce information loss and specifically enhances context information for small and dense objects. Extensive experiments on the large UAV datasets VisDrone and UAVDT validate the real-time efficiency and superior performance of our method. On the challenging VisDrone dataset, our method not only achieves state-of-the-art results, improving detection by more than 3.4%, but also runs at 110 FPS on a single 4090 GPU.
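To make the idea of gating by multiplication concrete, the following is a minimal sketch of a generic gated feed-forward block applied to a single feature vector. It is not the GatedFFN module from RemDet, whose exact layout the abstract does not specify; the function names (`linear`, `gated_ffn`), the ReLU gate, the weight shapes, and the random initialization are all assumptions chosen for illustration.

```python
import random


def linear(weights, x):
    """Multiply a weight matrix (a list of rows) by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(v):
    """Element-wise ReLU; used here as an assumed gate activation."""
    return [max(0.0, a) for a in v]


def init_matrix(rows, cols, rng):
    """Small random weight matrix, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gated_ffn(x, w_value, w_gate, w_out):
    """Generic gated feed-forward block (hypothetical layout).

    Two parallel projections expand x to the hidden width, their
    element-wise product forms the hidden features, and w_out projects
    the result back to the output width.
    """
    value = linear(w_value, x)
    gate = relu(linear(w_gate, x))
    hidden = [a * b for a, b in zip(value, gate)]  # multiplication step
    return linear(w_out, hidden)


rng = random.Random(0)
dim, hidden_dim = 4, 8  # assumed sizes, not taken from the paper
w_value = init_matrix(hidden_dim, dim, rng)
w_gate = init_matrix(hidden_dim, dim, rng)
w_out = init_matrix(dim, hidden_dim, rng)

x = [0.5, -1.0, 2.0, 0.25]
y = gated_ffn(x, w_value, w_gate, w_out)
print(len(y))  # output has the same width as the input
```

Wherever the gate is active, each hidden unit is the product of two linear functions of the input, so it contains pairwise terms of the form x_i * x_j. That is one common intuition for why multiplication can reach a richer feature space at a given hidden width; whether it matches the authors' own argument cannot be determined from the abstract alone.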