The demand for accurate object detection in aerial imagery has surged with the widespread use of drones and satellite technology. Traditional object detection models, trained on datasets biased towards large objects, perform poorly in aerial scenes, where small, densely clustered objects are prevalent. To address this challenge, we present an approach that combines super-resolution with an adapted lightweight YOLOv5 architecture. We evaluate the model on a range of datasets, including VisDrone-2023, SeaDroneSee, VEDAI, and NWPU VHR-10. Our Super Resolved YOLOv5 architecture incorporates Transformer encoder blocks, which allow the model to capture global contextual information and lead to improved detection results, especially in high-density, occluded conditions. The lightweight model delivers improved accuracy while using resources efficiently, making it well suited to real-time applications. Our experimental results demonstrate the model's superior performance in detecting small and densely clustered objects, underlining the significance of dataset choice and architectural adaptation for this task. In particular, the method achieves 52.5% mAP on VisDrone, exceeding the best prior work. This approach promises to advance object detection in aerial imagery, contributing to more accurate and reliable results in a variety of real-world applications.