Object detection in very-high-resolution (VHR) remote sensing images plays a vital role in applications such as urban planning, land resource management, and rescue missions. The large variation in target scale is one of the main challenges in VHR remote sensing object detection. Existing methods improve detection accuracy by refining the structure of feature pyramids and adopting different attention modules. However, small targets are still frequently missed because key detail features are lost, and there is still room for improvement in how multiscale features are fused and balanced. To address this issue, this paper proposes two novel modules, Guided Attention and Tucker Bilinear Attention, which are applied at the early-fusion and late-fusion stages, respectively. The former effectively retains clean key detail features, and the latter better balances features through semantic-level correlation mining. Based on these two modules, we build a new multi-scale remote sensing object detection framework. Without bells and whistles, the proposed method substantially improves the average precision on small objects and achieves the highest mean average precision in a comparison with 9 state-of-the-art methods on DOTA, DIOR, and NWPU VHR-10. Code and models are available at https://github.com/Shinichict/GTNet.
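The abstract does not specify how Tucker Bilinear Attention is computed, so the following is only a minimal sketch of the general idea the name suggests: a bilinear interaction between two feature vectors whose weight tensor is factorized in Tucker form, with the output turned into attention weights over pyramid levels. All names (`tucker_bilinear`, `W_x`, `W_y`, `core`, `W_o`), the dimensions, and the use of a softmax over four levels are illustrative assumptions, not the authors' implementation; the released code at the URL above is the authoritative reference.

```python
import numpy as np

def tucker_bilinear(x, y, W_x, W_y, core, W_o):
    """Generic Tucker-factorized bilinear map of two feature vectors.

    x: (d_x,) and y: (d_y,) are the input features (hypothetical here).
    W_x: (d_x, r_x) and W_y: (d_y, r_y) project the inputs to low rank.
    core: (r_x, r_y, r_o) is the small core tensor modelling interactions.
    W_o: (r_o, d_o) projects the interaction to the output dimension.
    """
    x_low = x @ W_x                                      # (r_x,)
    y_low = y @ W_y                                      # (r_y,)
    z_low = np.einsum("i,j,ijk->k", x_low, y_low, core)  # (r_o,)
    return z_low @ W_o                                   # (d_o,)

def softmax(a):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(a - a.max())
    return e / e.sum()

# Illustrative sizes only (assumed, not taken from the paper).
d_x, d_y, r_x, r_y, r_o, n_levels = 64, 64, 8, 8, 4, 4
rng = np.random.default_rng(0)

W_x = rng.standard_normal((d_x, r_x)) * 0.1
W_y = rng.standard_normal((d_y, r_y)) * 0.1
core = rng.standard_normal((r_x, r_y, r_o)) * 0.1
W_o = rng.standard_normal((r_o, n_levels)) * 0.1

x = rng.standard_normal(d_x)   # e.g. a pooled descriptor from one branch
y = rng.standard_normal(d_y)   # e.g. a pooled descriptor from another branch

scores = tucker_bilinear(x, y, W_x, W_y, core, W_o)  # one score per level
weights = softmax(scores)                            # attention over levels
print(weights.shape, float(weights.sum()))
```

The point of the factorization is parameter count: a full bilinear tensor for these sizes would need `d_x * d_y * n_levels` weights, whereas the Tucker form needs only the two projections, the small core, and the output projection.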