Object detection in unmanned aerial vehicle (UAV) images remains highly challenging, primarily because of complex background noise and imbalanced target scales. Traditional methods struggle to separate objects from cluttered backgrounds and fail to fully exploit the rich multi-scale information contained in the images. To address these issues, we develop a synergistic feature fusion network (SFFNet) with dual-domain edge enhancement, tailored to object detection in UAV images. First, we design a multi-scale dynamic dual-domain coupling (MDDC) module. It introduces a dual-driven edge extraction architecture that operates in both the frequency and spatial domains, effectively decoupling multi-scale object edges from background noise. Second, to strengthen the geometric and semantic representation capability of the model's neck, we propose a synergistic feature pyramid network (SFPN). SFPN uses linear deformable convolutions to adaptively capture irregular object shapes and establishes long-range contextual associations around targets through the proposed wide-area perception module (WPM). Moreover, to suit different applications and resource-constrained scenarios, we design detectors at six scales (N/S/M/B/L/X). Experiments on two challenging aerial datasets, VisDrone and UAVDT, show that SFFNet-X performs strongly, achieving 36.8 AP and 20.6 AP, respectively. The lightweight models (N/S) also strike a good balance between detection accuracy and parameter efficiency. The code will be available at https://github.com/CQNU-ZhangLab/SFFNet.
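The idea of extracting edges in both the frequency and spatial domains can be illustrated with a toy, non-learned sketch. This is not the MDDC module itself. It assumes an ideal FFT high-pass filter as the frequency branch, a Sobel operator as the spatial branch, and a fixed weighted sum in place of the learned coupling. The function names and the `cutoff` and `alpha` parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    """Rescale a map to [0, 1]; return zeros for (numerically) flat maps."""
    rng = x.max() - x.min()
    if rng < 1e-9:
        return np.zeros_like(x, dtype=float)
    return (x - x.min()) / rng


def frequency_edges(img: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Hypothetical frequency-domain branch: ideal high-pass filter via the 2-D FFT."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))  # move zero frequency to the centre
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)  # normalized frequency radius
    spectrum[radius <= cutoff] = 0  # suppress smooth, low-frequency background
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Hypothetical spatial-domain branch: Sobel gradient magnitude with edge padding."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]  # shifted view for kernel tap (i, j)
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


def dual_domain_edges(img, cutoff: float = 0.1, alpha: float = 0.5) -> np.ndarray:
    """Fixed-weight fusion of both branches (a stand-in for learned coupling)."""
    img = np.asarray(img, dtype=float)
    f = _normalize(frequency_edges(img, cutoff))
    s = _normalize(sobel_edges(img))
    return alpha * f + (1 - alpha) * s
```

For example, `dual_domain_edges(img)` on a grayscale array returns an edge map in [0, 1] with the same shape as `img`. A flat background yields no response from either branch, while object boundaries respond in both. In a learned detector, the fixed high-pass mask, the Sobel kernels, and the scalar `alpha` would presumably be replaced by trainable, multi-scale components.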