Unmanned aerial vehicle (UAV) applications have become increasingly prevalent in aerial photography and object recognition. However, accurately detecting small targets remains a major challenge because of imbalanced object scales and blurred edges. To address these issues, a boundary and position information mining (BPIM) framework is proposed to capture object edge and location cues. BPIM comprises five modules: a position information guidance (PIG) module for obtaining location information, a boundary information guidance (BIG) module for extracting object edges, a cross-scale fusion (CSF) module for gradually assembling shallow-layer image features, a three-feature fusion (TFF) module for progressively combining position and boundary information, and an adaptive weight fusion (AWF) module for flexibly merging deep-layer semantic features. BPIM can thus integrate boundary, position, and scale information in an image for small object detection using attention mechanisms and cross-scale feature fusion strategies. Furthermore, BPIM not only improves the discrimination of contextual features through adaptive weight fusion with boundary information, but also enhances small-object perception through cross-scale position fusion. Experimental results on the VisDrone2021, DOTA1.0, and WiderPerson datasets show that BPIM outperforms the baseline Yolov5-P2 and achieves performance competitive with state-of-the-art methods at a comparable computational load.
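The abstract does not specify how the AWF module computes its weights. As an illustration of the general idea of adaptive weighted fusion only, the following minimal sketch fuses same-shaped feature maps as a weighted sum whose weights are softmax-normalized scores. The function names, the per-map scalar scores, and the toy 2x2 maps are all hypothetical and are not taken from BPIM; in a trained network the scores would be learned and would typically be computed per channel or per location.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weight_fusion(feature_maps, scores):
    """Fuse same-shaped 2D feature maps as a weighted sum.

    feature_maps: list of 2D lists (rows x cols), all with the same shape.
    scores: one raw score per map; a hypothetical stand-in for learned
            parameters (an assumption, not the BPIM formulation).
    """
    if len(feature_maps) != len(scores):
        raise ValueError("need exactly one score per feature map")
    weights = softmax(scores)
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for fmap, weight in zip(feature_maps, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * fmap[r][c]
    return fused


# Toy inputs: a "semantic" map and a "boundary" map of the same shape.
semantic = [[1.0, 2.0], [3.0, 4.0]]
boundary = [[0.0, 1.0], [1.0, 0.0]]

# Equal scores give equal weights, so the result is the element-wise mean.
fused = adaptive_weight_fusion([semantic, boundary], scores=[0.0, 0.0])
```

Normalizing the scores with a softmax keeps the fused map on the same scale as its inputs, and raising one score relative to the others shifts the fused output toward that input.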