Detecting objects across scales remains a significant challenge in computer vision, particularly in tasks such as Rice Leaf Disease (RLD) detection, where objects vary considerably in size. Traditional object detection methods often struggle with these variations, resulting in missed detections or reduced accuracy. In this study, we propose the multi-scale Attention Pyramid module (mAPm), a novel approach that integrates dilated convolutions into the Feature Pyramid Network (FPN) to enhance multi-scale information extraction. Additionally, we incorporate a global Multi-Head Self-Attention (MHSA) mechanism and a deconvolutional layer to refine the up-sampling process. We evaluate mAPm on YOLOv7 using the MRLD and COCO datasets. Compared to vanilla FPN, BiFPN, NAS-FPN, PANET, and ACFPN, mAPm achieves a significant improvement in Average Precision (AP), including a +2.61% increase on the MRLD dataset over the baseline FPN in YOLOv7. This demonstrates its effectiveness in handling scale variations. Furthermore, mAPm can be integrated into various FPN-based object detection models, showcasing its potential to advance object detection techniques.
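The following is a minimal sketch, not the authors' implementation, of two building blocks the abstract names: dilated convolution and deconvolutional (transposed-convolution) up-sampling. It uses plain Python on 1D data to show that dilation widens the span a kernel covers without adding weights, and that a transposed convolution can enlarge a feature map. The function names, the 3-tap kernel, the dilation rates, and the kernel/stride/padding values are all illustrative assumptions; the abstract does not state which settings mAPm uses.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding), stride-1 dilated 1D cross-correlation."""
    span = effective_kernel_size(len(kernel), dilation)
    out_len = len(signal) - span + 1
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(out_len)
    ]


def deconv_output_size(in_size, kernel_size, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution with dilation 1."""
    return (in_size - 1) * stride - 2 * padding + kernel_size + output_padding


# Hypothetical dilation rates (assumption; not taken from the paper).
for d in (1, 2, 3):
    print(f"dilation={d}: 3-tap kernel spans {effective_kernel_size(3, d)} samples")

# The same three weights applied with dilation 2 reach across five samples.
print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2))

# Assumed deconvolution settings (kernel 4, stride 2, padding 1) double the size.
print(deconv_output_size(20, kernel_size=4, stride=2, padding=1))
```

In a 2D feature pyramid the same arithmetic applies per spatial axis: parallel branches with different dilation rates see different context sizes at the same parameter cost, which is one common way to gather multi-scale information.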