Owing to its effective multi-scale feature fusion, the Path Aggregation FPN (PAFPN) is widely used in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information at the same time. In this paper, we propose MAF-YOLO, a novel object detection framework with a versatile neck named Multi-Branch Auxiliary FPN (MAFPN). Within MAFPN, the Superficial Assisted Fusion (SAF) module combines the output of the backbone with the neck, preserving an optimal level of shallow information to facilitate subsequent learning. Meanwhile, the Advanced Assisted Fusion (AAF) module, deeply embedded within the neck, conveys a more diverse range of gradient information to the output layer. Furthermore, our proposed Re-parameterized Heterogeneous Efficient Layer Aggregation Network (RepHELAN) module ensures that both the overall model architecture and the convolutional design use heterogeneous large convolution kernels. This preserves information related to small targets while simultaneously achieving a multi-scale receptive field. Finally, the nano version of MAF-YOLO achieves 42.4% AP on COCO with only 3.76M learnable parameters and 10.51G FLOPs, outperforming YOLOv8n by about 5.1%. The source code of this work is available at: https://github.com/yang-0201/MAF-YOLO.
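The abstract does not say how RepHELAN re-parameterizes its heterogeneous kernels, so the following is only a minimal sketch of the general idea behind structural re-parameterization, under simplifying assumptions: a single channel, stride 1, zero padding, and no batch normalization. Parallel branches with different odd kernel sizes (3, 5, and 7 here, chosen purely for illustration) are folded into one large kernel by zero-padding each smaller kernel to the largest size and summing. All function names are hypothetical and are not taken from the MAF-YOLO code base.

```python
import random


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pad_kernel(kernel, size):
    """Zero-pad an odd-sized square kernel to size x size, keeping it centered."""
    k = len(kernel)
    off = (size - k) // 2
    padded = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            padded[i + off][j + off] = kernel[i][j]
    return padded


def merge_kernels(kernels):
    """Fold parallel branches with different odd kernel sizes into one kernel."""
    size = max(len(k) for k in kernels)
    merged = [[0.0] * size for _ in range(size)]
    for kernel in kernels:
        padded = pad_kernel(kernel, size)
        for i in range(size):
            for j in range(size):
                merged[i][j] += padded[i][j]
    return merged


def random_matrix(rows, cols, rng):
    """Matrix of uniform random values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
image = random_matrix(8, 8, rng)
branches = [random_matrix(k, k, rng) for k in (3, 5, 7)]

# Multi-branch view: run every branch separately and sum the outputs.
multi_branch = [[0.0] * 8 for _ in range(8)]
for kernel in branches:
    branch_out = conv2d_same(image, kernel)
    for i in range(8):
        for j in range(8):
            multi_branch[i][j] += branch_out[i][j]

# Re-parameterized view: one convolution with the merged 7x7 kernel.
merged = merge_kernels(branches)
single_branch = conv2d_same(image, merged)

diff = max_abs_diff(multi_branch, single_branch)
print("merged kernel size:", len(merged))
print("outputs match:", diff < 1e-9)
```

Because convolution is linear in the kernel, the summed multi-branch output and the single merged-kernel output agree up to floating-point error. This is why a model can use several kernel sizes in parallel and still collapse to one convolution per layer at inference time.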