The Path Aggregation FPN (PAFPN) has become a widely adopted component in YOLO-based detectors owing to its effective multi-scale feature fusion. However, PAFPN struggles to integrate high-level semantic cues with low-level spatial details, which limits its performance in real-world applications, especially those with significant scale variations. In this paper, we propose MHAF-YOLO, a novel detection framework featuring a versatile neck design called the Multi-Branch Auxiliary FPN (MAFPN), which consists of two key modules: Superficial Assisted Fusion (SAF) and Advanced Assisted Fusion (AAF). SAF bridges the backbone and the neck by fusing shallow features, transferring crucial low-level spatial information with high fidelity. Meanwhile, AAF integrates multi-scale feature information at deeper neck layers, delivering richer gradient information to the output layer and further enhancing the model's learning capacity. To complement MAFPN, we introduce the Global Heterogeneous Flexible Kernel Selection (GHFKS) mechanism and the Reparameterized Heterogeneous Multi-Scale (RepHMS) module to enhance feature fusion. Globally, RepHMS is integrated throughout the network and uses GHFKS to select larger convolutional kernels for different feature layers, expanding the vertical receptive field and capturing contextual information across spatial hierarchies. Locally, it processes both large and small kernels within the same layer, broadening the lateral receptive field and preserving crucial details for detecting smaller targets. The source code of this work is available at: https://github.com/yang0201/MHAF-YOLO.
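To illustrate how large and small kernels can be processed within the same layer, the minimal sketch below shows that a large-kernel branch and a small-kernel branch applied in parallel and summed are equivalent to a single convolution whose kernel is the large kernel plus the zero-padded small kernel. It assumes that "reparameterized" refers to the common structural reparameterization trick of folding parallel branches into one kernel at inference time; the abstract does not describe the internals of RepHMS, so this is not the authors' implementation. The function names (`conv2d_same`, `pad_kernel`, `merge_kernels`, `two_branch`) are hypothetical, and plain Python lists stand in for single-channel feature maps.

```python
# Sketch only: single-channel, stride 1, zero padding, odd square kernels.
# All names are hypothetical and are not taken from the MHAF-YOLO code base.

def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with an odd square kernel ('same' output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    # Positions outside the image count as zeros.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pad_kernel(kernel, size):
    """Embed an odd square kernel at the centre of a size x size zero kernel."""
    k = len(kernel)
    off = (size - k) // 2
    padded = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            padded[i + off][j + off] = kernel[i][j]
    return padded


def merge_kernels(large, small):
    """Fold the small-kernel branch into the large kernel (inference form)."""
    padded = pad_kernel(small, len(large))
    return [[a + b for a, b in zip(row_l, row_s)]
            for row_l, row_s in zip(large, padded)]


def two_branch(image, large, small):
    """Training-style form: run both branches and sum their outputs."""
    out_l = conv2d_same(image, large)
    out_s = conv2d_same(image, small)
    return [[a + b for a, b in zip(row_l, row_s)]
            for row_l, row_s in zip(out_l, out_s)]


if __name__ == "__main__":
    # Small integer-valued inputs chosen arbitrarily for the demonstration.
    image = [[float((3 * i + 5 * j) % 7) for j in range(6)] for i in range(5)]
    large = [[float(i - j) for j in range(5)] for i in range(5)]   # 5x5 branch
    small = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]  # 3x3 branch

    parallel = two_branch(image, large, small)
    merged = conv2d_same(image, merge_kernels(large, small))
    max_diff = max(abs(a - b)
                   for row_p, row_m in zip(parallel, merged)
                   for a, b in zip(row_p, row_m))
    print("max abs difference between the two forms:", max_diff)
```

The equivalence follows from the linearity of convolution. It holds exactly only when both branches share the same stride and use "same" zero padding, and when no nonlinearity sits between the branches and their sum. A real module would also have to fold normalization layers into the kernels and biases, which this sketch omits.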