Visual feature pyramids have proven both effective and efficient for object detection. However, current methods tend to overemphasize inter-layer feature interaction and neglect intra-layer feature adjustment, even though experience shows that intra-layer feature interaction offers significant benefits for detection. Some approaches learn compact intra-layer feature representations with attention mechanisms or vision transformers, but they do not incorporate global information interaction, which leads to more false detections and missed targets. To address this issue, this paper introduces the Global Feature Pyramid Network (GFPNet), an augmented version of PAFPN that integrates global information for improved object detection. Specifically, we use a lightweight MLP to capture global feature information, the VNC encoder to process these features, and a parallel learnable mechanism to extract intra-layer features from the input image. On this foundation, we retain the PAFPN method for inter-layer feature interaction, extracting rich feature details across levels. Compared with conventional feature pyramids, GFPNet not only attends to inter-layer feature information but also captures global feature details and promotes intra-layer feature interaction, producing a more comprehensive feature representation. GFPNet consistently improves performance over object detection baselines.
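To make the two kinds of interaction in the abstract concrete, the following is a minimal sketch and not the GFPNet implementation. It assumes single-channel feature maps stored as nested lists, applies a token-mixing MLP to the coarsest level as a stand-in for the global branch, and then runs a PAFPN-style top-down and bottom-up pass with plain addition in place of convolutions. The function names (`global_mlp`, `pafpn_with_global`), the placement of the MLP on the coarsest level, and the weight shapes are all illustrative assumptions; the VNC encoder and the parallel learnable mechanism are omitted because the abstract does not specify them.

```python
def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(x):
    """2x2 average pooling of a 2D list with even height and width."""
    h, w = len(x), len(x[0])
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def add(a, b):
    """Element-wise sum of two 2D lists of equal shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def global_mlp(x, w1, w2):
    """Hypothetical global branch: a two-layer MLP over all spatial positions.

    w1 has shape (hidden, H*W) and w2 has shape (H*W, hidden), so every
    output position can depend on every input position. A residual
    connection keeps the original features.
    """
    h, w = len(x), len(x[0])
    tokens = [v for row in x for v in row]
    hidden = [max(0.0, sum(wi * t for wi, t in zip(row, tokens))) for row in w1]
    mixed = [sum(wi * hv for wi, hv in zip(row, hidden)) for row in w2]
    out = [t + m for t, m in zip(tokens, mixed)]
    return [out[i * w:(i + 1) * w] for i in range(h)]


def pafpn_with_global(c3, c4, c5, w1, w2):
    """Assumed layout: c3 is the finest level, c5 the coarsest (each 2x smaller)."""
    # Intra-layer step: inject global context into the coarsest level.
    p5 = global_mlp(c5, w1, w2)
    # Inter-layer step 1: top-down path (coarse to fine).
    p4 = add(c4, upsample2x(p5))
    p3 = add(c3, upsample2x(p4))
    # Inter-layer step 2: bottom-up path (fine to coarse), as in PAFPN.
    n3 = p3
    n4 = add(p4, downsample2x(n3))
    n5 = add(p5, downsample2x(n4))
    return n3, n4, n5
```

Calling `pafpn_with_global` with 8×8, 4×4, and 2×2 inputs returns three maps of the same sizes. A real detector would use multi-channel tensors, learned convolutions at each fusion step, and trained MLP weights; the sketch only shows the order in which global and inter-layer information is combined.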