In the realm of autonomous driving, accurately detecting occluded or distant objects, referred to as weak positive samples, presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate this issue, we propose Co-Fix3D, a novel approach that employs a collaborative hybrid multi-stage parallel query generation mechanism for BEV representations. Our method incorporates a Local-Global Feature Enhancement (LGE) module, which refines BEV features to more effectively highlight weak positive samples. It leverages the Discrete Wavelet Transform (DWT) for accurate noise reduction and feature refinement in localized areas, and incorporates an attention mechanism to optimize global BEV features more comprehensively. Moreover, our method increases the volume of BEV queries through multi-stage parallel processing with the LGE module, significantly raising the probability of selecting weak positive samples. This enhancement not only improves training efficiency within the decoder framework but also boosts overall system performance. Notably, Co-Fix3D achieves superior results on the stringent nuScenes benchmark, outperforming all previous models with 69.1% mAP and 72.9% NDS on the LiDAR-based benchmark and 72.3% mAP and 74.1% NDS on the multi-modality benchmark, without relying on test-time augmentation or additional datasets. The source code will be made publicly available upon acceptance.
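To illustrate the general idea behind DWT-based denoising of a BEV feature map, the following is a minimal NumPy sketch of a single-level 2D Haar wavelet transform with soft-thresholding of the high-frequency subbands. It is not the paper's LGE module (whose architecture is not detailed here); all function names (`haar_dwt2`, `dwt_denoise`) and the threshold value are hypothetical choices for illustration.

```python
import numpy as np

def haar_dwt2(x):
    # One-level 2D Haar DWT: split an (H, W) map into LL, LH, HL, HH subbands.
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages (low-pass)
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row differences (high-pass)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low: coarse structure
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # Exact inverse of haar_dwt2.
    H2, W2 = ll.shape
    a = np.empty((H2, 2 * W2)); d = np.empty((H2, 2 * W2))
    a[:, 0::2] = ll + lh; a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh; d[:, 1::2] = hl - hh
    x = np.empty((2 * H2, 2 * W2))
    x[0::2, :] = a + d
    x[1::2, :] = a - d
    return x

def dwt_denoise(bev, thresh=0.1):
    # Keep the low-frequency LL band intact and soft-threshold the
    # detail subbands, suppressing high-frequency noise while
    # preserving coarse (object-level) structure.
    ll, lh, hl, hh = haar_dwt2(bev)
    soft = lambda c: np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)
    return haar_idwt2(ll, soft(lh), soft(hl), soft(hh))
```

With `thresh=0` the round trip `haar_idwt2(*haar_dwt2(x))` reconstructs the input exactly, which is the key property that lets a learned module operate in the wavelet domain without losing information.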