3D object detection from multi-view images in traffic scenarios has garnered significant attention in recent years. Many existing approaches localize objects with object queries generated from 3D reference points. A limitation of these methods is that some reference points lie far from the target object, which can lead to false positive detections. In this paper, we propose a depth-guided query generator for 3D object detection (DQ3D) that leverages depth information and 2D detections to ensure that reference points are sampled from the surface or interior of the object. Furthermore, to handle objects that are partially occluded in the current frame, we introduce a hybrid attention mechanism that fuses historical detection results with the depth-guided queries to form hybrid queries. Evaluation on the nuScenes dataset demonstrates that our method outperforms the baseline by 6.3\% in mean Average Precision (mAP) and 4.3\% in nuScenes Detection Score (NDS).
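To make the idea of depth-guided reference points concrete, the following is a minimal sketch and not the DQ3D implementation. It assumes a pinhole camera model, a per-pixel metric depth map, and a single 2D detection box; the function names, the grid sampling scheme, and the parameters (`backproject`, `reference_points_from_box`, `grid`) are hypothetical and chosen only for illustration. Pixels sampled inside the 2D box are lifted to 3D using their depth, so the resulting points lie on the visible surface seen through that box rather than at arbitrary locations in space.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame
    using the standard pinhole model (illustrative helper, not from the paper)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def reference_points_from_box(box, depth_map, intrinsics, grid=3):
    """Sample a grid x grid lattice of pixels inside a 2D box and lift each to 3D.

    box: (u_min, v_min, u_max, v_max) in pixels.
    depth_map: 2D list indexed as depth_map[v][u], in metres (assumed input).
    intrinsics: (fx, fy, cx, cy) of the pinhole camera.
    """
    u_min, v_min, u_max, v_max = box
    fx, fy, cx, cy = intrinsics
    points = []
    for i in range(grid):
        for j in range(grid):
            # Place samples at cell centres so they stay strictly inside the box.
            u = u_min + (i + 0.5) * (u_max - u_min) / grid
            v = v_min + (j + 0.5) * (v_max - v_min) / grid
            d = depth_map[int(v)][int(u)]
            if d > 0:  # skip pixels without a valid depth estimate
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points


if __name__ == "__main__":
    # Toy example: a flat surface 5 m in front of the camera.
    depth_map = [[5.0] * 10 for _ in range(10)]
    pts = reference_points_from_box((2, 2, 8, 8), depth_map, (100.0, 100.0, 5.0, 5.0))
    print(len(pts), pts[4])
```

In a multi-view setting the camera-frame points would still need to be transformed into a common ego or world frame with each camera's extrinsics, which this sketch omits.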