Recently, 3D object detection from surround-view images has made notable advances, owing to its low deployment cost. However, most works have focused on short perception ranges, leaving long-range detection less explored. Directly extending existing methods to cover long distances poses challenges such as heavy computation costs and unstable convergence. To address these limitations, this paper proposes a novel sparse query-based framework, dubbed Far3D. By utilizing high-quality 2D object priors, we generate 3D adaptive queries that complement the 3D global queries. To efficiently capture discriminative features across different views and scales for long-range objects, we introduce a perspective-aware aggregation module. Additionally, we propose a range-modulated 3D denoising approach to address query error propagation and mitigate convergence issues in long-range tasks. Notably, Far3D achieves state-of-the-art (SoTA) performance on the challenging Argoverse 2 dataset, which covers a perception range of 150 meters, surpassing several LiDAR-based approaches. Far3D also outperforms previous methods on the nuScenes dataset. The code will be available soon.
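The abstract does not spell out how 2D object priors are turned into 3D adaptive queries. One plausible reading, shown here only as a minimal sketch, is that each 2D detection center is paired with a depth estimate and back-projected through the pinhole camera model to give a 3D reference point. The function name `lift_2d_centers_to_3d`, the intrinsic matrix values, and the use of a per-box depth are all assumptions made for illustration, not details confirmed by the abstract.

```python
import numpy as np


def lift_2d_centers_to_3d(centers_uv, depths, K):
    """Back-project pixel centers with depths into camera-frame 3D points.

    Hypothetical helper (not from the paper): uses the standard pinhole
    relation  X = d * K^{-1} [u, v, 1]^T.
    centers_uv: (N, 2) pixel coordinates of 2D box centers.
    depths:     (N,)   estimated depth along the camera z-axis, in meters.
    K:          (3, 3) camera intrinsic matrix.
    """
    centers_uv = np.asarray(centers_uv, dtype=np.float64)
    depths = np.asarray(depths, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)

    # Homogeneous pixel coordinates [u, v, 1].
    ones = np.ones((centers_uv.shape[0], 1))
    pixels_h = np.concatenate([centers_uv, ones], axis=1)  # (N, 3)

    # Rays with unit z-component: K^{-1} [u, v, 1]^T for each row.
    rays = pixels_h @ np.linalg.inv(K).T  # (N, 3)

    # Scale each ray by its depth to get the 3D point.
    return rays * depths[:, None]


# Illustrative intrinsics (assumed values): focal length 1000 px,
# principal point at (800, 450).
K_example = np.array([[1000.0, 0.0, 800.0],
                      [0.0, 1000.0, 450.0],
                      [0.0, 0.0, 1.0]])

# Two assumed 2D detections: one at the principal point, one 100 px to its right.
points_3d = lift_2d_centers_to_3d([[800.0, 450.0], [900.0, 450.0]],
                                  [100.0, 150.0],
                                  K_example)
```

In this sketch a 100-pixel offset at 150 m depth corresponds to a 15 m lateral offset, which illustrates why small 2D or depth errors grow with distance and why a range-dependent treatment of query noise is a natural fit for long-range detection.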