Most recent 3D object detection approaches rely on point-view or bird's-eye-view representations, and range-view-based methods remain comparatively underexplored. The range-view representation suffers from two problems, scale variation and the loss of surface texture, both of which limit the development of range-view methods. The surface texture loss problem in particular has been largely overlooked by existing methods, despite its significant impact on the accuracy of range-view-based 3D object detection. In this study, we propose Redemption from Range-view R-CNN (R2 R-CNN), a novel and accurate approach that comprehensively explores the range-view representation. Our method addresses scale variation through the HD Meta Kernel, which captures range-view geometry information at multiple scales. We further introduce Feature Points Redemption (FPR) to recover the 3D surface texture information lost in the range view, and Synchronous-Grid RoI Pooling (S-Grid RoI Pooling), a multi-scale approach with multiple receptive fields for accurate box refinement. R2 R-CNN outperforms existing range-view-based methods, achieving state-of-the-art performance on both the KITTI benchmark and the Waymo Open Dataset. Our study highlights the importance of addressing surface texture loss for accurate range-view-based 3D object detection. Code will be made publicly available.
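
For readers unfamiliar with the range-view representation, the sketch below shows the generic spherical projection that is commonly used to turn a LiDAR point cloud into a range image, together with a small helper that illustrates the scale variation problem mentioned above. It is not the R2 R-CNN pipeline, and it does not implement the HD Meta Kernel, FPR, or S-Grid RoI Pooling. The function names and the sensor parameters (64 rows, 2048 columns, a vertical field of view from +3 to -25 degrees) are assumptions chosen only for illustration.

```python
import numpy as np


def to_range_image(points, height=64, width=2048,
                   fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud onto a (height, width) range image.

    Empty pixels hold 0. When several points fall into the same pixel,
    the nearest one is kept. All sensor parameters are illustrative.
    """
    points = np.asarray(points, dtype=np.float64)
    rng = np.linalg.norm(points, axis=1)
    points, rng = points[rng > 1e-6], rng[rng > 1e-6]

    azimuth = np.arctan2(points[:, 1], points[:, 0])   # in [-pi, pi]
    inclination = np.arcsin(points[:, 2] / rng)

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    in_fov = (inclination <= fov_up) & (inclination >= fov_down)
    azimuth, inclination, rng = azimuth[in_fov], inclination[in_fov], rng[in_fov]

    # Map angles to integer pixel coordinates.
    col = np.floor((azimuth + np.pi) / (2.0 * np.pi) * width).astype(np.int64) % width
    row = np.floor((fov_up - inclination) / (fov_up - fov_down) * height)
    row = np.clip(row, 0, height - 1).astype(np.int64)

    image = np.zeros((height, width), dtype=np.float32)
    # Write far points first so that nearer points overwrite them.
    order = np.argsort(-rng)
    image[row[order], col[order]] = rng[order]
    return image


def apparent_width_px(object_width_m, range_m, width=2048):
    """Approximate number of range-image columns spanned by an object
    of the given physical width, facing the sensor at the given range."""
    angle = 2.0 * np.arctan(object_width_m / (2.0 * range_m))
    return angle / (2.0 * np.pi) * width
```

Under these assumed parameters, `apparent_width_px(2.0, 5.0)` is roughly ten times larger than `apparent_width_px(2.0, 50.0)`: the same object occupies far fewer pixels as its distance grows, which is the scale variation the abstract refers to. Projecting onto a 2D grid also keeps at most one return per pixel, so part of the 3D structure of the original points is not retained in the image.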