To handle sparse and unstructured raw point clouds, LiDAR-based 3D object detection research mostly focuses on designing dedicated local point aggregators for fine-grained geometric modeling. In this paper, we revisit local point aggregators from the perspective of allocating computational resources. We find that the simplest pillar-based models perform surprisingly well in terms of both accuracy and latency. Additionally, we show that minimal adaptations drawn from successful 2D object detection practice, such as enlarging the receptive field, significantly boost performance. Extensive experiments reveal that our pillar-based networks, with modernized architecture and training designs, achieve state-of-the-art performance on two popular benchmarks: Waymo Open Dataset and nuScenes. Our results challenge the common intuition that detailed geometry modeling is essential to achieving high performance in 3D object detection.