Semantic segmentation in autonomous driving is evolving from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the 3D space of interest. The dense nature of the prediction space renders existing efficient 2D-projection-based methods (e.g., bird's-eye view and range view) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view (TPV) that represents point clouds effectively and comprehensively, and a PointOcc model that processes this representation efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system to model nearer areas at a finer granularity. We employ spatial group pooling to preserve structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes, without any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance at much faster speed. Specifically, despite using only LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, by a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc.
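To make the projection and aggregation steps concrete, the following is a minimal NumPy sketch and not the released PointOcc implementation. It converts points to cylindrical coordinates, projects per-point features onto three planes, and gathers each point's feature by summing its three projected features. Several simplifications are assumptions made for illustration: the function names, grid size, and coordinate bounds are hypothetical; plain max pooling along the collapsed axis stands in for spatial group pooling; the 2D backbones are omitted, so the planes are used unprocessed; and summation is only one possible way to aggregate.

```python
import numpy as np


def cartesian_to_cylindrical(xyz):
    """Map (x, y, z) to (rho, phi, z), with phi in [-pi, pi]."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    return np.stack([np.hypot(x, y), np.arctan2(y, x), z], axis=1)


def voxel_indices(cyl, grid_size, lower, upper):
    """Quantize cylindrical coordinates into integer cell indices."""
    size = np.asarray(grid_size)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    idx = np.floor((cyl - lower) / (upper - lower) * size).astype(np.int64)
    return np.clip(idx, 0, size - 1)  # keep out-of-range points on the border


def project_to_tpv(idx, feats, grid_size):
    """Max-pool point features onto the three planes.

    Assumption: a single max over the collapsed axis replaces the
    spatial group pooling described in the abstract.
    """
    n_rho, n_phi, n_z = grid_size
    channels = feats.shape[1]
    r, p, z = idx[:, 0], idx[:, 1], idx[:, 2]
    planes = {
        "rho_phi": np.full((n_rho, n_phi, channels), -np.inf),
        "phi_z": np.full((n_phi, n_z, channels), -np.inf),
        "z_rho": np.full((n_z, n_rho, channels), -np.inf),
    }
    # ufunc.at performs an unbuffered scatter, so repeated cells are reduced.
    np.maximum.at(planes["rho_phi"], (r, p), feats)
    np.maximum.at(planes["phi_z"], (p, z), feats)
    np.maximum.at(planes["z_rho"], (z, r), feats)
    for plane in planes.values():
        plane[np.isneginf(plane)] = 0.0  # empty cells carry a zero feature
    return planes


def gather_point_features(idx, planes):
    """Sum each point's projected features from the three planes."""
    r, p, z = idx[:, 0], idx[:, 1], idx[:, 2]
    return planes["rho_phi"][r, p] + planes["phi_z"][p, z] + planes["z_rho"][z, r]


# Two points that share a (rho, phi) cell but differ in height.
points = np.array([[1.0, 0.2, -0.5], [1.0, 0.2, 2.5]])
feats = np.array([[1.0, 5.0], [3.0, 2.0]])
grid_size = (4, 8, 2)  # hypothetical (rho, phi, z) resolution
idx = voxel_indices(
    cartesian_to_cylindrical(points),
    grid_size,
    lower=(0.0, -np.pi, -1.0),
    upper=(4.0, np.pi, 3.0),
)
planes = project_to_tpv(idx, feats, grid_size)
point_feats = gather_point_features(idx, planes)
```

Because the azimuth bins have a fixed angular width, cells close to the sensor cover a smaller physical area than distant ones, which is the finer modeling of nearer areas that the cylindrical system is chosen for. In this toy example the two points collapse into one cell of the rho-phi plane but remain separated in the other two planes, which illustrates why three complementary planes retain more of the 3D structure than a single projection.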