LiDAR segmentation is crucial for autonomous driving perception. Recent trends favor point- or voxel-based methods, as they often yield better performance than the traditional range view representation. In this work, we unveil several key factors in building powerful range view models. We observe that the "many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments to effective learning from range view projections. We present RangeFormer -- a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing -- that better handles the learning and processing of LiDAR point clouds from the range view. We further introduce a Scalable Training from Range view (STR) strategy that trains on arbitrary low-resolution 2D range images while still maintaining satisfactory 3D segmentation accuracy. We show, for the first time, that a range view method can surpass its point, voxel, and multi-view fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
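To make the "many-to-one" mapping concrete, the sketch below rasterizes a point cloud into a range image with a commonly used spherical projection. It is not taken from RangeFormer itself: the function names, the 64 x 2048 image size, and the vertical field of view of +3 to -25 degrees are illustrative assumptions, and points outside that field of view are simply clamped to the border rows. When several points land in the same pixel, only the nearest one is kept, and the rest are counted as dropped.

```python
import numpy as np


def project_to_range_view(points, height=64, width=2048,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map (N, 3) xyz points to (row, col) pixels of an H x W range image.

    Image size and field of view are assumed values for illustration.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    depth = np.maximum(np.linalg.norm(points, axis=1), 1e-8)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(np.clip(points[:, 2] / depth, -1.0, 1.0))

    # Normalize angles to image coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (1.0 - (pitch - fov_down) / fov) * height

    # Clamp to the image; out-of-FOV points end up on the border rows.
    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)
    return rows, cols, depth


def rasterize_nearest(points, height=64, width=2048, **fov_kwargs):
    """Build a range image that keeps the nearest point per pixel.

    Returns the image (np.inf marks empty pixels) and the number of
    points that lost their pixel to a closer point.
    """
    rows, cols, depth = project_to_range_view(points, height, width, **fov_kwargs)
    range_image = np.full((height, width), np.inf, dtype=np.float64)
    # Unbuffered minimum handles repeated (row, col) indices correctly.
    np.minimum.at(range_image, (rows, cols), depth)
    occupied = int(np.isfinite(range_image).sum())
    dropped = points.shape[0] - occupied
    return range_image, dropped


# Synthetic points (not real LiDAR data) to compare two horizontal resolutions.
rng = np.random.default_rng(0)
demo_points = np.column_stack([
    rng.normal(scale=20.0, size=50_000),
    rng.normal(scale=20.0, size=50_000),
    rng.uniform(-2.0, 0.5, size=50_000),
])
for demo_width in (2048, 512):
    _, demo_dropped = rasterize_nearest(demo_points, 64, demo_width)
    print(f"width={demo_width}: {demo_dropped} points share a pixel with a closer point")
```

Narrowing the image merges neighboring columns, so the number of dropped points can only stay the same or grow. This is one way to see why training on low-resolution range images while preserving 3D accuracy is non-trivial.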