LiDAR segmentation is crucial for autonomous driving perception. Recent trends favor point- and voxel-based methods, as they often outperform the traditional range view representation. In this work, we unveil several key factors in building powerful range view models. We observe that the "many-to-one" mapping, semantic incoherence, and shape deformation are potential impediments to effective learning from range view projections. We present RangeFormer, a full-cycle framework that better handles the learning and processing of LiDAR point clouds from the range view through novel designs across network architecture, data augmentation, and post-processing. We further introduce a Scalable Training from Range view (STR) strategy that trains on arbitrary low-resolution 2D range images while still maintaining satisfactory 3D segmentation accuracy. We show that, for the first time, a range view method surpasses its point, voxel, and multi-view fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
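To make the "many-to-one" mapping concrete, the sketch below uses the common spherical projection from 3D points to a range image, in which each point's azimuth selects a column and its elevation selects a row. It is an illustrative assumption, not RangeFormer's code. The sensor field of view (+3° to -25°), the default 64×2048 resolution, and the helper names `project_to_range_image` and `count_collisions` are hypothetical choices made for this example. Whenever several points land in the same pixel, only one can be stored and the rest are lost. Shrinking the image width, as in low-resolution training, can merge points that a wider image would keep apart.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed (hypothetical) sensor parameters: a 64-beam sensor whose vertical
# field of view spans +3 degrees (up) to -25 degrees (down).
FOV_UP = math.radians(3.0)
FOV_DOWN = math.radians(-25.0)


def project_to_range_image(
    points: Iterable[Tuple[float, float, float]],
    height: int = 64,
    width: int = 2048,
) -> List[Tuple[int, int]]:
    """Map each (x, y, z) point to an integer (row, col) pixel via spherical projection.

    Assumes no point lies exactly at the sensor origin (range r > 0).
    """
    fov = abs(FOV_UP) + abs(FOV_DOWN)
    pixels = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        yaw = math.atan2(y, x)    # azimuth in [-pi, pi], selects the column
        pitch = math.asin(z / r)  # elevation angle, selects the row
        u = 0.5 * (1.0 - yaw / math.pi) * width                # continuous column
        v = (1.0 - (pitch + abs(FOV_DOWN)) / fov) * height     # continuous row
        # Discretize and clamp to the image bounds.
        col = min(max(int(math.floor(u)), 0), width - 1)
        row = min(max(int(math.floor(v)), 0), height - 1)
        pixels.append((row, col))
    return pixels


def count_collisions(pixels: List[Tuple[int, int]]) -> int:
    """Number of points lost when each pixel can store only one point."""
    counts = Counter(pixels)
    return sum(c - 1 for c in counts.values() if c > 1)
```

For example, the points `(10, 0, 0)` and `(20, 0, 0)` lie on the same ray and map to the same pixel, so one of them is lost at any resolution. The pair `(10, 0, 0)` and `(10, -0.05, 0)` occupies distinct columns at width 2048 but collides at width 512.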