Point-, voxel-, and range-views are three representative forms of point clouds. All of them provide accurate 3D measurements but lack color and texture information. RGB images are a natural complement to these point cloud views, and fully exploiting the combined information enables more robust perception. In this paper, we present a unified multi-modal LiDAR segmentation network, termed UniSeg, which leverages the information of RGB images and three views of the point cloud, and accomplishes semantic segmentation and panoptic segmentation simultaneously. Specifically, we first design the Learnable cross-Modal Association (LMA) module to automatically fuse voxel-view and range-view features with image features; this fusion fully utilizes the rich semantic information of images and is robust to calibration errors. Then, the enhanced voxel-view and range-view features are transformed to the point space, where the three views of point cloud features are further fused adaptively by the Learnable cross-View Association (LVA) module. Notably, UniSeg achieves promising results on three public benchmarks, i.e., SemanticKITTI, nuScenes, and Waymo Open Dataset (WOD); it ranks 1st on two challenges across two benchmarks: the LiDAR semantic segmentation challenge of nuScenes and the panoptic segmentation challenge of SemanticKITTI. Besides, we construct the OpenPCSeg codebase, which is the largest and most comprehensive outdoor LiDAR segmentation codebase. It contains most of the popular outdoor LiDAR segmentation algorithms and provides reproducible implementations. The OpenPCSeg codebase will be made publicly available at https://github.com/PJLab-ADG/PCSeg.
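To make the idea of adaptively fusing three views in the point space concrete, the following is a minimal NumPy sketch. It is not the LVA module itself, whose internals the abstract does not specify. It assumes that point-, voxel-, and range-view features have already been gathered per point into arrays of shape (N, C), and it uses a hypothetical softmax gate (the function `fuse_views` and the projection `w` are illustrative names, with `w` drawn at random in place of learned weights).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_views(point_feat, voxel_feat, range_feat, w):
    """Hypothetical per-point gated fusion of three view features.

    Each feature array has shape (N, C). `w` has shape (3 * C, 3) and
    stands in for a learnable projection (an assumption, not the
    paper's design). Returns the fused features (N, C) and the
    per-point view weights (N, 3).
    """
    stacked = np.stack([point_feat, voxel_feat, range_feat], axis=1)  # (N, 3, C)
    concat = np.concatenate([point_feat, voxel_feat, range_feat], axis=1)  # (N, 3C)
    weights = softmax(concat @ w, axis=1)  # one weight per view, per point
    fused = (weights[:, :, None] * stacked).sum(axis=1)  # convex combination
    return fused, weights


rng = np.random.default_rng(0)
N, C = 5, 4
p, v, r = (rng.normal(size=(N, C)) for _ in range(3))
w = rng.normal(size=(3 * C, 3))  # random stand-in for learned parameters
fused, weights = fuse_views(p, v, r, w)
print(fused.shape, weights.shape)
```

Because the weights are a softmax over the three views, each fused feature is a convex combination of the corresponding point-, voxel-, and range-view features, so a point can lean on whichever view is most informative for it.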