3D panoptic segmentation is a challenging perception task that requires both semantic and instance segmentation. Images provide rich texture, color, and discriminative information that can complement LiDAR data and noticeably improve performance, but fusing the two modalities remains challenging. To this end, we propose LCPS, the first LiDAR-Camera Panoptic Segmentation network. Our approach conducts LiDAR-Camera fusion in three stages: 1) an Asynchronous Compensation Pixel Alignment (ACPA) module that corrects the coordinate misalignment caused by asynchrony between sensors; 2) a Semantic-Aware Region Alignment (SARA) module that extends the one-to-one point-pixel mapping to one-to-many semantic relations; 3) a Point-to-Voxel feature Propagation (PVP) module that integrates geometric and semantic fusion information over the entire point cloud. Our fusion strategy improves PQ by about 6.9% over the LiDAR-only baseline on the NuScenes dataset. Extensive quantitative and qualitative experiments further demonstrate the effectiveness of our framework. The code will be released at https://github.com/zhangzw12319/lcps.git.
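The abstract does not spell out how ACPA is implemented, so the following is only a minimal sketch of the general idea behind asynchronous compensation: when the LiDAR sweep and the camera frame are captured at different times, routing the projection through a fixed world frame absorbs the ego-motion between the two timestamps. The function name `project_with_async_compensation`, its parameters, and the pose conventions are assumptions made for illustration and are not taken from the LCPS code.

```python
import numpy as np


def project_with_async_compensation(points_lidar, T_lidar_to_world,
                                    T_world_to_cam, K, image_size):
    """Project LiDAR points into an image captured at a different time.

    Hypothetical helper, not the authors' ACPA module.

    points_lidar     : (N, 3) points in the LiDAR frame at the LiDAR timestamp.
    T_lidar_to_world : (4, 4) LiDAR-to-world pose at the LiDAR timestamp.
    T_world_to_cam   : (4, 4) world-to-camera pose at the camera timestamp.
    K                : (3, 3) pinhole intrinsics with last row [0, 0, 1].
    image_size       : (width, height) in pixels.
    Returns (N, 2) pixel coordinates and an (N,) boolean visibility mask.
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])  # (N, 4) homogeneous

    # LiDAR frame (t_lidar) -> world -> camera frame (t_camera).
    pts_cam = (T_world_to_cam @ T_lidar_to_world @ homo.T).T[:, :3]

    depth = pts_cam[:, 2]
    in_front = depth > 1e-6
    # Avoid dividing by zero or negative depth; such points are masked out.
    safe_depth = np.where(in_front, depth, 1.0)

    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / safe_depth[:, None]

    width, height = image_size
    visible = (in_front
               & (uv[:, 0] >= 0) & (uv[:, 0] < width)
               & (uv[:, 1] >= 0) & (uv[:, 1] < height))
    return uv, visible
```

For example, with identity poses and a focal length of 100 pixels, a point at (1, 0, 10) projects 10 pixels to the right of the principal point. If the camera has moved 1 m forward by the time the image is taken, the same point projects about 11.1 pixels to the right. A one-to-one point-pixel mapping that ignores this offset would sample features from the wrong pixel.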