Semantic Scene Completion (SSC) is pivotal in autonomous driving perception, where it must frequently cope with changes in weather and illumination. The long-term strategy is to fuse multi-modal information to bolster the system's robustness. Radar, increasingly used for 3D target detection, is gradually replacing LiDAR in autonomous driving applications as a robust sensing alternative. In this paper, we focus on the potential of 3D radar for semantic scene completion, pioneering cross-modal refinement techniques that improve robustness against weather and illumination changes and enhance SSC performance. Regarding model architecture, we propose a three-stage tight fusion approach in bird's-eye-view (BEV) space to realize a fusion framework for point clouds and images. On this foundation, we design three cross-modal distillation modules: CMRD, BRD, and PDD. Our approach improves performance in both the radar-only (R-LiCROcc) and radar-camera (RC-LiCROcc) settings by distilling into them the rich semantic and structural information of the fused LiDAR and camera features. Finally, our LC-Fusion (teacher model), R-LiCROcc, and RC-LiCROcc achieve the best performance on the nuScenes-Occupancy dataset, with mIoU exceeding the baseline by 22.9%, 44.1%, and 15.5%, respectively. The project page is available at https://hr-zju.github.io/LiCROcc/.