Diffusion models have been applied to 3D LiDAR scene completion due to their strong training stability and high completion quality. However, their slow sampling speed limits the practical use of diffusion-based scene completion models, since autonomous vehicles require efficient perception of their surroundings. This paper proposes a novel distillation method tailored for 3D LiDAR scene completion models, dubbed $\textbf{ScoreLiDAR}$, which achieves efficient yet high-quality scene completion. After distillation, the model samples in significantly fewer steps. To improve completion quality, we also introduce a novel $\textbf{Structural Loss}$, which encourages the distilled model to capture the geometric structure of the 3D LiDAR scene. The loss contains a scene-wise term constraining the holistic structure and a point-wise term constraining the key landmark points and their relative configuration. Extensive experiments demonstrate that ScoreLiDAR reduces the completion time from 30.55 to 5.37 seconds per frame (a $>$5$\times$ speedup) on SemanticKITTI while achieving superior performance compared to state-of-the-art 3D LiDAR scene completion models. Our code is publicly available at https://github.com/happyw1nd/ScoreLiDAR.
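To make the two-term structure of the Structural Loss concrete, the following is a minimal sketch and not the released implementation. It assumes the scene-wise term can be illustrated with a symmetric Chamfer distance between the completed and ground-truth point clouds, and the point-wise term with the difference between pairwise-distance matrices of corresponding landmark points. The function names, the weights `w_scene` and `w_point`, and the index-based landmark correspondence are all hypothetical choices made for illustration.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions.

    Assumption: used here as a stand-in for the scene-wise (holistic) term.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def pairwise_distances(points):
    """Full matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def landmark_loss(pred_landmarks, gt_landmarks):
    """Mean absolute difference between the two pairwise-distance matrices.

    Assumption: stands in for the point-wise term on the landmarks'
    relative configuration; landmarks correspond by list position.
    """
    dp = pairwise_distances(pred_landmarks)
    dg = pairwise_distances(gt_landmarks)
    n = len(dp)
    return sum(abs(dp[i][j] - dg[i][j]) for i in range(n) for j in range(n)) / (n * n)


def structural_loss(pred, gt, landmark_idx, w_scene=1.0, w_point=1.0):
    """Weighted sum of the scene-wise and point-wise terms (weights are hypothetical)."""
    scene = chamfer_distance(pred, gt)
    point = landmark_loss([pred[i] for i in landmark_idx],
                          [gt[i] for i in landmark_idx])
    return w_scene * scene + w_point * point


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # A rigidly shifted prediction: relative configuration is preserved,
    # but the scene as a whole is misplaced.
    pred = [(x + 0.5, y, z) for (x, y, z) in gt]
    print(structural_loss(pred, gt, landmark_idx=[0, 1, 2]))
```

In this toy case the shifted prediction keeps the landmarks' relative configuration intact, so only the scene-wise term contributes. This suggests why two complementary terms can be useful: one is sensitive to global placement, the other to local geometry. A practical version would operate on batched tensors with an accelerated nearest-neighbor search rather than nested Python loops.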