PointDiffusion: Diffusion-Based Scene Completion in the Point Cloud Domain

Reconstructing dense 3D scenes from sparse LiDAR point clouds is a fundamental challenge in autonomous driving, where latent diffusion models offer a promising solution. However, existing approaches rely on object-level autoencoders that collapse into unstable global representations at outdoor scale and suffer from ground truth data corrupted by odometry drift that systematically degrades supervision quality. Furthermore, multi-step diffusion inference incurs prohibitive latency for real-time deployment. We propose a novel multi-token Gaussian VAE with cross-attention pooling for stable scene-scale LiDAR compression, combined with an anchor-based ICP ground truth refinement pipeline that eliminates drift-induced noise from training supervision. Together, these components enable a scaffold-free single-step diffusion completion model that achieves an approximately 16x reduction in squared Chamfer distance on SemanticKITTI seq. 08 (0.396 m^2 to 0.024 m^2), surpasses LiDiff and ScoreLiDAR by 17-19% and 10-11%, respectively, and operates at 25-143x lower inference latency. Our results demonstrate that data quality dominates model design in this regime and that multi-token latent spaces provide a stable first stage for latent diffusion-based scene completion.
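For readers unfamiliar with the headline metric, the following is a minimal sketch of a symmetric squared Chamfer distance between two point sets, plus the arithmetic behind the reported reduction. It assumes one common convention (mean nearest-neighbour squared distance in each direction, summed); the abstract does not state which variant the paper uses, and the function names here are illustrative rather than taken from the authors' code.

```python
import numpy as np


def squared_chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric squared Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean nearest-neighbour squared distance from a to b,
    plus the same from b to a. Other papers average or weight the two terms
    differently.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    # Brute force is fine for a sketch; real scenes would need a KD-tree.
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    # Nearest neighbour in b for each point of a, and vice versa.
    a_to_b = d2.min(axis=1).mean()
    b_to_a = d2.min(axis=0).mean()
    return float(a_to_b + b_to_a)


def reduction_factor(before: float, after: float) -> float:
    """Ratio by which a metric shrank, e.g. 0.396 m^2 -> 0.024 m^2."""
    return before / after


if __name__ == "__main__":
    pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
    print(squared_chamfer_distance(pred, target))
    print(reduction_factor(0.396, 0.024))
```

With the values quoted in the abstract, the ratio 0.396 / 0.024 comes to 16.5, consistent with the "approximately 16x" figure.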

