Computer vision techniques play a central role in the perception stack of autonomous vehicles, where they are used to perceive the vehicle's surroundings from sensor data. 3D LiDAR sensors are commonly used to collect sparse 3D point clouds of the scene. However, compared to human perception, such systems struggle to infer the unseen parts of the scene from those sparse point clouds. To address this, the scene completion task aims to fill the gaps in the LiDAR measurements and obtain a more complete scene representation. Given the promising results of recent diffusion models as generative models for images, we propose extending them to achieve scene completion from a single 3D LiDAR scan. Previous works applied diffusion models to range images extracted from LiDAR data, directly reusing image-based diffusion methods. In contrast, we propose to operate directly on the points, reformulating the noising and denoising diffusion process such that it can work efficiently at scene scale. In addition, we propose a regularization loss to stabilize the noise predicted during the denoising process. Our experimental evaluation shows that our method can complete the scene given a single LiDAR scan as input, producing a scene with more detail than state-of-the-art scene completion methods. We believe that our proposed diffusion process formulation can support further research on diffusion models applied to scene-scale point cloud data.
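For readers unfamiliar with diffusion on point data, the sketch below shows the textbook DDPM forward (noising) step applied per coordinate to a small point cloud. It is only a baseline illustration: the abstract does not specify the reformulated process or the regularization loss, so neither is reproduced here. The linear schedule, its default values, the function names, and the toy coordinates are all assumptions made for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, common DDPM defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha_bar_t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_points(points, alpha_bar_t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    Returns the noised points and the noise eps, which is the quantity a
    denoising network is typically trained to predict.
    """
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    noised, eps_all = [], []
    for point in points:
        eps = tuple(rng.gauss(0.0, 1.0) for _ in point)
        noised.append(tuple(signal * c + sigma * e for c, e in zip(point, eps)))
        eps_all.append(eps)
    return noised, eps_all


# Toy "scan": three points in metres (made-up values, not real LiDAR data).
scan = [(12.0, -3.5, 0.4), (25.1, 7.2, 1.1), (-8.3, 14.0, -0.2)]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy_scan, eps = noise_points(scan, alpha_bars[499], rng)
```

In this textbook form, the `sqrt(alpha_bar_t)` factor shrinks every coordinate toward the origin as `t` grows, so the data are usually normalized to a fixed range before noising. That may be awkward for scenes spanning tens of metres, which is one plausible motivation for a point-level reformulation like the one the abstract describes.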