LiDAR scene synthesis is an emerging solution to the scarcity of 3D data for robotic tasks such as autonomous driving. Recent approaches employ diffusion or flow matching models to generate realistic scenes, but 3D data remains limited compared to RGB datasets with millions of samples. We introduce R3DPA, the first LiDAR scene generation method to unlock image-pretrained priors for LiDAR point clouds, and leverage self-supervised 3D representations for state-of-the-art results. Specifically, we (i) align intermediate features of our generative model with self-supervised 3D features, which substantially improves generation quality; (ii) transfer knowledge from large-scale image-pretrained generative models to LiDAR generation, mitigating the limited size of LiDAR datasets; and (iii) enable point cloud control at inference for object inpainting and scene mixing using only an unconditional model. On the KITTI-360 benchmark, R3DPA achieves state-of-the-art performance. Code and pretrained models are available at https://github.com/valeoai/R3DPA.