Learning 3D scene flow from LiDAR point clouds presents significant difficulties, including poor generalization from synthetic datasets to real scenes, the scarcity of real-world 3D labels, and poor performance on sparse real LiDAR point clouds. We present a novel approach from the perspective of auto-labelling, aiming to generate a large number of 3D scene flow pseudo labels for real-world LiDAR point clouds. Specifically, we employ the rigid-body motion assumption to simulate potential object-level rigid movements in autonomous driving scenarios. By updating distinct motion attributes for multiple anchor boxes, we obtain a rigid motion decomposition of the whole scene. Furthermore, we develop a novel 3D scene flow data augmentation method for both global and local motion. By synthesizing target point clouds directly from the augmented motion parameters, we readily obtain abundant 3D scene flow labels that are highly consistent with real scenes. On multiple real-world datasets, including LiDAR KITTI, nuScenes, and Argoverse, our method outperforms all previous supervised and unsupervised methods without requiring manual labelling. Notably, it reduces the EPE3D error on LiDAR KITTI by more than an order of magnitude, from $0.190m$ to a mere $0.008m$.
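The pseudo-label generation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each anchor box contributes a planar rigid motion (a yaw rotation plus a translation, a common simplification for driving scenes), applies it to the points inside the box, and leaves the remaining points static. The function names and the `(mask, yaw, translation)` box representation are hypothetical.

```python
import numpy as np

def rigid_flow(points, yaw, translation):
    """Flow induced by a rigid motion: (R p + t) - p, with R a yaw rotation
    about the z-axis (a simplifying assumption for ground vehicles)."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T + translation - points

def synthesize_pseudo_labels(points, boxes):
    """Compose per-box rigid motions into a scene flow pseudo label.

    points: (N, 3) source LiDAR points.
    boxes:  list of (mask, yaw, translation), where mask selects the
            points belonging to one anchor box (hypothetical format).
    Returns the synthesized target point cloud and the flow labels;
    points outside every box are treated as static (zero flow).
    """
    flow = np.zeros_like(points)
    for mask, yaw, t in boxes:
        flow[mask] = rigid_flow(points[mask], yaw, t)
    target = points + flow  # target cloud is exact by construction
    return target, flow
```

Because the target cloud is synthesized from the motion parameters themselves, the flow labels are exact by construction, which is what allows augmented motions (global or per-box) to yield consistent supervision for free.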