Perceiving motion in dynamic environments is crucial for autonomous driving systems, and class-agnostic motion prediction methods address this by directly predicting the motion of the entire point cloud. Most existing methods rely on fully supervised learning, but manually labeling point cloud data is laborious and time-consuming. Several annotation-efficient methods have been proposed to address this challenge. Although effective, these methods rely on weak annotations or additional multi-modal data such as images, and the potential benefits inherent in the point cloud sequence itself remain underexplored. To this end, we explore the feasibility of self-supervised motion prediction using only unlabeled LiDAR point clouds. First, we employ an optimal transport solver to establish coarse correspondences between the current and future point clouds, which serve as coarse pseudo motion labels. Training models directly on such coarse labels leads to noticeable spatial and temporal prediction inconsistencies. To mitigate these issues, we introduce three simple spatial and temporal regularization losses that effectively facilitate self-supervised training. Experimental results demonstrate that our approach significantly outperforms state-of-the-art self-supervised methods.
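The coarse pseudo-labeling step can be illustrated with a minimal sketch. The abstract does not specify which optimal transport solver is used, so the code below assumes an entropy-regularized Sinkhorn solver with uniform point masses and a squared Euclidean cost; the function name `sinkhorn_pseudo_flow` and the parameters `eps` and `n_iters` are hypothetical choices for illustration, not the authors' implementation. Each current point is assigned a soft match in the future cloud, and the displacement to that match is taken as its coarse pseudo motion label.

```python
import numpy as np


def sinkhorn_pseudo_flow(src, dst, eps=0.05, n_iters=200):
    """Coarse pseudo motion labels from entropy-regularized optimal transport.

    src: (N, 3) current point cloud; dst: (M, 3) future point cloud.
    Returns an (N, 3) array of per-point displacements.
    Assumption: uniform masses and a squared Euclidean cost (illustrative only).
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    cost = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / eps)

    # Uniform marginals over both point clouds.
    a = np.full(len(src), 1.0 / len(src))
    b = np.full(len(dst), 1.0 / len(dst))

    # Sinkhorn iterations: alternately rescale rows and columns.
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v + 1e-12)
        v = b / (kernel.T @ u + 1e-12)

    # Transport plan, shape (N, M).
    plan = u[:, None] * kernel * v[None, :]

    # Barycentric projection: soft-matched future position of each current point.
    matched = (plan @ dst) / (plan.sum(axis=1, keepdims=True) + 1e-12)
    return matched - src


if __name__ == "__main__":
    # Toy example: a 3x3 grid of points shifted by 0.1 along x.
    xs, ys = np.meshgrid(np.arange(3.0), np.arange(3.0))
    current = np.stack([xs.ravel(), ys.ravel(), np.zeros(9)], axis=1)
    future = current + np.array([0.1, 0.0, 0.0])
    print(sinkhorn_pseudo_flow(current, future))
```

On real LiDAR sweeps, correspondences obtained this way are noisy (points appear, disappear, and are sampled differently between frames), which is consistent with the abstract's observation that training directly on such labels yields spatially and temporally inconsistent predictions.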