In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving that operates on a bird's-eye-view (BEV) representation. Unlike commonly adopted self-supervised strategies, which enforce data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect of noise points and view-changing point clouds in dynamic scenes. Specifically, we propose a \textit{Soft Discriminative Loss} that provides the network with more pseudo-supervised signals, so that it learns discriminative and robust features in a contrastive learning manner. We also propose a \textit{Gated Multi-frame Fusion} block that automatically learns valid compensation between point cloud frames to enhance feature extraction. Finally, \textit{pillar association} predicts pillar correspondence probabilities based on feature distance, from which scene motion is then predicted. Extensive experiments demonstrate the effectiveness and superiority of our \textbf{ContrastMotion} on both scene flow and motion prediction tasks. The code will be available soon.
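The pillar association step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `pillar_association`, the local search window (`radius`), the Euclidean feature distance, the softmax with a `temperature`, and the expected-offset readout are all assumptions made for illustration. The abstract states only that correspondence probabilities are predicted from feature distance and then used to predict scene motion.

```python
import numpy as np

def pillar_association(feat_t, feat_t1, radius=1, temperature=1.0):
    """Hypothetical sketch of pillar association between two BEV feature maps.

    feat_t, feat_t1: (H, W, C) pillar features of consecutive frames.
    Returns:
      probs: (H, W, K) correspondence probabilities over K = (2*radius+1)**2
             candidate offsets in the next frame (assumed local search window).
      flow:  (H, W, 2) expected (dy, dx) displacement in pillar units.
    """
    H, W, _ = feat_t.shape
    offsets = np.array([(dy, dx)
                        for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1)])  # (K, 2)

    # Zero-pad the next frame so every offset can be sliced; track which
    # candidate positions actually lie inside the grid.
    padded = np.pad(feat_t1, ((radius, radius), (radius, radius), (0, 0)))
    valid = np.pad(np.ones((H, W), dtype=bool), radius)

    dists = np.empty((H, W, len(offsets)))
    for k, (dy, dx) in enumerate(offsets):
        ys, xs = radius + dy, radius + dx
        cand = padded[ys:ys + H, xs:xs + W]
        d = np.linalg.norm(feat_t - cand, axis=-1)  # feature distance (assumed L2)
        dists[..., k] = np.where(valid[ys:ys + H, xs:xs + W], d, np.inf)

    # Softmax over negative distances: closer features get higher probability.
    logits = -dists / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Motion as the probability-weighted average of candidate offsets.
    flow = probs @ offsets.astype(float)
    return probs, flow
```

In this sketch a low `temperature` makes the association close to a hard nearest-feature match, while a higher value spreads probability over several candidate pillars. Out-of-grid candidates receive zero probability because their distance is set to infinity.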