Identifying moving objects is an essential capability for autonomous systems, as it provides critical information for pose estimation, navigation, collision avoidance, and static map construction. In this paper, we present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation, which segments moving objects with appearance and motion features in the bird's eye view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency. Specifically, we learn appearance features with a simplified PointNet and compute motion features through the height differences of consecutive frames of point clouds projected onto vertical columns in the polar BEV coordinate system. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features. Our approach achieves state-of-the-art performance on the SemanticKITTI-MOS benchmark. Furthermore, to demonstrate the practical effectiveness of our method, we provide a LiDAR-MOS dataset recorded by a solid-state LiDAR, which features non-repetitive scanning patterns and a small field of view.
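To make the motion-feature idea concrete, the following is a minimal sketch, not the authors' implementation. It projects two consecutive scans onto a polar BEV grid and takes the per-cell height difference. Several details are assumptions made for illustration only: the function names `polar_bev_height` and `height_difference`, the grid parameters `rho_max`, `n_rho`, and `n_phi`, the use of the maximum point height as the per-column summary, and the premise that the previous scan has already been transformed into the current sensor frame.

```python
import numpy as np


def polar_bev_height(points, rho_max=50.0, n_rho=480, n_phi=360):
    """Summarize a scan as max height per polar BEV cell (illustrative choice).

    points: (N, 3) array of x, y, z coordinates in the sensor frame.
    Returns (height, occupied): an (n_rho, n_phi) float array and a boolean mask.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)        # radial distance in the ground plane
    phi = np.arctan2(y, x)      # azimuth in [-pi, pi]

    keep = rho < rho_max        # drop points outside the grid
    rho_idx = (rho[keep] / rho_max * n_rho).astype(np.int64)
    rho_idx = np.minimum(rho_idx, n_rho - 1)  # guard against rounding at the edge
    phi_idx = ((phi[keep] + np.pi) / (2.0 * np.pi) * n_phi).astype(np.int64) % n_phi

    height = np.full((n_rho, n_phi), -np.inf)
    # Unbuffered in-place maximum, so repeated cell indices are handled correctly.
    np.maximum.at(height, (rho_idx, phi_idx), z[keep])

    occupied = np.isfinite(height)
    height[~occupied] = 0.0
    return height, occupied


def height_difference(prev_points, curr_points, **grid):
    """Per-cell height change between two scans (assumed already aligned).

    Cells that are empty in either scan are set to 0 so that missing returns
    are not mistaken for motion.
    """
    h_prev, occ_prev = polar_bev_height(prev_points, **grid)
    h_curr, occ_curr = polar_bev_height(curr_points, **grid)
    both = occ_prev & occ_curr
    return np.where(both, h_curr - h_prev, 0.0)
```

In this sketch a non-zero cell only indicates that the column's height profile changed between frames. A learned network, such as the dual-branch design described above, would still be needed to turn such raw differences into per-point moving or static labels.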