Identifying moving objects is an essential capability for autonomous systems, as it provides critical information for pose estimation, navigation, collision avoidance, and static map construction. In this paper, we present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation (MOS) that segments moving objects using appearance and motion features in the bird's eye view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency. Specifically, we learn appearance features with a simplified PointNet and compute motion features from the height differences between consecutive point cloud frames, each projected onto vertical columns in the polar BEV coordinate system. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from the appearance and motion features. Our approach achieves state-of-the-art performance on the SemanticKITTI-MOS benchmark. Furthermore, to demonstrate the practical effectiveness of our method, we provide a LiDAR-MOS dataset recorded by a solid-state LiDAR, which features non-repetitive scanning patterns and a small field of view.
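The motion-feature idea described in the abstract (projecting each scan onto vertical columns of a polar BEV grid and comparing heights across consecutive frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `polar_bev_height` and `motion_feature`, the grid sizes, the range limits, and the choice of "height" as the max-minus-min z extent per column are all assumptions made for illustration. The sketch also assumes both frames have already been aligned to a common coordinate frame (ego-motion compensated).

```python
import numpy as np

def polar_bev_height(points, r_max=40.0, n_r=4, n_theta=8, z_min=-4.0, z_max=2.0):
    """Project an (N, 3) point cloud onto a polar BEV grid.

    Returns an (n_r, n_theta) array holding, per column, the occupied
    height extent (max z - min z); empty columns are 0.
    All parameters are illustrative assumptions, not values from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)            # radial distance in the ground plane
    theta = np.arctan2(y, x)      # azimuth in [-pi, pi]

    # Discard points outside the grid range.
    keep = (r < r_max) & (z >= z_min) & (z <= z_max)
    r, theta, z = r[keep], theta[keep], z[keep]

    # Discretize radius and azimuth into column indices.
    r_idx = np.floor(r / r_max * n_r).astype(int)
    t_idx = np.floor((theta + np.pi) / (2.0 * np.pi) * n_theta).astype(int)
    t_idx = np.minimum(t_idx, n_theta - 1)  # theta == pi falls in the last bin

    # Per-column max and min z via unbuffered scatter operations.
    z_hi = np.full((n_r, n_theta), -np.inf)
    z_lo = np.full((n_r, n_theta), np.inf)
    np.maximum.at(z_hi, (r_idx, t_idx), z)
    np.minimum.at(z_lo, (r_idx, t_idx), z)

    occupied = np.isfinite(z_hi)
    return np.where(occupied, z_hi - z_lo, 0.0)

def motion_feature(prev_points, curr_points, **grid_kwargs):
    """Height difference between two aligned frames on the same polar BEV grid."""
    return (polar_bev_height(curr_points, **grid_kwargs)
            - polar_bev_height(prev_points, **grid_kwargs))
```

In this sketch a static structure produces a zero residual in its column, while an object that moves between frames leaves a negative residual where it was and a positive residual where it now is. A learned network would consume such residual maps as one input channel rather than thresholding them directly.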