Identifying moving objects is an essential capability for autonomous systems, as it provides critical information for pose estimation, navigation, collision avoidance, and static map construction. In this paper, we present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation (MOS) that segments moving objects using appearance and motion features in the bird's-eye-view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to achieve real-time performance. Specifically, we learn appearance features with a simplified PointNet, and compute motion features from the height differences between consecutive point cloud frames projected onto vertical columns in the polar BEV coordinate system. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features. Our approach achieves state-of-the-art performance on the SemanticKITTI-MOS benchmark, with an average inference time of 23 ms on an RTX 3090 GPU. Furthermore, to demonstrate the practical effectiveness of our method, we provide a LiDAR-MOS dataset recorded with a solid-state LiDAR, which features non-repetitive scanning patterns and a small field of view.
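The height-difference motion feature described above can be illustrated with a minimal NumPy sketch. It is not the MotionBEV implementation. It assumes that all frames have already been transformed into the current sensor frame, that the "height" of a column is the span between its highest and lowest point, and that empty columns have height zero. The function names (`polar_bev_height_span`, `motion_features`) and the grid parameters (`r_max`, `n_r`, `n_theta`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def polar_bev_height_span(points, r_max=50.0, n_r=480, n_theta=360):
    """Project an (N, 3) point cloud onto a polar BEV grid.

    Returns an (n_r, n_theta) array holding, for each vertical column,
    the span max(z) - min(z) of the points that fall into it.
    Empty columns are set to 0 (an assumption of this sketch).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)  # angle in [-pi, pi]

    keep = r < r_max  # drop points outside the grid
    r_idx = (r[keep] / r_max * n_r).astype(np.int64)
    t_idx = ((theta[keep] + np.pi) / (2.0 * np.pi) * n_theta).astype(np.int64)
    # Guard against rounding onto the upper edge (e.g. theta == pi).
    r_idx = np.minimum(r_idx, n_r - 1)
    t_idx = np.minimum(t_idx, n_theta - 1)

    z_max = np.full((n_r, n_theta), -np.inf)
    z_min = np.full((n_r, n_theta), np.inf)
    # Unbuffered scatter so repeated indices are all accounted for.
    np.maximum.at(z_max, (r_idx, t_idx), z[keep])
    np.minimum.at(z_min, (r_idx, t_idx), z[keep])

    occupied = np.isfinite(z_max)
    return np.where(occupied, z_max - z_min, 0.0)


def motion_features(current, previous_frames, **grid):
    """Stack height differences between the current frame and each
    previous frame into a (T, n_r, n_theta) array, one channel per
    previous frame."""
    h_cur = polar_bev_height_span(current, **grid)
    return np.stack(
        [h_cur - polar_bev_height_span(prev, **grid) for prev in previous_frames]
    )
```

Under these assumptions, columns whose content is unchanged between frames produce a zero residual, while a column that an object has entered or left produces a non-zero one. A network such as the motion branch described above would consume such residual channels; how MotionBEV normalizes or windows them is not specified here.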