Identifying moving objects is a crucial capability for autonomous navigation, consistent map generation, and predicting the future trajectories of objects. In this paper, we propose a novel network that addresses the challenge of segmenting moving objects in 3D LiDAR scans. Our approach not only predicts point-wise motion labels but also detects instance information for the main traffic participants. This design helps determine which instances are actually moving and which are temporarily static in the current scene. Our method takes a sequence of point clouds as input and quantizes them into 4D voxels. We use 4D sparse convolutions to extract motion features from the 4D voxels and inject them into the current scan. Then, we extract spatio-temporal features from the current scan for instance detection and feature fusion. Finally, we design an upsample fusion module that outputs point-wise labels by fusing the spatio-temporal features with the predicted instance information. We evaluated our approach on the LiDAR-MOS benchmark based on SemanticKITTI and achieved better moving object segmentation performance than state-of-the-art methods, demonstrating the effectiveness of integrating instance information for moving object segmentation. Furthermore, our method shows superior performance on the Apollo dataset with a model pre-trained on SemanticKITTI, indicating that it generalizes well to different scenes. The code and pre-trained models of our method will be released at https://github.com/nubot-nudt/InsMOS.
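To make the 4D voxel quantization step concrete, the following is a minimal sketch and not the released implementation. It assumes that each scan is a list of (x, y, z) points already expressed in a common coordinate frame, that the scan index serves as the time coordinate, and that a single hypothetical `voxel_size` is used for all three spatial axes. The function name `quantize_to_4d_voxels` is likewise illustrative.

```python
import math


def quantize_to_4d_voxels(scans, voxel_size=0.1):
    """Group the points of a scan sequence by integer 4D voxel index.

    scans: list of scans ordered oldest to newest; each scan is a list
           of (x, y, z) tuples (assumed to share one coordinate frame).
    voxel_size: spatial edge length of a voxel (an assumed value).

    Returns a dict mapping (i, j, k, t) to the list of points inside
    that voxel. Only occupied voxels are stored, so the result is a
    sparse representation.
    """
    voxels = {}
    for t, scan in enumerate(scans):
        for x, y, z in scan:
            # Spatial indices come from flooring the scaled coordinates;
            # the scan index t is used directly as the fourth coordinate.
            key = (
                math.floor(x / voxel_size),
                math.floor(y / voxel_size),
                math.floor(z / voxel_size),
                t,
            )
            voxels.setdefault(key, []).append((x, y, z))
    return voxels


if __name__ == "__main__":
    scans = [
        [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 1.3, 1.4)],  # older scan
        [(0.1, 0.2, 0.3)],                                    # current scan
    ]
    for key, pts in sorted(quantize_to_4d_voxels(scans, voxel_size=0.5).items()):
        print(key, len(pts))
```

Because the time index is part of the voxel key, a point that stays at the same location across scans occupies distinct voxels that are adjacent along the time axis. This is the kind of structure a 4D sparse convolution can use to pick up motion cues.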