LiDAR-based moving object segmentation is a crucial and challenging task for autonomous driving and mobile robotics. Most approaches exploit spatio-temporal information from LiDAR sequences to predict moving objects in the current frame. However, they typically transfer temporal cues only within a single inference and treat each prediction as independent of the others, which can yield inconsistent segmentation of the same object across frames. To overcome this issue, we propose StreamMOS, a streaming network with a memory mechanism that associates features and predictions across multiple inferences. Specifically, a short-term memory conveys historical features, which serve as a spatial prior for moving objects and enhance the current inference through temporal fusion. Meanwhile, a long-term memory stores previous predictions, which are used to refine the current prediction at the voxel and instance levels through voting. In addition, we present a multi-view encoder with cascade projection and asymmetric convolution to extract the motion features of objects in different representations. Extensive experiments show that our algorithm achieves competitive performance on the SemanticKITTI and Sipailou Campus datasets. Code will be released at https://github.com/NEU-REAL/StreamMOS.git.
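The voting idea in the long-term memory can be illustrated with a short sketch. This is not the StreamMOS implementation. It is one plausible reading of "refine the current prediction at the voxel and instance levels through voting", and it rests on several assumptions, all hypothetical:
- labels are binary (moving or static);
- past predictions have already been aligned to the current frame;
- each past frame casts one vote per voxel at a fixed voxel size;
- instance ids come from some external clustering step.

The names `LongTermMemory`, `voxel_vote`, and `instance_vote` are illustrative only.

```python
import math
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]

STATIC, MOVING = 0, 1  # hypothetical binary labels


def voxel_key(p: Point, voxel_size: float) -> Voxel:
    """Quantize a 3D point to integer voxel coordinates."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


class LongTermMemory:
    """Hypothetical store of past per-voxel predictions.

    Assumes every stored frame is already expressed in the current
    frame's coordinate system (e.g. aligned with known ego poses).
    """

    def __init__(self, voxel_size: float = 0.2, max_frames: int = 8) -> None:
        self.voxel_size = voxel_size
        # The oldest frame is dropped automatically once max_frames is reached.
        self.frames: Deque[Dict[Voxel, Counter]] = deque(maxlen=max_frames)

    def push(self, points: Sequence[Point], labels: Sequence[int]) -> None:
        """Store one frame's predictions as label counts per voxel."""
        frame: Dict[Voxel, Counter] = defaultdict(Counter)
        for p, label in zip(points, labels):
            frame[voxel_key(p, self.voxel_size)][label] += 1
        self.frames.append(frame)

    def voxel_vote(self, points: Sequence[Point],
                   labels: Sequence[int]) -> List[int]:
        """Refine current labels by majority vote over the same voxel.

        Each past frame casts one vote (its majority label in that voxel),
        the current prediction casts one vote, and ties keep the current label.
        """
        refined = []
        for p, label in zip(points, labels):
            key = voxel_key(p, self.voxel_size)
            votes = Counter({label: 1})
            for frame in self.frames:
                if key in frame:  # membership test does not create entries
                    votes[frame[key].most_common(1)[0][0]] += 1
            best, best_count = votes.most_common(1)[0]
            refined.append(best if best_count > votes[label] else label)
        return refined


def instance_vote(labels: Sequence[int],
                  instance_ids: Sequence[int]) -> List[int]:
    """Give every point of an instance that instance's majority label.

    instance_ids are assumed to come from an external clustering step;
    -1 marks points with no instance, which keep their own label.
    Ties go to the label seen first within the instance.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for label, inst in zip(labels, instance_ids):
        if inst >= 0:
            counts[inst][label] += 1
    return [counts[inst].most_common(1)[0][0] if inst >= 0 else label
            for label, inst in zip(labels, instance_ids)]


if __name__ == "__main__":
    memory = LongTermMemory(voxel_size=1.0, max_frames=3)
    memory.push([(0.5, 0.5, 0.5), (5.5, 0.5, 0.5)], [MOVING, STATIC])
    memory.push([(0.4, 0.6, 0.5), (5.4, 0.6, 0.5)], [MOVING, STATIC])
    # The first point "flickers" to static in the current frame.
    current = [(0.5, 0.4, 0.5), (5.5, 0.4, 0.5), (5.6, 0.3, 0.5)]
    print(memory.voxel_vote(current, [STATIC, STATIC, MOVING]))  # [1, 0, 0]
    print(instance_vote([MOVING, MOVING, STATIC], [2, 2, 2]))     # [1, 1, 1]
```

The sketch keeps the current label on ties so that memory only overrides a prediction when past frames clearly disagree with it. A real system would likely weight votes by recency or confidence rather than counting them equally.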