LiDAR-based semantic perception tasks are critical yet challenging for autonomous driving. Because objects move and are subject to static and dynamic occlusion, temporal information plays an essential role in reinforcing perception by enhancing and completing single-frame knowledge. Previous approaches either stack historical frames directly onto the current frame or build a 4D spatio-temporal neighborhood using KNN, which duplicates computation and hinders real-time performance. Based on our observation that stacking all the historical points degrades performance because of the large amount of redundant and misleading information they carry, we propose the Sparse Voxel-Adjacent Query Network (SVQNet) for 4D LiDAR semantic segmentation. To exploit the historical frames efficiently, we shunt the historical points into two groups with reference to the current points: the Voxel-Adjacent Neighborhood, which carries local enhancing knowledge, and the Historical Context, which completes the global knowledge. We then propose new modules that select and extract the instructive features from the two groups. SVQNet achieves state-of-the-art performance in LiDAR semantic segmentation on the SemanticKITTI benchmark and the nuScenes dataset.
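The shunting of historical points into two groups can be illustrated with a minimal sketch. The abstract does not state the grouping rule, so the code below rests on assumptions: a historical point is assigned to the Voxel-Adjacent Neighborhood if its voxel lies within one voxel (in any direction) of a voxel occupied by a current point, and all remaining historical points are treated as candidates for the Historical Context. The function name `shunt_historical_points` and the `voxel_size` parameter are hypothetical and are not taken from SVQNet.

```python
import numpy as np
from itertools import product


def shunt_historical_points(current, historical, voxel_size=0.5):
    """Split historical points into two groups relative to the current points.

    Assumed rule (not from the paper): a historical point is "voxel-adjacent"
    if its voxel is one of the 27 voxels centred on any occupied current voxel.

    current, historical: float arrays of shape (N, 3) and (M, 3).
    Returns (neighborhood_points, context_points).
    """
    # Map every point to integer voxel coordinates.
    cur_vox = np.floor(current / voxel_size).astype(np.int64)
    hist_vox = np.floor(historical / voxel_size).astype(np.int64)

    # Set of voxels occupied by the current frame, for O(1) membership tests.
    occupied = {tuple(v) for v in np.unique(cur_vox, axis=0)}

    # The 27 offsets covering a voxel and its immediate neighbours.
    offsets = list(product((-1, 0, 1), repeat=3))

    # A historical point is adjacent if any neighbouring voxel is occupied.
    adjacent = np.array(
        [
            any((x + dx, y + dy, z + dz) in occupied for dx, dy, dz in offsets)
            for x, y, z in hist_vox
        ],
        dtype=bool,
    )
    return historical[adjacent], historical[~adjacent]


# Toy example: one current point, two historical points.
current = np.array([[0.1, 0.1, 0.1]])
historical = np.array([[0.6, 0.1, 0.1], [5.0, 5.0, 5.0]])
neighborhood, context = shunt_historical_points(current, historical)
```

In this toy example the first historical point falls in a voxel next to the current point's voxel and lands in the neighborhood group, while the distant second point lands in the context group. A lookup over occupied voxels, unlike a KNN search, needs no pairwise distance computation, which is in the spirit of the efficiency argument above; the Python set used here is only a stand-in for whatever sparse data structure a real implementation would use.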