3D object detection from LiDAR point clouds is critical for autonomous driving and robotics. While sequential point clouds can enhance 3D perception through temporal information, using these temporal features effectively and efficiently remains challenging. Based on the observation that foreground information is sparsely distributed in LiDAR scenes, we argue that a sparse format, rather than dense maps, can provide sufficient knowledge. To this end, we propose to learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information into sparse features for information fusion across frames. Specifically, we first introduce a significant sampling mechanism that extracts information-rich yet sparse features based on predicted object centroids. On top of that, we present an explicit geometric transformation learning technique, which learns object-centric transformations among sparse features across frames. We evaluate our method on the large-scale nuScenes and Waymo datasets, where SUIT not only significantly reduces the memory and computation cost of temporal fusion but also outperforms state-of-the-art baselines.
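To make the idea of sampling sparse features around predicted object centroids more concrete, the following is a minimal sketch and not the SUIT implementation. It assumes a dense bird's-eye-view feature map of shape (C, H, W) and a per-cell centroid score map of shape (H, W), and it keeps only the k highest-scoring cells. The function name `sample_significant_features`, the top-k selection rule, and the array layout are all illustrative assumptions.

```python
import numpy as np


def sample_significant_features(features, center_heatmap, k):
    """Keep the features at the k cells with the highest centroid score.

    Hypothetical helper for illustration only (not the paper's code).
    features:       array of shape (C, H, W), dense feature map
    center_heatmap: array of shape (H, W), predicted centroid scores
    Returns (sparse_features (k, C), coords (k, 2) as (row, col), scores (k,)).
    """
    c, h, w = features.shape
    if center_heatmap.shape != (h, w):
        raise ValueError("heatmap must match the spatial size of features")
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, h * w)

    flat_scores = center_heatmap.reshape(-1)
    # Select the k best cells, then order them by descending score.
    top = np.argpartition(-flat_scores, k - 1)[:k]
    top = top[np.argsort(-flat_scores[top])]

    rows, cols = np.unravel_index(top, (h, w))
    sparse_features = features[:, rows, cols].T  # shape (k, C)
    coords = np.stack([rows, cols], axis=1)      # shape (k, 2)
    return sparse_features, coords, flat_scores[top]
```

Under these assumptions, each past frame contributes only k feature vectors and their coordinates to temporal fusion, rather than a full C × H × W map, which is the kind of memory saving the sparse format is meant to provide.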