Recent temporal LiDAR-based 3D object detectors achieve promising performance with a two-stage, proposal-based approach: a first-stage dense detector generates 3D box candidates, which are then refined by different temporal aggregation methods. However, these approaches require per-frame objects or whole point clouds, which poses challenges for memory bank utilization. Moreover, point cloud and trajectory features are combined solely by concatenation, which may neglect effective interactions between them. In this paper, we propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection. To this end, we use only the point clouds of current-frame objects and their historical trajectories as input, minimizing the memory bank storage requirement. Furthermore, we introduce modules that encode trajectory features from long-term, short-term, and future-aware perspectives, and then effectively aggregate them with point cloud features. We conduct extensive experiments on the large-scale Waymo dataset to demonstrate that our approach performs favorably against state-of-the-art methods. Code and models will be made publicly available at https://github.com/kuanchihhuang/PTT.
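To make the storage argument concrete, the following is a minimal sketch of a per-object memory that keeps only past boxes and the points of the current frame. It is not taken from the released code: the class `TrackMemory`, the helper `per_frame_points_floats`, the 7-value box encoding, and the frame and point counts are all illustrative assumptions.

```python
from collections import deque

# Assumed box encoding (not specified in the abstract):
# x, y, z, length, width, height, heading.
BOX_DIM = 7


class TrackMemory:
    """Hypothetical per-object memory: a bounded history of boxes,
    plus the points of the current frame only."""

    def __init__(self, max_history):
        # deque(maxlen=...) silently drops the oldest box once full.
        self.boxes = deque(maxlen=max_history)
        self.current_points = []

    def update(self, box, points):
        if len(box) != BOX_DIM:
            raise ValueError(f"expected a box with {BOX_DIM} values")
        self.boxes.append(tuple(box))
        # Points are replaced, not accumulated, so only one frame is stored.
        self.current_points = list(points)

    def stored_floats(self):
        """Number of scalar values currently held for this object."""
        return BOX_DIM * len(self.boxes) + 3 * len(self.current_points)


def per_frame_points_floats(num_frames, points_per_frame):
    """Scalars needed if xyz points were kept for every frame instead."""
    return 3 * num_frames * points_per_frame


memory = TrackMemory(max_history=4)
for t in range(10):
    box = [float(t), 0.0, 0.0, 4.0, 2.0, 1.5, 0.0]
    points = [(0.0, 0.0, 0.0)] * 128  # placeholder points
    memory.update(box, points)

print(memory.stored_floats(), per_frame_points_floats(4, 128))
```

Under these assumed sizes, the trajectory-based memory grows by only a handful of values per additional past frame, whereas storing per-frame points grows with the number of points in each frame.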