3D multi-object tracking (MOT) is vital for many applications, including autonomous vehicles and service robots. With the commonly used tracking-by-detection paradigm, 3D MOT has made important progress in recent years. However, these methods use only the detection boxes of the current frame to obtain trajectory-box association results, so the tracker cannot recover objects missed by the detector. In this paper, we present TrajectoryFormer, a novel point-cloud-based 3D MOT framework. To recover objects missed by the detector, we generate multiple trajectory hypotheses with hybrid candidate boxes, including temporally predicted boxes and current-frame detection boxes, for trajectory-box association. The predicted boxes propagate an object's historical trajectory information to the current frame, so the network can tolerate short-term missed detections of tracked objects. We combine long-term object motion features and short-term object appearance features to create per-hypothesis feature embeddings, which reduces the computational overhead of spatial-temporal encoding. Additionally, we introduce a Global-Local Interaction Module that conducts information interaction among all hypotheses and models their spatial relations, leading to accurate estimation of hypotheses. Our TrajectoryFormer achieves state-of-the-art performance on the Waymo 3D MOT benchmarks. Code is available at https://github.com/poodarchu/EFG.
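The hybrid-candidate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: boxes are reduced to 2D centers, constant-velocity extrapolation stands in for the temporally predicted box, and the names `Hypothesis`, `predict_next`, `generate_hypotheses`, and the distance threshold `gate` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

# Assumption: a "box" is simplified to its (x, y) center for illustration.
Box = Tuple[float, float]


@dataclass
class Hypothesis:
    history: List[Box]  # past boxes of the tracked trajectory
    candidate: Box      # candidate box for the current frame
    source: str         # "predicted" or "detected"


def predict_next(history: List[Box]) -> Box:
    """Constant-velocity extrapolation; a stand-in for a predicted box."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def generate_hypotheses(
    history: List[Box], detections: List[Box], gate: float = 2.0
) -> List[Hypothesis]:
    """Pair one trajectory with hybrid candidates for the current frame."""
    predicted = predict_next(history)
    # The predicted box is always a candidate, so the trajectory can
    # continue even when the detector returns nothing nearby.
    hypotheses = [Hypothesis(history, predicted, "predicted")]
    for det in detections:
        # Assumed gating rule: keep detections within `gate` of the prediction.
        if hypot(det[0] - predicted[0], det[1] - predicted[1]) <= gate:
            hypotheses.append(Hypothesis(history, det, "detected"))
    return hypotheses


if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 0.0)]
    for h in generate_hypotheses(track, [(2.1, 0.1), (10.0, 10.0)]):
        print(h.source, h.candidate)
    # With no detections, one predicted hypothesis still survives.
    print(len(generate_hypotheses(track, [])))
```

In this sketch, a frame with no detections still yields one hypothesis from the predicted box, which is the property that lets a tracker tolerate a short-term missed detection. Selecting the best hypothesis per trajectory, which the abstract assigns to the learned feature embedding and the Global-Local Interaction Module, is outside the scope of the example.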