3D multi-object tracking and trajectory prediction are two crucial modules in autonomous driving systems. Traditional pipelines handle the two tasks separately, and only a few recent methods have begun to model them jointly. However, these joint approaches are limited by single-frame training and by inconsistent coordinate representations between the tracking and prediction tasks. In this paper, we propose StreamMOTP, a streaming and unified framework for joint 3D Multi-Object Tracking and trajectory Prediction, to address these challenges. First, we construct the model in a streaming manner and use a memory bank to preserve and exploit the long-term latent features of tracked objects more effectively. Second, we introduce a relative spatio-temporal positional encoding strategy to bridge the gap in coordinate representations between the two tasks and to maintain pose invariance for trajectory prediction. Third, we further improve the quality and consistency of the predicted trajectories with a dual-stream predictor. Extensive experiments on the popular nuScenes dataset demonstrate the effectiveness of StreamMOTP, which significantly outperforms previous methods on both tasks. Furthermore, we show that the proposed framework has strong potential and clear advantages for practical autonomous driving applications.
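The abstract does not specify how the relative spatio-temporal positional encoding is computed. As a minimal sketch of the general idea behind pose-invariant relative encodings, the following describes one object in the local frame of another and checks that the result does not change when the whole scene is rotated and translated. The names `Pose`, `relative_encoding`, and `transform`, the 2D pose layout, and the four-value output are assumptions made for illustration and are not taken from StreamMOTP.

```python
import math
from typing import NamedTuple, Tuple


class Pose(NamedTuple):
    """Hypothetical 2D pose: position in metres, heading in radians."""
    x: float
    y: float
    yaw: float


def wrap_angle(a: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def relative_encoding(ref: Pose, other: Pose, dt: float) -> Tuple[float, float, float, float]:
    """Describe `other` in the local frame of `ref`, plus the time gap `dt`."""
    dx, dy = other.x - ref.x, other.y - ref.y
    c, s = math.cos(ref.yaw), math.sin(ref.yaw)
    # Rotate the global offset by -ref.yaw so that it is expressed in ref's frame.
    local_x = c * dx + s * dy
    local_y = -s * dx + c * dy
    return (local_x, local_y, wrap_angle(other.yaw - ref.yaw), dt)


def transform(p: Pose, tx: float, ty: float, theta: float) -> Pose:
    """Apply a global rigid motion: rotate by `theta`, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return Pose(c * p.x - s * p.y + tx, s * p.x + c * p.y + ty, wrap_angle(p.yaw + theta))


if __name__ == "__main__":
    a, b = Pose(1.0, 2.0, 0.3), Pose(4.0, -1.0, 1.2)
    before = relative_encoding(a, b, dt=0.5)
    # Move the whole scene, e.g. as if the ego vehicle's frame had changed.
    after = relative_encoding(transform(a, 10.0, -3.0, 0.7),
                              transform(b, 10.0, -3.0, 0.7), dt=0.5)
    print(all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(before, after)))
```

Because the encoding depends only on offsets between the two poses, it is unaffected by the choice of global or ego-centric frame, which is the property the abstract refers to as pose invariance.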