This work proposes an end-to-end framework for multi-camera 3D multi-object tracking (MOT). It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects; hence we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adapts the "tracking by attention" framework and represents each tracked instance coherently over time with an object query. To use historical cues explicitly, our "Past Reasoning" module learns to refine tracks and enhance object features by cross-attending to queries from previous frames and from other objects. The "Future Reasoning" module digests this historical information and predicts robust future trajectories. During long-term occlusions, our method maintains object positions by integrating these motion predictions, which enables re-association once the object reappears. On the nuScenes dataset, our method improves AMOTA (Average Multi-Object Tracking Accuracy) by a large margin and reduces ID switches by 90% compared with prior approaches, an order of magnitude fewer. The code and models are available at https://github.com/TRI-ML/PF-Track.
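The "Past Reasoning" step can be illustrated with plain scaled dot-product attention. The sketch below is a simplification, not PF-Track's implementation: it assumes a single attention head, identity query/key/value projections, no learned weights, and Python lists as feature vectors; the names `attend` and `past_reasoning` are hypothetical. Each track query first attends to its own queries from previous frames, then to the queries of all tracks in the current frame, with a residual connection after each step.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention with identity projections (simplification)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]


def past_reasoning(current: List[Vector], history: List[List[Vector]]) -> List[Vector]:
    """Hypothetical sketch: current[j] is track j's query now, history[j] its past queries."""
    # 1) Temporal cross-attention: each track attends to its own past queries
    #    (plus itself, so the memory is never empty for a newborn track).
    temporal = []
    for q, past in zip(current, history):
        memory = past + [q]
        update = attend(q, memory, memory)
        temporal.append([a + b for a, b in zip(q, update)])  # residual connection
    # 2) Cross-object attention: each track attends to every track in the frame,
    #    itself included, as in standard self-attention.
    refined = []
    for q in temporal:
        update = attend(q, temporal, temporal)
        refined.append([a + b for a, b in zip(q, update)])
    return refined


# Usage: two 2-D track queries; only the first has a query from a previous frame.
current = [[1.0, 0.0], [0.0, 1.0]]
history = [[[0.9, 0.1]], []]
out = past_reasoning(current, history)
print(len(out), len(out[0]))  # 2 2
```

In the real model the projections are learned and the attention is multi-headed; the two-stage structure (temporal memory first, then interaction among objects) is the part this toy version is meant to convey.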
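Occlusion handling can likewise be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's method: the `Track` class, `propagate_occluded`, `reassociate`, `max_age`, and `match_radius` are hypothetical; the predicted trajectory is assumed to be given as per-frame 2D (bird's-eye-view) displacements; and re-association uses a greedy distance threshold in place of PF-Track's learned, attention-based matching. An unmatched track is advanced along its own predicted trajectory, so a detection that reappears near the propagated position can be linked back to the original identity.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Track:
    """Hypothetical track state: BEV position plus predicted per-frame displacements."""
    track_id: int
    position: Point
    future_offsets: List[Point] = field(default_factory=list)
    frames_missing: int = 0


def propagate_occluded(track: Track) -> None:
    """Advance an unmatched track along its own predicted trajectory."""
    if track.future_offsets:
        dx, dy = track.future_offsets.pop(0)
        track.position = (track.position[0] + dx, track.position[1] + dy)
    track.frames_missing += 1


def reassociate(tracks: List[Track], detection: Point,
                match_radius: float, max_age: int) -> Optional[int]:
    """Greedy stand-in for learned matching: link detection to nearest occluded track."""
    best, best_dist = None, match_radius
    for t in tracks:
        if 0 < t.frames_missing <= max_age:  # only tracks still alive but unseen
            d = math.dist(t.position, detection)
            if d <= best_dist:
                best, best_dist = t, d
    if best is None:
        return None
    best.position, best.frames_missing = detection, 0  # the track is visible again
    return best.track_id


# Usage: a car moving +1 m per frame in x is occluded for two frames.
car = Track(7, (0.0, 0.0), future_offsets=[(1.0, 0.0), (1.0, 0.0)])
propagate_occluded(car)
propagate_occluded(car)
print(car.position)  # (2.0, 0.0)
result = reassociate([car], (2.2, 0.1), match_radius=1.0, max_age=5)
print(result)  # 7
```

With the propagated position at (2.0, 0.0), the reappearing detection at (2.2, 0.1) lies within the 1.0 radius. A track frozen at its last observed position (0.0, 0.0) would be about 2.2 away and fail to match, so the object would receive a new identity, which is exactly the kind of ID switch that motion-based propagation is meant to avoid.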