This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects. Thus, we name it "Past-and-Future reasoning for Tracking" (PF-Track). Specifically, our method adapts the "tracking by attention" framework and represents tracked instances coherently over time with object queries. To explicitly use historical cues, our "Past Reasoning" module learns to refine the tracks and enhance the object features by cross-attending to queries from previous frames and other objects. The "Future Reasoning" module digests historical information and predicts robust future trajectories. In the case of long-term occlusions, our method maintains the object positions and enables re-association by integrating motion predictions. On the nuScenes dataset, our method improves AMOTA by a large margin and reduces ID-Switches by 90% compared to prior approaches, an order-of-magnitude reduction. The code and models are available at https://github.com/TRI-ML/PF-Track.
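To make the occlusion handling concrete, the following is a minimal sketch of how a track can keep its identity by coasting along its own forecast when no detection is available. It is an illustration under stated assumptions, not the PF-Track implementation: the names `Track`, `predict_constant_velocity`, `step`, and the parameters `horizon` and `max_missed` are hypothetical, positions are simplified to 2D centers, and a constant-velocity extrapolation stands in for the learned "Future Reasoning" module.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) object center in the ground plane


@dataclass
class Track:
    """Hypothetical track state; not a data structure from the PF-Track code."""
    track_id: int
    history: List[Point]                                   # past centers, oldest first
    predicted: List[Point] = field(default_factory=list)   # forecast for upcoming frames
    missed_frames: int = 0                                 # consecutive frames without a detection


def predict_constant_velocity(history: List[Point], horizon: int) -> List[Point]:
    """Assumed stand-in for a learned predictor: extrapolate the last displacement."""
    x, y = history[-1]
    if len(history) >= 2:
        px, py = history[-2]
        vx, vy = x - px, y - py
    else:
        vx, vy = 0.0, 0.0  # a single observation gives no velocity estimate
    return [(x + vx * k, y + vy * k) for k in range(1, horizon + 1)]


def step(track: Track, detection: Optional[Point],
         horizon: int = 4, max_missed: int = 3) -> bool:
    """Advance a track by one frame; return False once it should be dropped."""
    if detection is not None:
        # Visible: take the observed position and reset the occlusion counter.
        track.history.append(detection)
        track.missed_frames = 0
    else:
        # Occluded: give up after max_missed consecutive misses.
        if track.missed_frames >= max_missed:
            return False
        # Otherwise keep the track alive at the next forecast position.
        forecast = track.predicted or predict_constant_velocity(track.history, horizon)
        track.history.append(forecast[0])
        track.missed_frames += 1
    # Refresh the forecast from the updated history.
    track.predicted = predict_constant_velocity(track.history, horizon)
    return True
```

Because the track keeps its `track_id` and a plausible position while occluded, a detection that reappears near the forecast can be attached to the same track rather than starting a new one, which is the kind of behavior that lowers ID-Switches.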