Many Multi-Object Tracking (MOT) approaches exploit motion information to associate detected objects across frames. However, methods that rely on filtering-based algorithms such as the Kalman Filter often work well in linear motion scenarios but struggle to accurately predict the locations of objects undergoing complex, non-linear movements. To tackle these scenarios, we propose ETTrack, a motion-based MOT approach with an enhanced temporal motion predictor. Specifically, the motion predictor integrates a transformer model and a Temporal Convolutional Network (TCN) to capture short-term and long-term motion patterns, and it predicts the future motion of individual objects from their historical motion information. Additionally, we propose a novel Momentum Correction Loss function that provides additional information about the motion direction of objects during training. This allows the motion predictor to adapt rapidly to motion variations and predict future motion more accurately. Our experimental results demonstrate that ETTrack achieves competitive performance compared with state-of-the-art trackers on DanceTrack and SportsMOT, with HOTA scores of 56.4% and 74.4%, respectively.
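The abstract does not define the Momentum Correction Loss, so the following is only a minimal, hypothetical sketch of why a direction-aware term can carry information that a plain positional error does not. It assumes motion is represented as a 2D displacement (dx, dy) and adds a cosine-based direction penalty to an L1 error. The function names `l1_loss`, `direction_penalty`, and `direction_aware_loss` and the default weight of 0.5 are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and target displacement vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def direction_penalty(pred: Sequence[float], target: Sequence[float],
                      eps: float = 1e-8) -> float:
    """1 - cosine similarity of the two displacements.

    Roughly 0 when the directions agree and roughly 2 when they are opposite.
    eps guards against division by zero for zero-length vectors.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t + eps)


def direction_aware_loss(pred: Sequence[float], target: Sequence[float],
                         weight: float = 0.5) -> float:
    """Hypothetical combined loss: positional L1 plus a weighted direction term.

    This is NOT the paper's Momentum Correction Loss; it only illustrates
    how a direction term can supplement a positional error.
    """
    return l1_loss(pred, target) + weight * direction_penalty(pred, target)


if __name__ == "__main__":
    target = (0.2, 0.0)        # true displacement: small step to the right
    overshoot = (0.6, 0.0)     # right direction, too far
    reversed_ = (-0.2, 0.0)    # wrong direction, same L1 error as overshoot
    print(l1_loss(overshoot, target), l1_loss(reversed_, target))
    print(direction_aware_loss(overshoot, target),
          direction_aware_loss(reversed_, target))
```

In this toy case both predictions have the same L1 error (about 0.2), so the positional term alone cannot tell them apart. The direction term is near 0 for the overshoot but near 2 for the reversed prediction, so the combined loss penalizes the prediction that moves the wrong way more heavily.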