Multi-object tracking (MOT) has advanced significantly with the evolution of detection and re-identification (ReID) techniques. Nevertheless, accurately tracking objects that share a homogeneous appearance while moving heterogeneously remains a challenge, for two main reasons: ReID features are insufficiently discriminative, and MOT methods predominantly rely on linear motion models. To address this, we introduce MotionTrack, a novel motion-based tracker centered on a learnable motion predictor that relies solely on object trajectory information. The predictor integrates motion features at two levels of granularity to better model temporal dynamics and to predict the future motion of individual objects precisely. Specifically, it adopts a self-attention mechanism to capture token-level information and a Dynamic MLP layer to model channel-level features. MotionTrack is a simple, online tracking approach. Experimental results demonstrate that MotionTrack achieves state-of-the-art performance on datasets characterized by highly complex object motion, such as DanceTrack and SportsMOT.
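To make the two granularities concrete, the following is a minimal NumPy sketch of a trajectory-only motion predictor. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the layer design, so the "dynamic" MLP here is assumed to be a channel gate generated from the trajectory itself, and all names (`self_attention`, `dynamic_mlp`, `predict_next_offset`, `init_params`), the box format `(cx, cy, w, h)`, and the hidden width are hypothetical. The weights are random and untrained, so the predicted box is not meaningful; only the data flow is.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Token-level mixing: every time step attends to every other time step.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def dynamic_mlp(tokens, w_gen, w_out):
    # Channel-level mixing. ASSUMPTION: "dynamic" is read here as a
    # per-channel gate generated from a summary of the input trajectory.
    context = tokens.mean(axis=0)                      # (d,)
    gate = 1.0 / (1.0 + np.exp(-(context @ w_gen)))    # (d,), values in (0, 1)
    return np.maximum(tokens * gate, 0.0) @ w_out      # ReLU, then projection

def predict_next_offset(boxes, params):
    # boxes: (T, 4) past boxes of one object, assumed as (cx, cy, w, h).
    offsets = np.diff(boxes, axis=0)                   # (T-1, 4) frame-to-frame motion
    tokens = offsets @ params["embed"]                 # (T-1, d) one token per step
    tokens = tokens + self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"])
    tokens = tokens + dynamic_mlp(tokens, params["w_gen"], params["w_out"])
    return tokens[-1] @ params["head"]                 # (4,) predicted next offset

def init_params(d=16, seed=0):
    # Random, untrained weights for illustration only.
    rng = np.random.default_rng(seed)
    shapes = {
        "embed": (4, d), "w_q": (d, d), "w_k": (d, d), "w_v": (d, d),
        "w_gen": (d, d), "w_out": (d, d), "head": (d, 4),
    }
    return {name: rng.normal(scale=0.1, size=shape)
            for name, shape in shapes.items()}

# A short, made-up trajectory with non-linear motion.
boxes = np.array([
    [10.0, 20.0, 5.0, 8.0],
    [12.0, 21.0, 5.0, 8.0],
    [15.0, 21.5, 5.2, 8.1],
    [17.0, 20.0, 5.2, 8.1],
    [18.0, 18.0, 5.1, 8.0],
])
params = init_params()
next_box = boxes[-1] + predict_next_offset(boxes, params)
```

In an online tracker, a predicted box of this kind would typically be matched against the detections in the next frame, taking the place of the prediction step of a linear motion model.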