Multi-object tracking (MOT) has advanced significantly thanks to progress in detection and re-identification (ReID). Nevertheless, accurately tracking objects that share a homogeneous appearance but move heterogeneously remains challenging, because ReID features are insufficiently discriminative in such scenes and most MOT methods rely on linear motion models. To address this, we present MotionTrack, a novel learnable motion predictor that combines motion features at two levels of granularity to better model temporal dynamics and accurately predict the future motion of individual objects. Specifically, it uses a self-attention mechanism to capture token-level information and a Dynamic MLP layer to model channel-level features. MotionTrack is a simple, online tracking approach. Experiments show that MotionTrack achieves state-of-the-art performance on challenging datasets with highly nonlinear object motion, such as SportsMOT and DanceTrack. Notably, without fine-tuning on the target datasets, MotionTrack also performs competitively on conventional benchmarks, including MOT17 and MOT20.
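To make the two levels of granularity concrete, here is a minimal, untrained sketch in plain Python. It assumes the predictor receives a short history of per-frame box offsets `[dx, dy, dw, dh]`, treats each past frame as a token, mixes information across frames with single-head self-attention (token level), then mixes information across feature channels within each frame (channel level), and finally regresses the next-frame offset. The class `ToyMotionPredictor`, its dimensions, the mean pooling, and the random weights are all hypothetical. The channel step is a plain residual MLP standing in for the Dynamic MLP, whose exact design is not given here.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over a single list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def rand_matrix(rows, cols, rng):
    """Random weights; a real predictor would learn these from tracking data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyMotionPredictor:
    """Hypothetical sketch: token-level attention over past frames, then
    channel-level mixing, then a linear head predicting (dx, dy, dw, dh)."""

    def __init__(self, in_dim=4, hid=8, seed=0):
        rng = random.Random(seed)
        self.hid = hid
        self.embed = rand_matrix(in_dim, hid, rng)
        self.wq = rand_matrix(hid, hid, rng)
        self.wk = rand_matrix(hid, hid, rng)
        self.wv = rand_matrix(hid, hid, rng)
        # Plain channel MLP, a stand-in for the paper's Dynamic MLP (assumption).
        self.w1 = rand_matrix(hid, hid, rng)
        self.w2 = rand_matrix(hid, hid, rng)
        self.head = rand_matrix(hid, in_dim, rng)

    def token_attention(self, x):
        """Each frame token attends to every other frame token (T x T weights)."""
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        k_t = [list(col) for col in zip(*k)]
        scale = 1.0 / math.sqrt(self.hid)
        attn = [softmax([s * scale for s in row]) for row in matmul(q, k_t)]
        out = matmul(attn, v)
        return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, out)]  # residual

    def channel_mlp(self, x):
        """Mix feature channels within each frame token independently."""
        h = [[max(0.0, val) for val in row] for row in matmul(x, self.w1)]  # ReLU
        out = matmul(h, self.w2)
        return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, out)]  # residual

    def predict(self, offsets):
        """offsets: T past box offsets, each [dx, dy, dw, dh]; returns next offset."""
        x = matmul(offsets, self.embed)       # T x hid frame tokens
        x = self.token_attention(x)           # token-level features
        x = self.channel_mlp(x)               # channel-level features
        pooled = [sum(col) / len(x) for col in zip(*x)]  # average over frames
        return matmul([pooled], self.head)[0]


history = [[1.0, 0.5, 0.0, 0.0], [1.2, 0.4, 0.0, 0.0], [1.5, 0.2, 0.1, 0.0]]
print(ToyMotionPredictor().predict(history))
```

Because the weights are random, the printed offset is meaningless until the model is trained. The point is the data flow: unlike a constant-velocity (linear) model, the prediction depends on how the attention weights combine the whole motion history, which is what lets a learned predictor follow nonlinear trajectories.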