The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames while maintaining a unique identity for each object. Most existing methods rely on spatial motion features and appearance embedding features of the detected objects in consecutive frames. Representing the spatial and appearance features of long trajectories effectively and robustly has become a critical factor in MOT performance. We propose a novel approach to appearance and spatial feature representation that improves upon the clustering association method MOT\_FCG. For spatial motion features, we propose Diagonal Modulated GIoU, which more accurately represents the relationship between the position and shape of objects. For appearance features, we use a dynamic appearance representation that incorporates confidence information, making trajectory appearance features more robust and global. With MOT\_FCG as the baseline, our method achieves 76.1 HOTA, 80.4 MOTA, and 81.3 IDF1 on the MOT17 validation set, and competitive performance on the MOT20 and DanceTrack validation sets.
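To make the two ingredients concrete, the following is a minimal sketch under stated assumptions, not the method of this work. The abstract does not give the form of the diagonal modulation, so `giou` below computes only standard GIoU, the quantity that Diagonal Modulated GIoU builds on. Likewise, `update_appearance` shows one plausible confidence-weighted moving average for a trajectory embedding; its name and the parameters `alpha_fixed` and `conf_thresh` are hypothetical.

```python
import numpy as np


def giou(box_a, box_b):
    """Standard GIoU of two boxes given as (x1, y1, x2, y2).

    This is plain GIoU only; the diagonal modulation term is not
    specified in the abstract and is deliberately left out.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0.0 or enclose <= 0.0:
        return 0.0  # degenerate boxes: no meaningful overlap score

    return inter / union - (enclose - union) / enclose


def update_appearance(track_emb, det_emb, det_conf,
                      alpha_fixed=0.95, conf_thresh=0.1):
    """Confidence-weighted moving average of an appearance embedding.

    ASSUMPTION: an illustrative update rule, not the one used in this work.
    A detection at or below conf_thresh leaves the trajectory unchanged;
    a detection with confidence 1.0 contributes weight (1 - alpha_fixed).
    """
    trust = np.clip((det_conf - conf_thresh) / (1.0 - conf_thresh), 0.0, 1.0)
    alpha = alpha_fixed + (1.0 - alpha_fixed) * (1.0 - trust)
    fused = alpha * np.asarray(track_emb) + (1.0 - alpha) * np.asarray(det_emb)
    return fused / np.linalg.norm(fused)  # keep the embedding unit-length
```

In this sketch, a low-confidence detection (for example, a partially occluded object) changes the trajectory embedding very little, which is one way confidence information can make the trajectory appearance more robust.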