The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames while maintaining a unique identity for each object. Most existing methods rely on the spatial-temporal motion features and appearance embedding features of the detected objects in consecutive frames. Representing the spatial and appearance features of long trajectories effectively and robustly has become a critical factor in MOT performance. We propose a novel approach to appearance and spatial-temporal motion feature representation that improves upon the hierarchical clustering association method MOT FCG. For spatial-temporal motion features, we first propose Diagonal Modulated GIoU, which more accurately represents the relationship between the position and shape of the objects. Second, we propose Mean Constant Velocity Modeling to reduce the effect of observation noise on target motion state estimation. For appearance features, we use a dynamic appearance representation that incorporates confidence information, making the trajectory appearance features more robust and global. Building on the baseline model MOT FCG, we obtain further improvements in overall performance: we achieve 63.1 HOTA, 76.9 MOTA and 78.2 IDF1 on the MOT17 test set, and competitive performance on the MOT20 and DanceTrack sets.
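The abstract names its components without giving formulas, so the following is only a minimal sketch of the two standard building blocks they appear to extend, not the proposed method itself. `giou` is the standard Generalized IoU, without the diagonal modulation term. `update_appearance` is one common form of a confidence-weighted exponential moving average for a track embedding. The function names, the `(x1, y1, x2, y2)` box format, and the parameters `alpha_min` and `conf_thresh` are assumptions introduced for illustration.

```python
import numpy as np


def giou(a, b):
    """Standard GIoU of two boxes in (x1, y1, x2, y2) format.

    Assumes both boxes have positive area. This is the plain GIoU,
    not the Diagonal Modulated variant named in the abstract.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Area of the smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull


def update_appearance(track_emb, det_emb, conf, alpha_min=0.9, conf_thresh=0.5):
    """Confidence-weighted EMA update of a track's appearance embedding.

    Hypothetical form: a low-confidence detection pushes the smoothing
    factor alpha toward 1, so it barely changes the stored embedding.
    """
    trust = min(max((conf - conf_thresh) / (1.0 - conf_thresh), 0.0), 1.0)
    alpha = alpha_min + (1.0 - alpha_min) * (1.0 - trust)
    new_emb = alpha * np.asarray(track_emb, dtype=float) \
        + (1.0 - alpha) * np.asarray(det_emb, dtype=float)
    # Re-normalise so cosine similarity stays well defined.
    return new_emb / np.linalg.norm(new_emb)
```

In this sketch, two identical boxes give a GIoU of 1, disjoint boxes give a negative value that grows in magnitude with their separation, and a detection at or below `conf_thresh` leaves the track embedding unchanged.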