Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames. Most methods accomplish this by explicitly or implicitly leveraging strong cues (i.e., spatial and appearance information), which provide powerful instance-level discrimination. However, when objects are occluded or clustered, both spatial and appearance information become ambiguous simultaneously because of the high overlap between objects. In this paper, we demonstrate that this long-standing challenge in MOT can be resolved efficiently and effectively by incorporating weak cues to compensate for strong cues. In addition to velocity direction, we introduce the confidence state and height state as potential weak cues. Despite its superior performance, our method retains the Simple, Online and Real-Time (SORT) characteristics. Furthermore, our method generalizes well across diverse trackers and scenarios in a plug-and-play, training-free manner: applying it to 5 different representative trackers yields significant and consistent improvements. By leveraging both strong and weak cues, our method, Hybrid-SORT, achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack, where interaction and occlusion are frequent and severe. The code and models are available at https://github.com/ymzis69/HybirdSORT.
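The abstract does not give formulas for the weak cues, so the following is only a minimal sketch, under stated assumptions, of how a height cue and a confidence cue could be folded into an IoU-based association cost. The vertical-overlap term, the absolute confidence gap, the weight `w_conf`, and every function name here are hypothetical illustrations, not the Hybrid-SORT implementation. The velocity-direction cue is omitted. The brute-force `assign` stands in for the Hungarian algorithm and is only suitable for tiny, square examples.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Box:
    # Axis-aligned box (x1, y1) to (x2, y2) plus a detection/track confidence.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def iou(a: Box, b: Box) -> float:
    # Strong spatial cue: standard intersection-over-union.
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def vertical_overlap(a: Box, b: Box) -> float:
    # Hypothetical height cue: shared vertical extent / combined vertical extent.
    inter = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    union = max(a.y2, b.y2) - min(a.y1, b.y1)
    return inter / union if union > 0 else 0.0


def pair_cost(track: Box, det: Box, w_conf: float = 0.5) -> float:
    # Strong cue modulated by the height cue, plus a weighted confidence gap
    # (assumed weak-cue form). Lower cost means a better match.
    strong = iou(track, det) * vertical_overlap(track, det)
    conf_gap = abs(track.score - det.score)
    return (1.0 - strong) + w_conf * conf_gap


def cost_matrix(tracks, dets, w_conf: float = 0.5):
    return [[pair_cost(t, d, w_conf) for d in dets] for t in tracks]


def assign(cost):
    # Brute-force minimum-cost matching for small square matrices only.
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return [(i, best[i]) for i in range(n)]


if __name__ == "__main__":
    # Two heavily overlapping tracks: one unoccluded (high confidence),
    # one partly occluded (low confidence).
    tracks = [Box(0, 0, 10, 20, 0.9), Box(4, 0, 14, 20, 0.4)]
    # Both detections sit midway, so IoU alone cannot tell them apart.
    dets = [Box(2, 0, 12, 20, 0.88), Box(2, 0, 12, 20, 0.42)]
    print(assign(cost_matrix(tracks, dets)))
```

In this toy scene every track/detection pair has the same IoU, so a purely spatial cost is tied. Only the confidence gap separates the candidates and pairs each track with the detection whose score is closest to its own.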