Multi-object tracking (MOT) at low frame rates can reduce computational, storage, and power overhead, and so better meet the constraints of edge devices. Many existing MOT methods degrade sharply on low-frame-rate videos because target locations and appearances change substantially between adjacent frames. To address this, we propose collaborative tracking learning (ColTrack) for frame-rate-insensitive MOT in a query-based end-to-end manner. Multiple historical queries of the same target jointly track it, providing a richer temporal description. In addition, we insert an information refinement module between every two temporal blocking decoders to better fuse temporal clues and refine features. Moreover, a tracking object consistency loss is proposed to guide the interaction between historical queries. Extensive experimental results demonstrate that, in high-frame-rate videos, ColTrack obtains higher performance than state-of-the-art methods on the large-scale datasets DanceTrack and BDD100K, and outperforms existing end-to-end methods on MOT17. More importantly, ColTrack has a significant advantage over state-of-the-art methods in low-frame-rate videos, which allows it to achieve faster processing speeds by reducing frame-rate requirements while maintaining higher performance. Code will be released at https://github.com/yolomax/ColTrack
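To make the low-frame-rate difficulty concrete, the following minimal sketch simulates a lower frame rate by keeping every k-th frame of a toy trajectory and measures how far the target moves between consecutive retained frames. It is an illustration only, not part of ColTrack: the helper names `subsample` and `mean_center_shift`, the (cx, cy) center format, and the constant-velocity toy track are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical format for this sketch: a box is reduced to its centre (cx, cy).
Center = Tuple[float, float]


def subsample(frames: Sequence, stride: int) -> List:
    """Keep every `stride`-th frame to simulate a lower frame rate."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(frames[::stride])


def mean_center_shift(centers: Sequence[Center]) -> float:
    """Average Euclidean displacement between consecutive box centres."""
    if len(centers) < 2:
        return 0.0
    total = 0.0
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return total / (len(centers) - 1)


# Toy trajectory (assumed): the target moves 3 px right and 4 px down per frame.
track: List[Center] = [(3.0 * t, 4.0 * t) for t in range(31)]

full_rate_shift = mean_center_shift(track)
low_rate_shift = mean_center_shift(subsample(track, stride=5))
print(f"full rate: {full_rate_shift:.1f} px, 1/5 rate: {low_rate_shift:.1f} px")
```

For a target moving at constant velocity, the per-step displacement grows linearly with the stride, which is why association cues tuned for small inter-frame motion tend to break down as the frame rate drops.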