Multi-object tracking (MOT) at low frame rates can reduce computational, storage, and power overhead, which helps meet the constraints of edge devices. However, many existing MOT methods degrade sharply on low-frame-rate videos because target locations and appearances change substantially between adjacent frames. To address this, we propose collaborative tracking learning (ColTrack) for frame-rate-insensitive MOT in a query-based, end-to-end manner. Multiple historical queries of the same target jointly track it, providing a richer temporal description. We also insert an information refinement module between every two temporal blocking decoders to better fuse temporal clues and refine features. In addition, we propose a tracking object consistency loss to guide the interaction between historical queries. Extensive experiments show that on high-frame-rate videos, ColTrack outperforms state-of-the-art methods on the large-scale datasets DanceTrack and BDD100K, and outperforms existing end-to-end methods on MOT17. More importantly, ColTrack has a significant advantage over state-of-the-art methods on low-frame-rate videos, which allows it to process video faster by lowering the frame-rate requirement while maintaining higher performance. Code will be released at https://github.com/yolomax/ColTrack
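To make the low-frame-rate setting concrete, the following is a minimal sketch of how a lower frame rate could be emulated by processing only every k-th frame of a video. It is not taken from the ColTrack code, and the abstract does not state that the authors sample frames this way; the function names `subsample_frames` and `effective_fps` and the `stride` parameter are hypothetical and chosen for illustration only.

```python
def subsample_frames(num_frames: int, stride: int) -> list[int]:
    """Return the indices of the frames kept when only every
    `stride`-th frame is processed (assumed sampling scheme)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, num_frames, stride))


def effective_fps(native_fps: float, stride: int) -> float:
    """Frame rate seen by the tracker after subsampling."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return native_fps / stride


if __name__ == "__main__":
    # One second of 30 fps video, keeping every 5th frame.
    kept = subsample_frames(num_frames=30, stride=5)
    print(kept)
    print(effective_fps(native_fps=30.0, stride=5))
```

Under this assumption, a stride of 5 means the tracker handles one fifth of the frames, but each target moves roughly five times farther between processed frames, which is the source of the location and appearance changes described above.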