Many multi-object tracking (MOT) methods follow the "tracking-by-detection" framework, which associates target objects of interest based on detection results. However, because detection and association are handled by separate models, the tracking results are suboptimal. Moreover, the cumbersome association methods used to reach high tracking performance limit speed. In this work, we propose an end-to-end MOT method with two components. The first is a Gaussian filter-inspired dynamic search region refinement module, which filters and refines the search region using both template information from past frames and detection results from the current frame, at little computational cost. The second is a lightweight attention-based tracking head that performs effective fine-grained instance association. Extensive experiments and ablation studies on the MOT17 and MOT20 datasets demonstrate that our method achieves state-of-the-art performance at reasonable speed.
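To make the idea of a Gaussian-weighted search region concrete, the following is a minimal illustrative sketch and not the proposed module itself. It assumes the search region can be represented as a 2D detection heatmap and that the template information reduces to the target's previous position; the function names `gaussian_prior` and `refine_search_region` and the parameter `sigma` are hypothetical choices made for this illustration.

```python
import numpy as np


def gaussian_prior(height, width, center, sigma):
    """Return a 2D Gaussian weight map peaked at center = (row, col)."""
    rows = np.arange(height, dtype=np.float64)[:, None]
    cols = np.arange(width, dtype=np.float64)[None, :]
    dist_sq = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-dist_sq / (2.0 * sigma ** 2))


def refine_search_region(detection_heatmap, prev_center, sigma=2.0):
    """Down-weight detections that lie far from the previous target position.

    Assumption: an element-wise product of the current-frame detection
    heatmap and a Gaussian prior stands in for the refinement step.
    """
    height, width = detection_heatmap.shape
    prior = gaussian_prior(height, width, prev_center, sigma)
    return detection_heatmap * prior


# Toy example: two candidate detections in a 9x9 heatmap.
heatmap = np.zeros((9, 9))
heatmap[4, 5] = 0.8  # close to the previous position (4, 4)
heatmap[0, 8] = 0.9  # higher raw score, but far from the previous position

refined = refine_search_region(heatmap, prev_center=(4, 4), sigma=2.0)
best = np.unravel_index(np.argmax(refined), refined.shape)
```

In this toy case the nearby detection wins after refinement even though the distant one has the higher raw score, which is the filtering behaviour the abstract attributes to combining past-frame and current-frame information. The weighting adds only one element-wise product per target, in line with the claim of little computational cost.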