Most deep trackers still follow the Siamese paradigm and use a template that contains only the target, without any contextual information. This makes it difficult for the tracker to cope with large appearance changes, rapid target movement, and distraction from similar objects. To alleviate this problem, we propose a long-term context attention (LCA) module that performs extensive information fusion on the target and its context across long-term frames, and computes the target correlation while enhancing the target features. The complete contextual information contains the location of the target as well as the state of its surroundings. LCA uses the target state from the previous frame to exclude interference from similar objects and complex backgrounds, thereby locating the target accurately and giving the tracker higher robustness and regression accuracy. By embedding the LCA module in a Transformer, we build a powerful online tracker with a target-aware backbone, termed TATrack. In addition, we propose a dynamic online update algorithm based on the classification confidence of historical information, which adds no extra computational burden. Our tracker achieves state-of-the-art performance on multiple benchmarks, with 71.1% AUC on LaSOT, 89.3% NP on TrackingNet, and 73.0% AO on GOT-10k. The code and trained models are available at https://github.com/hekaijie123/TATrack.
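The abstract does not spell out the rule behind the dynamic online update, so the following is only a minimal sketch of one way a confidence-based template update could look, not the TATrack implementation. It assumes the tracker already outputs a classification confidence per frame, which is why the update adds no extra forward pass. The class name `ConfidenceGatedTemplate` and the parameters `update_interval` and `min_confidence` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConfidenceGatedTemplate:
    """Illustrative only: keep the most confident recent frame as the template."""

    initial_frame: Any
    update_interval: int = 3     # hypothetical: frames per update window
    min_confidence: float = 0.5  # hypothetical: reject unreliable windows

    def __post_init__(self) -> None:
        self.template = self.initial_frame
        self._best_frame = None
        self._best_conf = float("-inf")
        self._seen = 0

    def observe(self, frame: Any, confidence: float) -> Any:
        """Record one tracked frame and return the template to use next."""
        self._seen += 1
        # Remember the frame with the highest classification confidence
        # seen so far in the current window.
        if confidence > self._best_conf:
            self._best_frame, self._best_conf = frame, confidence
        # At the end of each window, swap in the best frame only if it is
        # confident enough; otherwise keep the previous template.
        if self._seen % self.update_interval == 0:
            if self._best_conf >= self.min_confidence:
                self.template = self._best_frame
            self._best_frame, self._best_conf = None, float("-inf")
        return self.template


if __name__ == "__main__":
    tracker_template = ConfidenceGatedTemplate("frame0")
    for name, conf in [("frame1", 0.6), ("frame2", 0.9), ("frame3", 0.7)]:
        current = tracker_template.observe(name, conf)
    print(current)
```

In this sketch the frames are placeholder strings; in a real tracker they would be cropped image patches or their features. Windows in which no frame reaches `min_confidence` leave the template unchanged, which is one simple way to avoid updating on occluded or drifting frames.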