Single-object tracking is a well-known and challenging research topic in computer vision. Over the last two decades, researchers have proposed numerous algorithms to solve this problem and have achieved promising results. Recently, Transformer-based tracking approaches have ushered in a new era in single-object tracking by introducing new perspectives and achieving superior tracking robustness. In this paper, we conduct an in-depth literature analysis of Transformer tracking approaches by categorizing them into CNN-Transformer based trackers, Two-stream Two-stage fully-Transformer based trackers, and One-stream One-stage fully-Transformer based trackers. In addition, we conduct experimental evaluations to assess their tracking robustness and computational efficiency on publicly available benchmark datasets. Furthermore, we measure their performance in different tracking scenarios to identify their strengths and weaknesses in particular situations. Our survey provides insights into the underlying principles of Transformer tracking approaches, the challenges they encounter, and the future directions they may take.