Inference speed and tracking performance are two critical evaluation metrics in visual tracking. However, high-performance trackers often suffer from slow processing speeds, making them impractical for deployment on resource-constrained devices. To alleviate this issue, we propose FARTrack, a Fast Auto-Regressive Tracking framework. Because autoregression emphasizes the temporal nature of the trajectory sequence, FARTrack maintains high performance while executing efficiently across a variety of devices. FARTrack introduces Task-Specific Self-Distillation and Inter-frame Autoregressive Sparsification, designed from the perspectives of shallow-yet-accurate distillation and redundant-to-essential token optimization, respectively. Task-Specific Self-Distillation compresses the model by distilling task-specific tokens layer by layer, improving inference speed while avoiding suboptimal hand-crafted teacher-student layer-pair assignments. Meanwhile, Inter-frame Autoregressive Sparsification sequentially condenses multiple templates, learning a temporally global optimal sparsification strategy without incurring additional runtime overhead. FARTrack demonstrates outstanding speed and competitive performance: it achieves an AO of 70.6% on GOT-10k in real time, and our fastest model runs at 343 FPS on a GPU and 121 FPS on a CPU.
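The abstract does not spell out the distillation objective; as a rough illustration of the layer-by-layer task-token distillation idea, here is a minimal numerical sketch. The function name and the per-layer MSE objective are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def task_token_distill_loss(student_tokens, teacher_tokens):
    """Sum of per-layer MSE between the student's and the (frozen)
    teacher's task-specific tokens, one term per distilled layer pair.
    Distilling the compact task tokens rather than all features lets a
    shallow student mimic the deep teacher cheaply."""
    return sum(
        float(np.mean((s - t) ** 2))
        for s, t in zip(student_tokens, teacher_tokens)
    )

# Toy usage: task tokens of dim 256 from 3 distilled layer pairs.
rng = np.random.default_rng(0)
student = [rng.normal(size=256) for _ in range(3)]
teacher = [rng.normal(size=256) for _ in range(3)]
loss = task_token_distill_loss(student, teacher)
```

Because each teacher layer contributes its own term, no manual choice of a single teacher-student layer pairing is needed; every distilled pair is supervised directly.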
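The template-condensation step of Inter-frame Autoregressive Sparsification can be pictured as score-based token pruning applied each time a new template arrives. The sketch below is a hypothetical illustration under that assumption; the scoring mechanism and function names are not from the paper:

```python
import numpy as np

def condense_templates(template_tokens, scores, keep):
    """Keep only the `keep` highest-scoring template tokens (e.g. tokens
    scored by their relevance to the tracking target), discarding the
    redundant ones while preserving their original order."""
    order = np.argsort(scores)[::-1][:keep]       # indices of top-`keep` scores
    return template_tokens[np.sort(order)]        # restore temporal order

# Toy usage: 6 template tokens of dim 4; keep the 3 most essential.
tokens = np.arange(24, dtype=float).reshape(6, 4)
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7])
kept = condense_templates(tokens, scores, keep=3)
```

Applying such a step frame by frame keeps the running template set at a fixed size, which is why the sparsification adds no per-frame runtime overhead as the sequence grows.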