Owing to their ability to model long-range correlations and to powerful pretrained models, transformer-based methods have brought about a breakthrough in visual object tracking performance. Previous works focus on designing effective architectures suited for tracking, but overlook that data augmentation is equally crucial for training a well-performing model. In this paper, we first explore the impact of general data augmentations on transformer-based trackers via systematic experiments, and reveal that these common strategies have limited effectiveness. Motivated by these experimental observations, we then propose two data augmentation methods customized for tracking. First, we optimize existing random cropping via a dynamic search radius mechanism and the simulation of boundary samples. Second, we propose a token-level feature mixing augmentation strategy, which makes the model more robust to challenges such as background interference. Extensive experiments on two transformer-based trackers and six benchmarks demonstrate the effectiveness and data efficiency of our methods, especially under challenging settings such as one-shot tracking and small image resolutions.