In video object tracking, auxiliary modalities such as depth, thermal, or event data have emerged as valuable complements to RGB trackers. In practice, most existing RGB trackers learn a single set of parameters and reuse it across datasets and applications. However, achieving a similar single-model unification for multi-modal tracking presents several challenges. These stem from the inherent heterogeneity of the inputs, each with its own modality-specific representation; the scarcity of multi-modal datasets; and the fact that not all modalities are available at all times. In this work, we introduce Un-Track, a Unified Tracker with a single set of parameters for any modality. To handle any modality, our method learns a latent space common to all modalities through low-rank factorization and reconstruction techniques. More importantly, we use only RGB-X pairs to learn this common latent space. The resulting shared representation seamlessly binds all modalities together, enabling effective unification and accommodating any missing modality, all within a single transformer-based architecture. Through a simple yet efficient prompting strategy, Un-Track achieves a +8.1 absolute F-score gain on the DepthTrack dataset while adding only 2.14 GFLOPs (on top of 21.50) and 6.6M parameters (on top of 93M). Extensive comparisons on five benchmark datasets with different modalities show that Un-Track surpasses both state-of-the-art (SOTA) unified trackers and modality-specific counterparts, validating its effectiveness and practicality. The source code is publicly available at https://github.com/Zongwei97/UnTrack.
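To make the idea of a low-rank shared latent space over RGB-X pairs more concrete, the following is a minimal sketch under explicit assumptions. It is not Un-Track's method: Un-Track learns its factorization end-to-end inside a transformer, whereas this sketch swaps in a closed-form truncated SVD. The function name `low_rank_shared_latent`, the toy feature shapes, and the synthetic "X" features are all hypothetical. The sketch concatenates paired RGB and X token features, keeps only the top-`rank` shared directions, and reconstructs both modalities from that compact latent code.

```python
import numpy as np


def low_rank_shared_latent(f_rgb, f_x, rank):
    """Hypothetical helper: project paired RGB and X features onto a rank-`rank` shared space.

    f_rgb, f_x: arrays of shape (num_tokens, dim) describing the same spatial tokens.
    Returns (latent, reconstruction): latent is (num_tokens, rank) and
    reconstruction is (num_tokens, 2 * dim), i.e. both modalities rebuilt from the latent.
    """
    # Stack the two modalities channel-wise so each token carries both views.
    joint = np.concatenate([f_rgb, f_x], axis=1)            # (N, 2d)
    mean = joint.mean(axis=0, keepdims=True)
    centered = joint - mean
    # Truncated SVD (a stand-in for a learned factorization): keep the top-`rank` directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]                                        # (rank, 2d)
    latent = centered @ basis.T                              # low-rank shared code, (N, rank)
    reconstruction = latent @ basis + mean                   # both modalities rebuilt, (N, 2d)
    return latent, reconstruction


# Toy RGB-X pair: synthetic X features correlated with RGB, standing in for depth or thermal.
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((196, 64))                       # e.g. 14x14 tokens, 64 channels
f_x = 0.5 * f_rgb + 0.1 * rng.standard_normal((196, 64))
latent, recon = low_rank_shared_latent(f_rgb, f_x, rank=16)
joint = np.concatenate([f_rgb, f_x], axis=1)
rel_err = np.linalg.norm(recon - joint) / np.linalg.norm(joint)
print(latent.shape, recon.shape, round(float(rel_err), 3))
```

Because the latent code has only `rank` channels, the cost of carrying the auxiliary modality grows with `rank` rather than with the full feature width. This mirrors, in spirit only, why a low-rank design can keep added parameters and FLOPs small.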