RGBT tracking is widely used in fields such as robotics, surveillance, and autonomous driving. Existing RGBT trackers thoroughly exploit the spatial information between the template and the search region and locate the target from the appearance matching results. However, they make very limited use of temporal information: they either ignore it entirely or exploit it through online sampling and training. The former struggles to cope with changes in the object's state, while the latter neglects the correlation between spatial and temporal information. To alleviate these limitations, we propose a novel Temporal Adaptive RGBT Tracking framework, named TATrack. TATrack has a spatio-temporal two-stream structure and captures temporal information through an online updated template: one stream performs multi-modal feature extraction and cross-modal interaction for the initial template, and the other does the same for the online updated template. This structure allows TATrack to comprehensively exploit spatio-temporal and multi-modal information for target localization. In addition, we design a spatio-temporal interaction (STI) mechanism that bridges the two branches and enables cross-modal interaction to span longer time scales. Extensive experiments on three popular RGBT tracking benchmarks show that our method achieves state-of-the-art performance while running at real-time speed.
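The control flow implied by keeping a fixed initial template alongside an online updated one can be sketched as follows. This is only an illustrative sketch, not the TATrack implementation: the abstract does not state how or when the online template is refreshed, so the confidence-threshold rule, the names `TemplatePair` and `TwoTemplateTracker`, and the `match_fn` callback (standing in for feature extraction, cross-modal interaction, and the STI mechanism) are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class TemplatePair:
    """An RGB crop and a thermal crop of the same region (hypothetical container)."""
    rgb: Any
    tir: Any


class TwoTemplateTracker:
    """Keeps a fixed initial template and an online template updated over time.

    The update rule (replace the online template when the match score is at
    least `update_threshold`) is an assumption made for illustration only.
    """

    def __init__(
        self,
        initial: TemplatePair,
        match_fn: Callable[[TemplatePair, TemplatePair, TemplatePair],
                           Tuple[Box, float, TemplatePair]],
        update_threshold: float = 0.7,
    ) -> None:
        self.initial = initial          # never modified after construction
        self.online = initial           # starts out identical to the initial template
        self.match_fn = match_fn        # hypothetical: returns (box, score, target crop)
        self.update_threshold = update_threshold

    def step(self, search: TemplatePair) -> Tuple[Box, float]:
        """Locate the target in one RGB/thermal frame pair."""
        box, score, crop = self.match_fn(self.initial, self.online, search)
        # Only refresh the online template on confident predictions, so that
        # a poor localization does not overwrite the stored target appearance.
        if score >= self.update_threshold:
            self.online = crop
        return box, score
```

In this sketch the initial template anchors the tracker to the first-frame appearance, while the online template carries the most recent confident appearance forward in time; any real tracker would supply its own network in place of `match_fn`.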