Current RGBT tracking research mainly focuses on modality-complete scenarios and overlooks the modality-missing challenge that arises in real-world scenes. In this work, we comprehensively investigate the impact of missing modalities on RGBT tracking and propose a novel invertible prompt learning approach for modality-missing RGBT tracking, which integrates content-preserving prompts into a well-trained tracking model so that it adapts to various modality-missing scenarios. In particular, given a modality-missing scenario, we use the available modality to generate a prompt for the missing modality, which is then fed to the RGBT tracking model. However, the cross-modality gap between the available and missing modalities usually causes semantic distortion and information loss during prompt generation. To handle this issue, we propose an invertible prompt learning scheme that incorporates full reconstruction of the available input modality from the prompt into the prompt generation model. Because no modality-missing RGBT tracking dataset exists and many modality-missing scenarios are difficult to capture, we design a high-quality data simulation method based on hierarchical combination schemes to generate data that reflects real-world modality-missing conditions. Extensive experiments on three modality-missing datasets show that our method achieves significant performance improvements over state-of-the-art methods. We will release the code and the simulation dataset.
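To make the invertibility idea concrete, the following is a minimal sketch of a prompt generator whose input can be reconstructed from its output. It is not the architecture used in this work, which the abstract does not specify; it assumes an additive coupling design (a standard invertible building block) purely for illustration. The names `generate_prompt`, `reconstruct_input`, `make_weights`, and `_shift`, as well as the toy feature vector, are hypothetical.

```python
import math
import random


def make_weights(dim, seed=0):
    # Hypothetical fixed weights standing in for learned parameters.
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


def _shift(half, weights):
    # Small nonlinearity t(.): tanh of a weighted sum, one output per row.
    return [math.tanh(sum(w * v for w, v in zip(row, half))) for row in weights]


def generate_prompt(features, w1, w2):
    """Forward pass: available-modality features -> prompt (assumed even length)."""
    k = len(features) // 2
    a, b = features[:k], features[k:]
    b2 = [y + t for y, t in zip(b, _shift(a, w1))]   # update second half from first
    a2 = [x + t for x, t in zip(a, _shift(b2, w2))]  # update first half from new second
    return a2 + b2


def reconstruct_input(prompt, w1, w2):
    """Inverse pass: recover the available-modality features from the prompt."""
    k = len(prompt) // 2
    a2, b2 = prompt[:k], prompt[k:]
    a = [x - t for x, t in zip(a2, _shift(b2, w2))]  # undo the second update
    b = [y - t for y, t in zip(b2, _shift(a, w1))]   # undo the first update
    return a + b


# Toy usage: a 4-dimensional feature vector from the available modality.
w1, w2 = make_weights(2, seed=1), make_weights(2, seed=2)
x = [0.5, -1.0, 2.0, 0.25]
prompt = generate_prompt(x, w1, w2)
recovered = reconstruct_input(prompt, w1, w2)
```

Because each coupling step only adds a quantity that can be recomputed from the other half, the inverse recovers the input up to floating-point rounding, so no information about the available modality is lost in the prompt. This mirrors the content-preserving property described above, under the stated assumptions.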