Visual tracking that relies solely on RGB image sequences often suffers from invalid targets and degraded performance in low-light conditions. Incorporating additional modalities such as depth and infrared data has proven effective, but existing multi-modal imaging platforms are complex and lack real-world applicability. In contrast, near-infrared (NIR) imaging is commonly used in surveillance cameras, which can switch between RGB and NIR based on light intensity. However, tracking objects across these heterogeneous modalities poses significant challenges, particularly because no modality switch signal is available during tracking. To address these challenges, we propose an adaptive cross-modal object tracking algorithm called Modality-Aware Fusion Network (MAFNet). MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism, effectively bridging the appearance gap and enabling a modality-aware target representation. It consists of two key components: an adaptive weighting module and a modality-specific representation module......
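The abstract is cut off before it describes the two modules, so the following is only a minimal sketch of the general idea of adaptive weighting over modality-specific branches, not the MAFNet architecture itself. It assumes a single feature vector per frame, two hypothetical linear branches (one nominally for RGB, one for NIR), and a softmax gate that predicts the branch weights from the feature alone, since no modality switch signal is available. The class name `AdaptiveFusionSketch`, the layer shapes, and the random weights are all illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vec, matrix):
    """Apply a weight matrix (list of rows, one row per output unit)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, v) for v in vec]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class AdaptiveFusionSketch:
    """Hypothetical two-branch fusion with input-dependent weights.

    Assumption: this is an illustration of adaptive weighting in general,
    not the module design used in MAFNet.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w_rgb = random_matrix(dim, dim, rng)   # RGB-specific branch
        self.w_nir = random_matrix(dim, dim, rng)   # NIR-specific branch
        self.w_gate = random_matrix(2, dim, rng)    # predicts two branch scores

    def __call__(self, feat):
        f_rgb = relu(linear(feat, self.w_rgb))
        f_nir = relu(linear(feat, self.w_nir))
        # The weights come from the input feature itself, because the
        # tracker receives no explicit RGB/NIR switch signal.
        a_rgb, a_nir = softmax(linear(feat, self.w_gate))
        fused = [a_rgb * r + a_nir * n for r, n in zip(f_rgb, f_nir)]
        return fused, (a_rgb, a_nir)


# Usage: fuse one 8-dimensional feature vector.
model = AdaptiveFusionSketch(dim=8)
feat_rng = random.Random(1)
feat = [feat_rng.gauss(0.0, 1.0) for _ in range(8)]
fused, weights = model(feat)
```

In a trained model the gate would be learned so that frames captured in NIR mode lean on the NIR branch and daylight frames on the RGB branch; here the weights are random and only the data flow is meaningful.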