We introduce XoFTR, a cross-modal, cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions, but they are harder to match due to significant differences in texture and intensity. Existing hand-crafted and learning-based methods for visible-TIR matching fall short in handling viewpoint, scale, and texture diversity. To address this, XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality gap. In addition, we introduce a refined matching pipeline that adjusts for scale discrepancies and enhances match reliability through sub-pixel refinement. To validate our approach, we collect a comprehensive visible-thermal dataset and show that our method outperforms existing methods on multiple benchmarks.