Cross-modal Thermal Geo-localization (TG) provides a robust, all-weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, the large gap between the thermal and visible modalities introduces severe feature ambiguity, which systematically corrupts conventional coarse-to-fine registration. To overcome this bottleneck, we propose SCC-Loc, a unified Semantic-Cascade-Consensus localization framework. By sharing a single DINOv2 backbone across global retrieval and MINIMA$_{\text{RoMa}}$ matching, it minimizes its memory footprint and achieves zero-shot, highly accurate absolute position estimation. Specifically, we tackle modality ambiguity with three cohesive components. First, we design the Semantic-Guided Viewport Alignment (SGVA) module, which adaptively optimizes satellite crop regions and thereby corrects initial spatial deviations. Second, we develop the Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF) mechanism, which explicitly enforces geometric consistency and removes dense cross-modal outliers. Finally, we propose the Consensus-Driven Reliability-Aware Position Selection (CD-RAPS) strategy, which derives the optimal solution through physically constrained pose optimization. To address data scarcity, we construct Thermal-UAV, a comprehensive dataset providing 11,890 diverse thermal queries referenced against a large-scale satellite orthophoto and a corresponding spatially aligned Digital Surface Model (DSM). Extensive experiments demonstrate that SCC-Loc sets a new state of the art, reducing the mean localization error to 9.37 m and delivering a 7.6-fold improvement over the strongest baseline in accuracy within a strict 5 m threshold. Code and dataset are available at https://github.com/FloralHercules/SCC-Loc.
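The two headline metrics, mean localization error and accuracy within a distance threshold, can be illustrated with a minimal sketch. It assumes that positions are planar coordinates in metres and that a query counts as correct when its error is at most the threshold. The function names and the toy coordinates are hypothetical and are not taken from the SCC-Loc code or the Thermal-UAV dataset.

```python
import math


def localization_errors(predicted, ground_truth):
    """Euclidean distance (metres) between each predicted and true position."""
    return [math.dist(p, g) for p, g in zip(predicted, ground_truth)]


def mean_error(errors):
    """Mean localization error over all queries."""
    return sum(errors) / len(errors)


def accuracy_within(errors, threshold_m=5.0):
    """Fraction of queries whose error is at most threshold_m (assumed <=)."""
    return sum(e <= threshold_m for e in errors) / len(errors)


# Toy planar positions in metres (illustrative values only).
predicted = [(3.0, 0.0), (0.0, 0.0), (10.0, 0.0), (1.0, 1.0)]
ground_truth = [(0.0, 0.0), (0.0, 2.0), (0.0, 0.0), (1.0, 1.0)]

errors = localization_errors(predicted, ground_truth)
print(f"mean error: {mean_error(errors):.2f} m")
print(f"accuracy within 5 m: {accuracy_within(errors, 5.0):.2%}")
```

Under this definition, an "N-fold improvement within a threshold" is the ratio of the two methods' `accuracy_within` values at the same threshold.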