Generalizable cross-view geo-localization aims to match images of the same location across views in unseen regions and conditions, without GPS supervision. Its core difficulties are the severe semantic inconsistency caused by viewpoint variation and poor generalization under domain shift. Existing methods rely mainly on 2D correspondence, but they are easily distracted by redundant information shared across views, which yields less transferable representations. To address this, we propose GeoLink, a 3D-aware semantic-consistent framework for generalizable cross-view geo-localization. Specifically, we reconstruct scene point clouds offline from multi-view drone images using VGGT, which provides stable structural priors. Using these 3D anchors, we improve 2D representation learning in two complementary ways. First, a Geometric-aware Semantic Refinement module uses 3D guidance to mitigate potentially redundant and view-biased dependencies in 2D features. Second, a Unified View Relation Distillation module transfers 3D structural relations to 2D features, improving cross-view alignment while keeping inference 2D-only. Extensive experiments on multiple benchmarks show that GeoLink consistently outperforms state-of-the-art methods and generalizes better to unseen domains and diverse weather conditions.
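To make the idea of transferring 3D structural relations to 2D features concrete, the following is a minimal sketch of a generic relation-based distillation loss. It is not GeoLink's actual formulation, which the abstract does not specify: the function names, the cosine-similarity relation, the softmax temperature, and the KL objective are all assumptions chosen for illustration. The 3D branch acts as a teacher only during training, so inference can remain 2D-only.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def relation_matrix(feats, temperature=0.1):
    # Hypothetical relation: row-wise softmax over pairwise cosine similarities.
    z = l2_normalize(feats)
    logits = z @ z.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def relation_distillation_loss(feats_2d, feats_3d, temperature=0.1, eps=1e-12):
    # Assumed objective: KL(teacher || student) between sample-to-sample
    # relation distributions. The teacher comes from 3D features, the student
    # from 2D features; their feature dimensions may differ because only the
    # (N, N) relation matrices are compared.
    teacher = relation_matrix(feats_3d, temperature)
    student = relation_matrix(feats_2d, temperature)
    kl = np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps)), axis=1)
    return float(kl.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_2d = rng.normal(size=(6, 8))  # placeholder 2D image embeddings
    feats_3d = rng.normal(size=(6, 5))  # placeholder point-cloud embeddings
    print(relation_distillation_loss(feats_2d, feats_3d))
```

Comparing relation matrices rather than raw features is one common design choice for cross-modal distillation, since it avoids requiring the 2D and 3D embeddings to share a dimension or a coordinate system.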