Cross-view geo-localization (CVGL), which matches ground or UAV imagery against satellite views, is fundamental for precise localization and navigation in GPS-denied environments. Existing approaches often rely on global feature alignment, but they suffer from substantial domain shifts induced by varying regional textures and weather conditions. The problem is more pronounced in UAV-based scenarios, where the wider field of view inevitably captures dense, fine-grained objects that create significant visual clutter. To address these issues, we draw inspiration from Object-Centric Learning (OCL) and propose InfoGeo, an information-theoretic framework designed to enhance robustness and generalization. InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: (i) maximizing view-invariant information by aligning object-centric structural relations across views, and (ii) minimizing view-specific noise through cross-view knowledge constraints. Extensive evaluations across diverse benchmarks and challenging scenarios demonstrate that InfoGeo significantly outperforms state-of-the-art methods.
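As a rough illustration of the two objectives, a generic multi-view information bottleneck can be written as below. This is only a sketch under stated assumptions, not necessarily the loss used by InfoGeo: the symbols are hypothetical and do not appear in the abstract. Here $x_q$ denotes a query image (ground or UAV), $x_s$ the matching satellite image, $z_q = f_\theta(x_q)$ and $z_s = f_\theta(x_s)$ their learned representations, $I(\cdot\,;\cdot)$ mutual information, and $\beta > 0$ an assumed trade-off weight.

```latex
% Assumed generic multi-view information bottleneck form (illustrative only).
% First term: information shared across views, to be maximized (objective i).
% Second term: information each representation keeps about its own view
% beyond what the other view explains, to be minimized (objective ii).
\max_{\theta}\;
  \underbrace{I(z_q; z_s)}_{\text{view-invariant information}}
  \;-\;
  \beta \Big[
    \underbrace{I(z_q; x_q \mid x_s) + I(z_s; x_s \mid x_q)}_{\text{view-specific information}}
  \Big]
```

Under this reading, a larger $\beta$ compresses the representations more aggressively, discarding view-specific cues such as regional texture or weather, at the risk of also discarding information useful for matching.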