In this paper, we introduce a novel approach to fine-grained cross-view geo-localization. Our method uses homography estimation to align a warped ground image with a corresponding GPS-tagged satellite image covering the same area. We first employ a differentiable spherical transform, which adheres to geometric principles, to accurately align the perspective of the ground image with the satellite map. This transformation places the ground and aerial images in the same view and on the same plane, reducing the task to an image alignment problem. To address challenges such as occlusion, small overlapping range, and seasonal variations, we propose a robust correlation-aware homography estimator that aligns similar parts of the transformed ground image with the satellite image. Our method achieves sub-pixel resolution and meter-level GPS accuracy by mapping the center point of the transformed ground image to the satellite image through the estimated homography matrix, and it determines the orientation of the ground camera from a second point located above the center on the central axis. Running at 30 FPS, our method outperforms state-of-the-art techniques, reducing the mean metric localization error by 21.3% and 32.4% on the VIGOR benchmark in the same-area and cross-area generalization tasks, respectively, and by 34.4% on the KITTI benchmark in same-area evaluation.
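The final step described above, reading the camera position and heading off the estimated homography, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 10-pixel offset used for the second point, the `meters_per_pixel` scale, and the assumption that the satellite image is north-up with the y-axis pointing down are all hypothetical choices made for illustration.

```python
import numpy as np

def apply_homography(H, pt):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

def locate_camera(H, bev_size, meters_per_pixel=1.0, offset_px=10.0):
    """Estimate camera position and heading in the satellite image.

    H               : 3x3 homography from the transformed ground image to the satellite image
    bev_size        : (height, width) of the transformed ground image, in pixels
    meters_per_pixel: assumed ground resolution of the satellite image (hypothetical parameter)
    offset_px       : assumed distance of the second point above the center (hypothetical)
    """
    h, w = bev_size
    center = np.array([w / 2.0, h / 2.0])
    # Image y points down, so "above the center" means a smaller y value.
    above = center + np.array([0.0, -offset_px])

    center_sat = apply_homography(H, center)
    above_sat = apply_homography(H, above)

    # Heading is the direction from the mapped center to the mapped upper point,
    # measured clockwise from the satellite image's up direction.
    d = above_sat - center_sat
    yaw_deg = np.degrees(np.arctan2(d[0], -d[1]))

    return center_sat, center_sat * meters_per_pixel, yaw_deg

# Usage: a pure translation leaves the heading at 0 degrees.
H_shift = np.array([[1.0, 0.0, 5.0],
                    [0.0, 1.0, -3.0],
                    [0.0, 0.0, 1.0]])
pos_px, pos_m, yaw = locate_camera(H_shift, (100, 100), meters_per_pixel=0.5)
```

Converting the pixel position to GPS coordinates would additionally require the georeference of the satellite tile, which is outside the scope of this sketch.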