Cross-view localization methods based on image retrieval often yield only coarse camera pose estimates, because the database satellite images are sampled at limited density. In this paper, we propose a method that improves the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched (retrieved) satellite image. We design a geometry-guided cross-view transformer that combines the benefits of conventional geometry and learnable cross-view transformers to map ground-view observations to an overhead view. Given the synthesized overhead view and the observed satellite feature maps, we construct a neural pose optimizer with strong global information embedding ability to estimate the relative rotation between them. After aligning their rotations, we develop an uncertainty-guided spatial correlation to generate a probability map of vehicle locations, from which the relative translation is determined. Experimental results demonstrate that our method significantly outperforms the state of the art. Notably, on the cross-view KITTI dataset, the likelihood of restricting the vehicle's lateral pose to within 1 m of its ground-truth (GT) value improves from $35.54\%$ to $76.44\%$, and the likelihood of restricting the vehicle's orientation to within $1^{\circ}$ of its GT value improves from $19.64\%$ to $99.10\%$.
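To make the translation step concrete, the following is a minimal sketch of how a spatial correlation between two rotation-aligned feature maps can be turned into a probability map over candidate offsets. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the `confidence` weighting (a stand-in for the uncertainty guidance), the `temperature` parameter, and the use of a normalized correlation followed by a softmax are all hypothetical choices.

```python
import numpy as np


def correlation_probability_map(sat_feat, bev_feat, confidence=None, temperature=0.1):
    """Slide overhead-view features over satellite features; return offset probabilities.

    sat_feat:    (C, H, W) satellite feature map.
    bev_feat:    (C, h, w) synthesized overhead-view features, with h <= H and w <= W.
    confidence:  optional (h, w) weights in [0, 1]; hypothetical stand-in for an
                 uncertainty map (low weight = unreliable overhead-view pixel).
    temperature: hypothetical softmax temperature controlling how peaked the map is.
    """
    _, H, W = sat_feat.shape
    _, h, w = bev_feat.shape
    if confidence is None:
        confidence = np.ones((h, w))

    # Weight both operands by sqrt(confidence) so each product term is
    # scaled by the confidence itself and the score stays in [-1, 1].
    sqrt_c = np.sqrt(confidence)
    query = bev_feat * sqrt_c
    query_norm = np.linalg.norm(query)

    scores = np.empty((H - h + 1, W - w + 1))
    for dy in range(H - h + 1):
        for dx in range(W - w + 1):
            window = sat_feat[:, dy:dy + h, dx:dx + w] * sqrt_c
            denom = query_norm * np.linalg.norm(window) + 1e-8
            scores[dy, dx] = np.sum(query * window) / denom

    # Softmax over all candidate offsets gives a probability map.
    logits = scores / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    prob = np.exp(logits)
    return prob / prob.sum()


def most_likely_offset(prob):
    """Return the (row, column) offset with the highest probability."""
    dy, dx = np.unravel_index(np.argmax(prob), prob.shape)
    return int(dy), int(dx)
```

As a usage check, cropping a patch out of a random satellite feature map and passing it as `bev_feat` should make `most_likely_offset` return the crop's top-left corner, since the normalized score reaches its maximum where the two maps coincide. A learned system would more likely compute the correlation as a batched convolution on the GPU; the explicit loops here are only for readability.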