Precise estimation of global orientation and location is critical to a compelling outdoor Augmented Reality (AR) experience. We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database. Recently, neural network-based methods have shown state-of-the-art performance in cross-view matching. However, most prior works focus only on location estimation and ignore orientation, which does not meet the requirements of outdoor AR applications. We propose a new transformer-based neural network model and a modified triplet ranking loss for joint location and orientation estimation. Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance. Furthermore, we extend the single-image query-based geo-localization approach by utilizing temporal information from a navigation pipeline for robust continuous geo-localization. Experiments on several large-scale real-world video sequences demonstrate that our approach enables high-precision and stable AR insertion.