Cross-view geolocalization, a supplement or replacement for GPS, localizes an agent within a search area by matching ground-view images to overhead images. Significant progress has been made under the assumption of a panoramic ground camera. Because panoramic cameras are complex and costly, non-panoramic cameras are more widely applicable, but they are also more challenging to use because they yield less scene overlap between ground and overhead images. This paper presents Restricted FOV Wide-Area Geolocalization (ReWAG), a cross-view geolocalization approach that combines a neural network and a particle filter to globally localize a mobile agent using only odometry and a non-panoramic camera. ReWAG creates pose-aware embeddings and provides a strategy for incorporating particle pose into the Siamese network, improving localization accuracy by a factor of 100 compared to a vision transformer baseline. This extended work also presents ReWAG*, which improves upon ReWAG's ability to generalize to previously unseen environments. ReWAG* repeatedly converges accurately on a dataset of images we collected in Boston with a 72-degree field of view (FOV) camera, a location and FOV that ReWAG* was not trained on.
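As a rough illustration of how a particle filter can be paired with learned embeddings, the following minimal sketch propagates particles with odometry and reweights them by the similarity between a ground-image embedding and a pose-dependent overhead embedding. It is not the ReWAG implementation: the function names, the exponential weighting with a `sharpness` parameter, the shared noise scale, and the `overhead_embedding_fn` callback (a stand-in for a pose-aware overhead network) are all assumptions made for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(particles, odometry, noise_std, rng):
    """Move each (x, y, theta) particle by a body-frame odometry step plus noise.

    Assumption: the same Gaussian noise scale is used for position and heading.
    """
    dx, dy, dtheta = odometry
    moved = []
    for x, y, theta in particles:
        # Rotate the body-frame displacement into the world frame.
        new_x = x + dx * math.cos(theta) - dy * math.sin(theta) + rng.gauss(0.0, noise_std)
        new_y = y + dx * math.sin(theta) + dy * math.cos(theta) + rng.gauss(0.0, noise_std)
        new_theta = (theta + dtheta + rng.gauss(0.0, noise_std)) % (2.0 * math.pi)
        moved.append((new_x, new_y, new_theta))
    return moved


def update_weights(weights, particles, ground_embedding, overhead_embedding_fn, sharpness=5.0):
    """Reweight particles by embedding similarity and normalize.

    overhead_embedding_fn is a hypothetical callback mapping a particle pose to
    an overhead embedding; the exponential likelihood is an assumed choice.
    """
    updated = []
    for weight, particle in zip(weights, particles):
        similarity = cosine_similarity(ground_embedding, overhead_embedding_fn(particle))
        updated.append(weight * math.exp(sharpness * similarity))
    total = sum(updated)
    return [w / total for w in updated]


def resample(particles, weights, rng):
    """Multinomial resampling; returns the new particles with uniform weights."""
    count = len(particles)
    chosen = rng.choices(particles, weights=weights, k=count)
    return chosen, [1.0 / count] * count
```

In this sketch the particle heading enters the measurement step only through `overhead_embedding_fn`, which is one simple way to make the comparison depend on pose when the ground camera sees only part of the scene.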