Unmanned Aerial Vehicle (UAV) visual geo-localization aims to match images of the same geographic target captured from different views, i.e., the UAV view and the satellite view. The task is challenging because of the large appearance differences between the images in a UAV-satellite pair. Previous works map UAV and satellite images to a shared feature space and employ a classification framework to learn location-dependent features, but they neglect the overall distribution shift between the UAV view and the satellite view. In this paper, we address this limitation by aligning the distributions of the two views, which shortens their distance in the common space. Specifically, we propose an end-to-end network called PVDA (Progressive View Distribution Alignment). During training, the feature encoder, the location classifier, and the view discriminator are jointly optimized by a novel progressive adversarial learning strategy. Competition between the feature encoder and the view discriminator strengthens both, and the adversarial learning is progressively emphasized until UAV-view images are indistinguishable from satellite-view images. As a result, PVDA learns location-dependent yet view-invariant features with good scalability to unseen images of new locations. Compared with state-of-the-art methods, PVDA requires less inference time and achieves superior performance on the University-1652 dataset.
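To make "progressively emphasized" concrete, the following is a minimal sketch of one way an adversarial term could be ramped up during training. The abstract does not specify PVDA's actual schedule or loss, so everything here is an assumption for illustration: the sigmoid-shaped ramp is borrowed from common domain-adversarial training practice, and the names `adversarial_weight`, `encoder_objective`, `gamma`, and `progress` are hypothetical.

```python
import math


def adversarial_weight(progress: float, gamma: float = 10.0) -> float:
    """Return a weight that ramps smoothly from 0 toward 1.

    `progress` is the fraction of training completed, in [0, 1].
    The sigmoid-shaped ramp and `gamma` are assumptions, not the
    schedule used by PVDA.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def encoder_objective(location_loss: float, view_loss: float,
                      progress: float) -> float:
    """Scalar objective for the feature encoder (hypothetical form).

    The encoder lowers the location-classification loss while trying to
    raise the view discriminator's loss, so the view term enters with a
    negative sign that grows in magnitude as training progresses.
    """
    return location_loss - adversarial_weight(progress) * view_loss


if __name__ == "__main__":
    # Print how the weight and the encoder objective change over training
    # for fixed, made-up loss values.
    for step in range(0, 11, 2):
        p = step / 10
        w = adversarial_weight(p)
        obj = encoder_objective(location_loss=1.0, view_loss=0.5, progress=p)
        print(f"progress={p:.1f}  weight={w:.4f}  encoder_objective={obj:.4f}")
```

Early in training the weight is near zero, so the encoder is driven almost entirely by the location loss. The adversarial pressure toward view-invariant features grows only as training proceeds. The view discriminator, by contrast, would simply minimize its own view-classification loss throughout.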