Retrieving relevant multimedia content is one of the main problems in an increasingly data-driven world. With the proliferation of drones, high-quality aerial footage is now available to a wide audience for the first time. Integrating this footage into applications can enable GPS-less geo-localisation or location correction. In this paper, we present an orientation-guided training framework for UAV-view geo-localisation. Through hierarchical localisation, the orientations of the UAV images are estimated relative to the satellite imagery and used as pseudo-labels. We propose a lightweight module that predicts these pseudo-labels, i.e. the orientation between the different views, from the contrastively learned embeddings. We experimentally demonstrate that this prediction task supports training and that the resulting model outperforms previous approaches. The extracted pseudo-labels also enable aligned rotation of the satellite image as an augmentation that further improves generalisation. The orientation module is not needed during inference, so it adds no computation at test time. We achieve state-of-the-art results on both the University-1652 and University-160k datasets.
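To make the aligned-rotation augmentation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the orientation pseudo-label is discretised into `num_bins` equal bins over 360 degrees, that the satellite image is rotated only in 90-degree steps, and that a counter-clockwise rotation increases the bin index. The function names `rot90_ccw` and `rotate_aligned` and the sign convention are hypothetical choices made for illustration.

```python
def rot90_ccw(grid):
    """Rotate a 2D list (rows of pixels) by 90 degrees counter-clockwise."""
    # Transposing and then reversing the row order gives a CCW rotation.
    return [list(row) for row in zip(*grid)][::-1]


def rotate_aligned(sat_img, orientation_bin, k, num_bins=4):
    """Rotate a satellite image by k * 90 degrees and shift its label.

    sat_img:         2D list standing in for the satellite image
    orientation_bin: pseudo-label, an integer in [0, num_bins)
    k:               number of 90-degree counter-clockwise rotations
    num_bins:        assumed number of orientation bins (multiple of 4)
    """
    if num_bins % 4 != 0:
        raise ValueError("num_bins must be a multiple of 4")
    if not 0 <= orientation_bin < num_bins:
        raise ValueError("orientation_bin out of range")

    steps = k % 4
    rotated = sat_img
    for _ in range(steps):
        rotated = rot90_ccw(rotated)

    # Assumed convention: each 90-degree step moves the label by a
    # quarter of the bins, so image and label stay consistent.
    new_bin = (orientation_bin + steps * (num_bins // 4)) % num_bins
    return rotated, new_bin


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    rotated, label = rotate_aligned(image, orientation_bin=0, k=1)
    print(rotated, label)
```

The point of the sketch is only that the label is updated together with the image, so the orientation target remains valid after augmentation. A real pipeline would operate on image tensors and might use arbitrary angles.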