State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet, retrieval often struggles under varying conditions, e.g. changes in weather or time of day, with dramatic consequences for visual localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes we advocate, we propose to synthesize variants of the training set images with generative text-to-image models, in order to automatically expand the training set towards a number of nameable variations that particularly hurt visual localization. After expanding the training set, we propose a training approach that leverages the specificities and the underlying geometry of this mix of real and synthetic images. We experimentally show that these changes translate into large improvements on the most challenging visual localization datasets. Project page: https://europe.naverlabs.com/ret4loc
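For readers unfamiliar with the retrieval step that visual localization pipelines build on, the following is a minimal sketch. It assumes that each image is summarized by a single global descriptor vector and that database images are ranked by cosine similarity to the query. This is a common choice, not necessarily the descriptor, training, or ranking used in this work. The function name `retrieve_top_k` and the random toy descriptors are hypothetical.

```python
import numpy as np


def retrieve_top_k(query_desc, db_descs, k=5):
    """Hypothetical helper: rank database images by cosine similarity.

    query_desc: (D,) global descriptor of the query image.
    db_descs:   (N, D) global descriptors of the mapped database images.
    Returns the indices of the k most similar database images and their scores,
    sorted by decreasing similarity.
    """
    # L2-normalize so that a dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    # Sort by decreasing similarity and keep the k best candidates.
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Toy usage with random descriptors (assumed data, for illustration only):
# the query is a slightly perturbed copy of database image 42.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 128))
query = db[42] + 0.01 * rng.normal(size=128)
idx, scores = retrieve_top_k(query, db, k=3)
print(idx[0])  # 42
```

In a full pipeline, the poses of the retrieved database images would then seed the downstream pose estimation. This is why a retrieval failure under, say, night-time or rainy conditions propagates directly to localization accuracy.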