Modern learning-based visual feature extraction networks perform well in intra-domain localization, but their performance declines significantly when the images in a pair are captured under different long-term visual conditions, such as different seasons or times of day. Our first contribution is a benchmark for investigating how long-term variations affect visual localization. We analyze current state-of-the-art feature extraction networks under various domain changes and find a significant performance gap between intra- and cross-domain localization. We then investigate ways to close this gap by improving the supervision of modern feature extraction networks, and propose a novel data-centric method, Implicit Cross-Domain Correspondences (iCDC). iCDC represents the same environment with multiple Neural Radiance Fields, each fitting the scene under a single visual domain, and uses the underlying 3D representations to generate accurate correspondences across different long-term visual conditions. The proposed method improves cross-domain localization performance and significantly reduces the gap to intra-domain performance. When evaluated on popular long-term localization benchmarks, our trained networks consistently outperform existing methods. This work is a substantial step toward more robust visual localization pipelines for long-term deployments and opens up research avenues in the development of long-term invariant descriptors.
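To make the idea of geometry-derived correspondences concrete, the following is a minimal sketch of depth-based reprojection between two posed views. It is an illustration under stated assumptions, not the procedure used by iCDC: the abstract only says that correspondences come from the underlying 3D representations. The sketch assumes a shared pinhole camera, a known relative pose, and a per-pixel depth value (for example, one rendered from the radiance field fitted to the first domain). All function names and numbers are hypothetical.

```python
# Hypothetical sketch: depth-based reprojection between two posed views.
# Not taken from the paper; names, intrinsics, and poses are illustrative.

def unproject(pixel, depth, K):
    """Lift pixel (u, v) with depth along the optical axis to a 3D point
    in the camera frame. K = (fx, fy, cx, cy) are pinhole intrinsics."""
    u, v = pixel
    fx, fy, cx, cy = K
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def transform(point, R, t):
    """Apply the rigid transform x' = R x + t (R given as three row tuples)."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )

def project(point, K):
    """Project a 3D point in the camera frame to pixel coordinates.
    Returns None when the point lies behind the camera."""
    x, y, z = point
    fx, fy, cx, cy = K
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def correspondence(pixel_a, depth_a, K, R_ab, t_ab):
    """Map a pixel in view A to view B, where (R_ab, t_ab) takes points
    from camera A's frame to camera B's frame."""
    point_a = unproject(pixel_a, depth_a, K)
    point_b = transform(point_a, R_ab, t_ab)
    return project(point_b, K)

if __name__ == "__main__":
    K = (500.0, 500.0, 320.0, 240.0)           # assumed intrinsics
    identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    # Camera B sits 0.5 units to the right of camera A, same orientation.
    match = correspondence((320.0, 240.0), 5.0, K, identity, (-0.5, 0.0, 0.0))
    print(match)  # (270.0, 240.0)
```

Because the mapping depends only on geometry and camera poses, the same pixel pair remains valid when view A is rendered under one visual condition and view B under another, which is what makes such pairs usable as cross-domain supervision. A practical pipeline would also need an occlusion or visibility check, which this sketch omits.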