Visual Geo-localization (VG) is a critical research area concerned with identifying geo-locations from visual inputs, with particular relevance to autonomous navigation for robots and vehicles. Current VG methods typically learn feature extractors from geo-labeled images to produce dense, geographically relevant representations. Recent advances in Self-Supervised Learning (SSL) have shown that it can match the performance of supervised techniques while training on unlabeled images. This study presents a novel VG-SSL framework, designed for versatile integration and benchmarking of diverse SSL methods for representation learning in VG, and featuring a unique geo-related pair strategy, GeoPair. Through extensive performance analysis, we adapt SSL techniques to improve VG on datasets captured by hand-held and car-mounted cameras used in robotics and autonomous vehicles. Our results show that contrastive learning and information maximization methods yield superior geo-specific representation quality, matching or surpassing the performance of state-of-the-art VG techniques. To our knowledge, this is the first benchmarking study of SSL in VG, highlighting its potential for enhancing geo-specific visual representations for robotics and autonomous vehicles. The code is publicly available at https://github.com/arplaboratory/VG-SSL.