Determining the precise geographic location of an image at a global scale remains an unsolved challenge. Standard image retrieval techniques are inefficient because of the sheer volume of images (>100M) and fail when coverage is insufficient. Scalable solutions, however, involve a trade-off: global classification typically yields coarse results (errors of 10+ kilometers), while cross-view retrieval between ground and aerial imagery suffers from a domain gap and has been studied primarily on smaller regions. This paper introduces a hybrid approach that achieves fine-grained geo-localization over an area the size of a continent. We use a proxy classification task during training to learn rich feature representations that implicitly encode precise location information. We combine the prototypes learned in this task with embeddings of aerial imagery to increase robustness to the sparsity of ground-level data. This enables direct, fine-grained retrieval over areas spanning multiple countries. Our extensive evaluation demonstrates that our approach localizes more than 68% of queries to within 200 m on a dataset covering a large part of Europe. The code is publicly available at https://scaling-geoloc.github.io.
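To make the retrieval step more concrete, the following is a minimal sketch of how per-cell prototypes and aerial embeddings might be fused into a single index and searched with a ground-level query. It is not the authors' implementation: the fusion rule (a normalized sum), the fallback to the aerial embedding for cells without ground data, and the names `build_cell_index`, `retrieve`, and `has_prototype` are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length; all-zero vectors stay zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def build_cell_index(prototypes, aerial_embeddings, has_prototype):
    """Fuse per-cell prototypes with aerial embeddings (hypothetical rule).

    prototypes:        (N, D) vectors learned by the proxy classification task
    aerial_embeddings: (N, D) embeddings of the aerial tile of each cell
    has_prototype:     (N,) 1.0 where ground-level data exists, else 0.0
    """
    # Assumption: cells with no ground-level data rely on the aerial view only.
    protos = l2_normalize(prototypes) * has_prototype[:, None]
    aerial = l2_normalize(aerial_embeddings)
    return l2_normalize(protos + aerial)


def retrieve(query_embedding, cell_index, k=1):
    """Return the indices and cosine scores of the k best-matching cells."""
    scores = cell_index @ l2_normalize(query_embedding)
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Toy data: three cells in a 4-D embedding space; the third has no ground data.
prototypes = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 0]])
aerial = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]])
has_prototype = np.array([1.0, 1.0, 0.0])

index = build_cell_index(prototypes, aerial, has_prototype)
best_cells, best_scores = retrieve(np.array([0.0, 0.0, 2.0, 0.0]), index, k=1)
```

In this toy setup the query matches the third cell even though that cell has no ground-level prototype, which is the kind of robustness to sparse ground coverage the abstract describes. At the scale discussed in the abstract, the brute-force matrix product would presumably be replaced by an approximate nearest-neighbor index.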