Worldwide image geolocalization, which aims to predict the GPS coordinates of an image taken anywhere on Earth, remains challenging because of the visual diversity of scenes across the globe. Recent generative approaches based on Retrieval-Augmented Generation (RAG) and Large Multimodal Models (LMMs) reason over candidates retrieved from fixed databases, but often struggle with scenes that are absent from the reference set. In this work, we propose GeoSearch, an open-world geolocation framework that integrates web-scale reverse image search into the RAG pipeline. GeoSearch augments LMM prompts with database-retrieved coordinates and with textual evidence extracted from web pages. To mitigate noise from irrelevant content, we introduce a two-layer filtering mechanism consisting of image matching followed by confidence-based gating. Experiments on the standard benchmarks Im2GPS3k and YFCC4k demonstrate the superiority of GeoSearch under leakage-aware evaluation. Our code and data are publicly available to support reproducibility.
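The two-layer filtering and prompt augmentation described in the abstract can be illustrated with a minimal sketch. This is not the GeoSearch implementation: the `WebEvidence` structure, the function names, the score ranges, and both thresholds are hypothetical, and the abstract does not specify how the matching score or the confidence is computed. The sketch only shows the ordering of the two layers (image matching first, confidence-based gating second) and how the surviving evidence could be combined with database-retrieved coordinates in an LMM prompt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WebEvidence:
    """One reverse-image-search hit (hypothetical structure, not from the paper)."""
    text: str            # textual evidence extracted from the web page
    match_score: float   # assumed image-matching similarity to the query, in [0, 1]
    confidence: float    # assumed confidence that the text is location-relevant, in [0, 1]


def filter_web_evidence(
    hits: List[WebEvidence],
    match_threshold: float = 0.5,        # assumed value, for illustration only
    confidence_threshold: float = 0.7,   # assumed value, for illustration only
) -> List[WebEvidence]:
    """Apply the two filtering layers in order."""
    # Layer 1: image matching. Drop hits whose image does not match the query.
    matched = [h for h in hits if h.match_score >= match_threshold]
    # Layer 2: confidence-based gating. Keep only hits with confident evidence.
    return [h for h in matched if h.confidence >= confidence_threshold]


def build_prompt(
    db_coords: List[Tuple[float, float]],
    evidence: List[WebEvidence],
) -> str:
    """Assemble an LMM prompt from database coordinates and filtered web text."""
    lines = ["Estimate the GPS coordinates (latitude, longitude) of the image."]
    if db_coords:
        lines.append("Candidate coordinates from the reference database:")
        lines += [f"- ({lat:.4f}, {lon:.4f})" for lat, lon in db_coords]
    if evidence:
        lines.append("Textual evidence from web pages:")
        lines += [f"- {e.text}" for e in evidence]
    return "\n".join(lines)
```

In this sketch, a query with no surviving web evidence falls back to a prompt containing only the database-retrieved coordinates, which mirrors a conventional fixed-database RAG setup.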