We present LoD-Loc v3, a novel method for generalized aerial visual localization in dense urban environments. Prior work, LoD-Loc v2, localizes by aligning semantic building silhouettes with low-detail city models, but it suffers from two key limitations: poor cross-scene generalization and frequent failure in dense building scenes. Our method addresses these challenges through two innovations. First, we develop a new synthetic data generation pipeline that produces InsLoD-Loc, the largest instance segmentation dataset for aerial imagery to date, comprising 100k images with precise building instance annotations. Models trained on this dataset exhibit remarkable zero-shot generalization. Second, we reformulate the localization paradigm by shifting from semantic to instance silhouette alignment, which significantly reduces pose estimation ambiguity in dense scenes. Extensive experiments demonstrate that LoD-Loc v3 outperforms existing state-of-the-art (SOTA) baselines by a large margin in both cross-scene and dense urban scenarios. The project is available at https://nudt-sawlab.github.io/LoD-Locv3/.
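To give some intuition for why instance silhouette alignment may reduce pose ambiguity in dense scenes, the following is a minimal toy sketch and not the LoD-Loc v3 cost function. It assumes a 1-D "skyline" in which each building is a set of image columns, and it uses two hypothetical scoring functions, `semantic_score` and `instance_score`, introduced here purely for illustration. The building widths and the 3-column pose offset are made-up values.

```python
from typing import List, Set

Mask = Set[int]  # a 1-D toy "silhouette": the set of image columns a building covers


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_row(widths: List[int], start: int = 0) -> List[Mask]:
    """Lay out touching buildings of the given widths, left to right."""
    masks, x = [], start
    for w in widths:
        masks.append(set(range(x, x + w)))
        x += w
    return masks


def semantic_score(rendered: List[Mask], observed: List[Mask]) -> float:
    """IoU after merging all instances into one building/non-building mask."""
    return iou(set().union(*rendered), set().union(*observed))


def instance_score(rendered: List[Mask], observed: List[Mask]) -> float:
    """Mean over rendered instances of the best IoU with any observed instance."""
    return sum(max(iou(r, o) for o in observed) for r in rendered) / len(rendered)


widths = [6, 10, 4, 12, 8]           # hypothetical dense block, 40 columns wide
observed = make_row(widths)          # instances segmented from the query image
correct = make_row(widths)           # city model rendered at the true pose
shifted = make_row(widths, start=3)  # city model rendered at a pose 3 columns off

for name, rendered in [("correct", correct), ("shifted", shifted)]:
    print(name,
          round(semantic_score(rendered, observed), 3),
          round(instance_score(rendered, observed), 3))
```

In this toy setup both scores equal 1.0 at the true pose. At the shifted pose the merged semantic mask still overlaps heavily (roughly 0.86), whereas the instance-level score falls to roughly 0.43, because the boundaries between adjacent buildings no longer line up. A sharper drop around the true pose is one plausible reading of the reduced ambiguity described above; the actual matching strategy used by the method may differ.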