One of the central challenges in visual place recognition (VPR) is learning a robust global representation that remains discriminative under large viewpoint changes, illumination variations, and severe domain shifts. While visual foundation models (VFMs) provide strong local features, most existing methods rely on a single model, overlooking the complementary cues offered by different VFMs. However, exploiting such complementary information inevitably alters token distributions, which challenges the stability of existing query-based global aggregation schemes. To address these challenges, we propose DC-VLAQ, a representation-centric framework that integrates the fusion of complementary VFMs and robust global aggregation. Specifically, we first introduce a lightweight residual-guided complementary fusion that anchors representations in the DINOv2 feature space while injecting complementary semantics from CLIP through a learned residual correction. In addition, we propose the Vector of Local Aggregated Queries (VLAQ), a query-residual global aggregation scheme that encodes local tokens by their residual responses to learnable queries, resulting in improved stability and the preservation of fine-grained discriminative cues. Extensive experiments on standard VPR benchmarks, including Pitts30k, Tokyo24/7, MSLS, Nordland, SPED, and AmsterTime, demonstrate that DC-VLAQ consistently outperforms strong baselines and achieves state-of-the-art performance, particularly under challenging domain shifts and long-term appearance changes.
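The two components described above can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration rather than the DC-VLAQ implementation: the fusion is modeled as DINOv2 tokens plus a linear projection of CLIP tokens, and the aggregation as a VLAD-style soft assignment of tokens to learnable queries followed by a weighted sum of token-minus-query residuals. The names `residual_fusion`, `query_residual_aggregate`, `w_proj`, `scale`, and `temperature` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale x to unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def residual_fusion(dino_tokens, clip_tokens, w_proj, scale=1.0):
    """Assumed fusion: keep DINOv2 tokens as the anchor and add a
    projected CLIP correction. Shapes: (N, D), (N, C), (C, D)."""
    correction = clip_tokens @ w_proj          # (N, D)
    return dino_tokens + scale * correction    # stays in the DINOv2 space


def query_residual_aggregate(tokens, queries, temperature=1.0):
    """Assumed VLAD-style aggregation: soft-assign each token to the
    queries, then sum the weighted residuals (token - query).
    Shapes: tokens (N, D), queries (K, D) -> descriptor (K * D,)."""
    logits = tokens @ queries.T / temperature              # (N, K)
    assign = softmax(logits, axis=1)                       # rows sum to 1
    residuals = tokens[:, None, :] - queries[None, :, :]   # (N, K, D)
    agg = (assign[:, :, None] * residuals).sum(axis=0)     # (K, D)
    agg = l2_normalize(agg, axis=1)                        # per-query norm
    return l2_normalize(agg.reshape(-1), axis=0)           # global norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dino = rng.normal(size=(16, 8))     # 16 tokens, 8-dim (toy sizes)
    clip = rng.normal(size=(16, 6))     # 16 tokens, 6-dim (toy sizes)
    w_proj = rng.normal(size=(6, 8)) * 0.1
    queries = rng.normal(size=(4, 8))   # 4 learnable queries (random here)
    fused = residual_fusion(dino, clip, w_proj)
    descriptor = query_residual_aggregate(fused, queries)
    print(descriptor.shape)
```

In this sketch a zero `w_proj` leaves the DINOv2 tokens unchanged, which is one way to read "anchoring" the representation; in a trained model the projection and the queries would be learned parameters rather than random arrays.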