Deep learning methods for Visual Place Recognition (VPR) have advanced significantly, largely driven by large-scale datasets. However, most existing approaches are trained on a single dataset, which can introduce dataset-specific inductive biases and limit model generalization. While multi-dataset joint training offers a promising path toward universal VPR models, divergences among training datasets can saturate the limited information capacity of feature aggregation layers, leading to suboptimal performance. To address these challenges, we propose Query-based Adaptive Aggregation (QAA), a novel feature aggregation technique that leverages learned queries as reference codebooks to effectively enhance information capacity without significantly increasing computational or parameter complexity. We show that computing the Cross-query Similarity (CS) between query-level image features and reference codebooks provides a simple yet effective way to generate robust descriptors. Our results demonstrate that QAA outperforms state-of-the-art models, achieving balanced generalization across diverse datasets while maintaining peak performance comparable to that of dataset-specific models. Ablation studies further explore QAA's mechanisms and scalability. Visualizations reveal that the learned queries exhibit diverse attention patterns across datasets. Project page: \href{http://xjh19971.github.io/QAA}{\color{magenta}\texttt{xjh19971.github.io/QAA}}.
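To make the Cross-query Similarity (CS) idea concrete, the following is a minimal PyTorch sketch of a QAA-style descriptor. All shapes and names (\texttt{qaa\_descriptor}, \texttt{feats}, \texttt{queries}) and the attention/normalization details are illustrative assumptions rather than the authors' implementation: learned queries cross-attend over backbone features to produce query-level features, whose similarity to the query codebook forms the global descriptor.

\begin{verbatim}
import torch
import torch.nn.functional as F

# Hypothetical sketch of a QAA-style descriptor (not the paper's code).
# Learned queries cross-attend over backbone patch features, then the
# Cross-query Similarity (CS) between the resulting query-level features
# and the query codebook is flattened into a global descriptor.

def qaa_descriptor(feats, queries):
    """feats: (B, N, D) backbone patch features; queries: (Q, D) learned queries."""
    B = feats.size(0)
    q = queries.unsqueeze(0).expand(B, -1, -1)            # (B, Q, D) codebook
    # Scaled dot-product cross-attention: queries attend over patch features.
    attn = torch.softmax(q @ feats.transpose(1, 2)
                         / feats.size(-1) ** 0.5, dim=-1)  # (B, Q, N)
    query_feats = attn @ feats                             # (B, Q, D)
    # Cross-query Similarity: cosine similarity against the reference codebook.
    cs = (F.normalize(query_feats, dim=-1)
          @ F.normalize(q, dim=-1).transpose(1, 2))        # (B, Q, Q)
    return F.normalize(cs.flatten(1), dim=-1)              # (B, Q*Q) descriptor

desc = qaa_descriptor(torch.randn(2, 196, 256), torch.randn(64, 256))
print(desc.shape)  # torch.Size([2, 4096])
\end{verbatim}

Note that the descriptor dimension here depends only on the number of queries, not on the feature dimension, which is one way a CS-based readout could add capacity without a proportional parameter cost.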