Visual Place Recognition (VPR) demands representations robust to drastic environmental and viewpoint shifts. Current aggregation paradigms, however, either rely on data-hungry supervision or on simplistic first-order statistics, often neglecting intrinsic structural correlations. In this work, we propose a Second-Order Geometric Statistics framework that inherently captures geometric stability without training. We conceptualize scenes as covariance descriptors on the Symmetric Positive Definite (SPD) manifold, where perturbations manifest as tractable congruence transformations. By leveraging geometry-aware Riemannian mappings, we project these descriptors into a linearized Euclidean embedding, effectively decoupling signal structure from noise. Built upon fixed, pre-trained backbones, the framework requires no parameter updates and achieves strong zero-shot generalization. Extensive experiments confirm that our method is highly competitive against state-of-the-art baselines, particularly excelling in challenging zero-shot scenarios.
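The pipeline described above — local features from a frozen backbone, a regularized covariance descriptor on the SPD manifold, and a log-Euclidean (Riemannian log map at the identity) projection into a linearized Euclidean embedding — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the `eps` regularizer, and the upper-triangle vectorization are illustrative assumptions.

```python
import numpy as np

def covariance_descriptor(feats, eps=1e-6):
    """Second-order descriptor of a scene.

    feats: (N, D) array of local features from a fixed, pre-trained
    backbone (hypothetical input; any dense feature map would do).
    Returns a (D, D) SPD matrix: the feature covariance plus a small
    ridge `eps * I` to guarantee strict positive definiteness.
    """
    C = np.cov(feats, rowvar=False)
    return C + eps * np.eye(C.shape[0])

def log_euclidean_embed(C):
    """Log-Euclidean mapping: project an SPD matrix into a flat
    Euclidean space via the matrix logarithm, computed through an
    eigendecomposition (valid because C is symmetric positive definite).
    The symmetric log-matrix is vectorized by its upper triangle, with
    off-diagonal entries scaled by sqrt(2) so that the Euclidean inner
    product of embeddings matches the Frobenius inner product of logs.
    """
    w, V = np.linalg.eigh(C)
    L = V @ np.diag(np.log(w)) @ V.T        # matrix logarithm log(C)
    iu = np.triu_indices_from(L)
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return scale * L[iu]

# Two "views" of the same scene: the second is a mild perturbation
# (synthetic stand-in for a viewpoint/illumination shift).
rng = np.random.default_rng(0)
feats_a = rng.standard_normal((500, 8))
feats_b = feats_a + 0.05 * rng.standard_normal((500, 8))

emb_a = log_euclidean_embed(covariance_descriptor(feats_a))
emb_b = log_euclidean_embed(covariance_descriptor(feats_b))

# Matching reduces to plain Euclidean distance in the embedding space.
dist = np.linalg.norm(emb_a - emb_b)
```

Because the log map linearizes the SPD manifold around the identity, descriptor comparison becomes an ordinary Euclidean nearest-neighbor search, which is what makes the approach training-free: no metric has to be learned.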