Cross-modal place recognition (CMPR) enables camera-only robots to localize against pre-built LiDAR maps in autonomous navigation scenarios. This image-to-point-cloud setting is challenged by two coupled ambiguities: the modality gap between perspective RGB appearance and sparse metric geometry, and perceptual aliasing among urban places with similar roads, facades, intersections, and object arrangements. Instead of treating CMPR as a single global descriptor matching problem, we argue that reliable retrieval requires both geometry-aware representation alignment and fine-grained candidate verification. In this paper, we propose G2IA, a geometry-guided instance-aware framework for image-to-point-cloud place recognition. In the retrieval stage, visual geometry priors from VGGT and instance features are integrated to construct place descriptors that are more compatible with LiDAR-derived map representations. In the refinement stage, the retrieved candidates are re-ranked by explicitly verifying whether local instance shapes and their relative spatial layouts are consistent across modalities. Experiments on public benchmarks demonstrate that G2IA consistently improves image-to-point-cloud place recognition under different localization thresholds, and exhibits strong cross-dataset generalization.
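The two-stage structure described above (global retrieval followed by candidate verification) can be illustrated with a minimal sketch. This is not the G2IA implementation: the descriptors here are plain vectors rather than features built from VGGT priors and instance features, and the verification step only compares pairwise distances between instance centroids that are assumed to be already matched one-to-one across modalities. All function names, the `weight` parameter, and the scoring formula are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve_top_k(query_desc, map_descs, k):
    """Stage 1 (retrieval): rank map places by cosine similarity to the query.

    query_desc: (D,) image descriptor; map_descs: (N, D) map descriptors.
    Returns the indices of the k best places and their similarities.
    """
    sims = l2_normalize(map_descs) @ l2_normalize(query_desc)
    order = np.argsort(-sims)[:k]
    return order, sims[order]


def layout_consistency(query_centroids, cand_centroids):
    """Stage 2 score: agreement of relative spatial layout.

    Assumption (hypothetical simplification): both inputs are (M, 2) arrays of
    instance centroids listed in the same, already-matched order. Pairwise
    distances are invariant to translation and rotation, so a candidate with
    the same layout scores 1.0 and the score decreases as layouts diverge.
    """
    def pairwise(c):
        return np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)

    err = np.abs(pairwise(query_centroids) - pairwise(cand_centroids)).mean()
    return 1.0 / (1.0 + err)


def rerank(candidates, retrieval_scores, query_centroids, map_centroids, weight=0.5):
    """Stage 2 (refinement): blend retrieval and layout scores, then re-sort.

    map_centroids: (N, M, 2) array holding the instance centroids of each place.
    weight: hypothetical mixing factor between the two scores.
    """
    combined = np.array([
        (1.0 - weight) * s + weight * layout_consistency(query_centroids, map_centroids[i])
        for i, s in zip(candidates, retrieval_scores)
    ])
    return candidates[np.argsort(-combined)]
```

In this sketch, a perceptually aliased place can win the retrieval stage by a small descriptor margin and still be demoted during refinement if its instance layout disagrees with the query, which is the failure mode the refinement stage is meant to address.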