Image-to-point-cloud global localization is crucial for robot navigation in GNSS-denied environments and has become increasingly important for multi-robot map fusion and urban asset management. The modality gap between images and point clouds poses significant challenges for cross-modality fusion. Existing cross-modality global localization solutions either require modality unification, which causes information loss, or rely on engineered training schemes to encode multi-modality features, which often lack feature alignment and relation consistency. To address these limitations, we propose SaliencyI2PLoc, a novel contrastive-learning-based architecture that fuses the saliency map into feature aggregation and maintains feature relation consistency across multiple manifold spaces. To alleviate the preprocessing burden of data mining, a contrastive learning framework is applied that efficiently achieves cross-modality feature mapping. A context-saliency-guided local feature aggregation module is designed that fully exploits the contribution of stationary information in the scene, generating a more representative global feature. Furthermore, to enhance cross-modality feature alignment during contrastive learning, the consistency of relative relationships between samples in different manifold spaces is also taken into account. Experiments conducted on urban and highway scenario datasets demonstrate the effectiveness and robustness of our method. Specifically, our method achieves a Recall@1 of 78.92% and a Recall@20 of 97.59% on the urban scenario evaluation dataset, an improvement of 37.35% and 18.07%, respectively, over the baseline method. This demonstrates that our architecture efficiently fuses images and point clouds and represents a significant step forward in cross-modality global localization. The project page and code will be released.
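The contrastive learning framework described above maps paired image and point-cloud global descriptors into a shared embedding space, with matched pairs pulled together and unmatched in-batch pairs pushed apart. As a minimal sketch of this idea (not the paper's actual implementation), the following NumPy snippet computes a symmetric InfoNCE loss over a batch of paired descriptors; the function names and the temperature value are illustrative assumptions:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project descriptors onto the unit hypersphere so that the dot
    # product equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def symmetric_infonce(img_feats, pc_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image / point-cloud
    global descriptors: the i-th image and i-th point cloud form the
    positive pair; all other in-batch pairings act as negatives.
    This is an illustrative sketch, not SaliencyI2PLoc's exact loss."""
    img = l2_normalize(np.asarray(img_feats, dtype=np.float64))
    pc = l2_normalize(np.asarray(pc_feats, dtype=np.float64))
    logits = img @ pc.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(logits))            # diagonal entries are positives

    def xent(lg):
        # Numerically stable cross-entropy of each row against its
        # diagonal (positive) entry.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the two retrieval directions: image->point cloud and
    # point cloud->image.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With correctly paired descriptors the loss is low, while mismatching the pairs (e.g. reversing one batch) raises it, which is the signal that drives the cross-modality feature mapping.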