Perceptual aliasing and weak textures pose significant challenges to place recognition and degrade the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model, UMF (Unifying Local and Global Multimodal Features), that 1) exploits multiple modalities through cross-attention blocks between vision and LiDAR features, and 2) includes a re-ranking stage that uses local feature matching to re-order the top-k candidates retrieved with a global representation. Our experiments, particularly on sequences captured in a planetary-analogous environment, show that UMF significantly outperforms previous baselines in these challenging aliased environments. Since our work aims to enhance the reliability of SLAM in all situations, we also evaluate it on the widely used RobotCar dataset to assess its broader applicability. Code and models are available at https://github.com/DLR-RM/UMF
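To make the two-stage idea concrete, the following is a minimal sketch of global retrieval followed by local re-ranking. It is not the UMF implementation: the function names (`retrieve_top_k`, `count_mutual_matches`, `rerank`), the use of cosine similarity, and the mutual-nearest-neighbour match count used as the local score are all assumptions chosen for illustration, and the descriptors are toy vectors rather than learned vision/LiDAR features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_top_k(query_global, db_globals, k):
    """Stage 1 (assumed form): rank database places by global-descriptor similarity."""
    scores = [(cosine(query_global, g), i) for i, g in enumerate(db_globals)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # best first, index breaks ties
    return [i for _, i in scores[:k]]


def count_mutual_matches(query_locals, cand_locals):
    """Hypothetical local score: number of mutual nearest-neighbour pairs."""
    if not query_locals or not cand_locals:
        return 0

    def nearest(src, dst):
        return [max(range(len(dst)), key=lambda j: cosine(s, dst[j])) for s in src]

    q2c = nearest(query_locals, cand_locals)
    c2q = nearest(cand_locals, query_locals)
    return sum(1 for qi, cj in enumerate(q2c) if c2q[cj] == qi)


def rerank(query_locals, db_locals, candidates):
    """Stage 2 (assumed form): re-order the top-k candidates by local match count.

    sorted() is stable, so candidates with equal counts keep their global order.
    """
    scored = [(count_mutual_matches(query_locals, db_locals[i]), i) for i in candidates]
    return [i for _, i in sorted(scored, key=lambda s: -s[0])]


# Toy data: place 1 looks most similar globally (an aliased view),
# but place 0 shares more local features with the query.
query_global = [1.0, 0.0]
db_globals = [[0.9, 0.1], [1.0, 0.0], [0.0, 1.0]]
query_locals = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
db_locals = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # place 0: three mutual matches
    [[1, 1, 1]],                        # place 1: one mutual match
    [],                                 # place 2: no local features
]

top_k = retrieve_top_k(query_global, db_globals, k=2)
reranked = rerank(query_locals, db_locals, top_k)
print(top_k)     # [1, 0]
print(reranked)  # [0, 1]
```

In this toy case the global descriptor alone prefers the aliased place 1, and the local stage promotes place 0. Restricting the local matching to the top-k candidates keeps its cost independent of the database size, which is the usual motivation for a retrieve-then-re-rank design.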