Place recognition is a crucial module that lets autonomous vehicles identify previously visited places in GPS-denied environments. Sensor fusion is considered an effective way to overcome the weaknesses of individual sensors, and multimodal place recognition, which fuses information from multiple sensors, has attracted increasing attention in recent years. However, most existing multimodal place recognition methods use only camera images with a limited field of view, which leads to an imbalance between features from different modalities and limits the effectiveness of sensor fusion. In this paper, we present LCPR, a novel neural network for robust multimodal place recognition that fuses LiDAR point clouds with multi-view RGB images to generate discriminative and yaw-rotation-invariant representations of the environment. We propose a multi-scale attention-based fusion module that fully exploits the panoramic views of the environment from different modalities and the correlations between them. We evaluate our method on the nuScenes dataset, and the experimental results show that it can effectively use multi-view camera and LiDAR data to improve place recognition performance while maintaining strong robustness to viewpoint changes. Our open-source code and pre-trained models are available at https://github.com/ZhouZijie77/LCPR.