Although significant progress has been made, place recognition in environments with viewpoint changes, seasonal variations, and scene changes remains challenging. Perception information from a single sensor alone is insufficient to address these issues. Because cameras and LiDAR provide complementary information, multi-modal fusion methods have attracted increasing attention. To address the underutilization of information in existing multi-modal fusion methods, this paper introduces a novel three-channel place descriptor composed of a cascade of image, point cloud, and fusion branches. Specifically, the fusion branch employs a dual-stage pipeline that exploits the correlation between the two modalities through latent contacts, thereby facilitating cross-modal information interaction and fusion. Extensive experiments on the KITTI, NCLT, and USVInland datasets and a campus dataset show that the proposed descriptor achieves state-of-the-art performance, confirming its robustness and generality in challenging scenarios.
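To make the idea of a three-channel descriptor concrete, the following is a minimal sketch under stated assumptions, not the paper's method. It assumes that each branch (image, point cloud, fusion) already outputs a fixed-length feature vector, that the three vectors are L2-normalized and concatenated, and that place recognition is performed by cosine-similarity retrieval against a database of stored descriptors. The function names `build_descriptor` and `retrieve` are hypothetical, and the dual-stage fusion pipeline with latent contacts is not modeled here. The input vectors are placeholders.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def build_descriptor(image_feat: Sequence[float],
                     cloud_feat: Sequence[float],
                     fused_feat: Sequence[float]) -> List[float]:
    """Hypothetical three-channel descriptor.

    Each branch output is normalized first so that no single branch
    dominates by magnitude, then the concatenation is normalized again.
    """
    parts: List[float] = []
    for branch in (image_feat, cloud_feat, fused_feat):
        parts.extend(l2_normalize(branch))
    return l2_normalize(parts)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Both inputs are unit vectors, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: Sequence[float], database: List[List[float]]) -> int:
    """Return the index of the stored descriptor most similar to the query."""
    return max(range(len(database)), key=lambda i: cosine(query, database[i]))


if __name__ == "__main__":
    # Placeholder features for two previously visited places.
    db = [
        build_descriptor([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]),
        build_descriptor([0.0, 1.0], [1.0, 0.0], [1.0, -1.0]),
    ]
    # A noisy revisit of the first place.
    q = build_descriptor([0.9, 0.1], [0.1, 0.9], [1.0, 0.8])
    print(retrieve(q, db))  # → 0
```

Normalizing each branch before concatenation is one simple way to keep the image, point cloud, and fusion channels on a comparable scale. A learned weighting or a different similarity metric would be equally plausible choices.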