LiDAR-based place recognition (LPR) is a crucial component that lets autonomous vehicles identify previously visited places in GPS-denied environments. Most existing LPR methods use a single representation of the input point cloud without considering different views, and so may not fully exploit the information from LiDAR sensors. In this paper, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data. It extracts correlations within each view using intra-transformers and between the two views using inter-transformers. Based on these correlations, CVTNet generates a yaw-angle-invariant global descriptor for each laser scan end-to-end online, and retrieves previously seen places by matching the descriptor of the current query scan against a pre-built database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms state-of-the-art LPR methods and is strongly robust to viewpoint changes and long time spans. Furthermore, our approach runs in real time, faster than the typical LiDAR frame rate. The implementation of our method is released as open source at: https://github.com/BIT-MJY/CVTNet.
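To make the pipeline described above more concrete, the following is a minimal NumPy sketch of the data flow: projecting a scan into two views, building a yaw-invariant descriptor, and retrieving a place by descriptor matching. It is not the CVTNet implementation. The learned intra- and inter-transformers are replaced here by simple pooling over image columns, and the function names (`project_views`, `toy_descriptor`, `retrieve`), the image size, the field of view, and the polar BEV layout with azimuth as the column axis are all assumptions made for illustration.

```python
import numpy as np

def project_views(points, rows=32, cols=360, fov_deg=(-25.0, 3.0), max_range=80.0):
    """Return (range image, polar BEV) for an (N, 3) array of x, y, z points.

    Assumption of this sketch: both views use azimuth as the column axis, so a
    yaw rotation of the sensor becomes a circular column shift in each image.
    """
    rng = np.linalg.norm(points, axis=1)
    keep = (rng > 1e-3) & (rng < max_range)
    pts, rng = points[keep], rng[keep]
    azimuth = np.arctan2(pts[:, 1], pts[:, 0])  # in [-pi, pi]
    col = np.floor((azimuth + np.pi) / (2 * np.pi) * cols).astype(int) % cols

    # Range image view: row = elevation bin, pixel = closest range in the bin.
    elev = np.degrees(np.arcsin(np.clip(pts[:, 2] / rng, -1.0, 1.0)))
    lo, hi = fov_deg
    riv_row = np.floor((hi - elev) / (hi - lo) * rows).astype(int)
    ok = (riv_row >= 0) & (riv_row < rows)
    riv = np.full((rows, cols), np.inf)
    np.minimum.at(riv, (riv_row[ok], col[ok]), rng[ok])
    riv[np.isinf(riv)] = 0.0  # empty pixels

    # Bird's eye view (polar): row = planar distance bin, pixel = max height.
    planar = np.hypot(pts[:, 0], pts[:, 1])
    bev_row = np.minimum((planar / max_range * rows).astype(int), rows - 1)
    bev = np.full((rows, cols), -np.inf)
    np.maximum.at(bev, (bev_row, col), pts[:, 2])
    bev[np.isinf(bev)] = 0.0  # empty pixels
    return riv, bev

def toy_descriptor(riv, bev):
    """Hand-crafted stand-in for a learned descriptor.

    Pooling over the column (azimuth) axis discards column order, so the
    result is unchanged by circular column shifts of the input views.
    """
    feats = [pool(view, axis=1) for view in (riv, bev) for pool in (np.max, np.mean)]
    desc = np.concatenate(feats)
    return desc / (np.linalg.norm(desc) + 1e-12)

def retrieve(query, database, top_k=1):
    """Indices of the top_k database descriptors closest to the query (L2)."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:top_k]

if __name__ == "__main__":
    gen = np.random.default_rng(0)
    # Synthetic "places": random point clouds, wide in x/y and flat in z.
    clouds = [gen.normal(size=(2000, 3)) * [20.0, 20.0, 1.0] for _ in range(5)]
    views = [project_views(c) for c in clouds]
    database = np.stack([toy_descriptor(r, b) for r, b in views])

    # Revisit place 3 with a different heading, simulated as a column shift.
    riv, bev = views[3]
    query = toy_descriptor(np.roll(riv, 90, axis=1), np.roll(bev, 90, axis=1))
    print("best match:", retrieve(query, database)[0])
```

The column pooling is only a convenient way to show why an azimuth-indexed image layout makes yaw invariance attainable. A learned model would replace `toy_descriptor` while keeping the same retrieval step against the pre-built database.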