Place recognition is a fundamental task for robotic applications, allowing robots to perform loop closure detection within simultaneous localization and mapping (SLAM) and to relocalize on prior maps. Current range image-based networks use single-column convolution to keep features invariant to the shifts in image columns caused by LiDAR viewpoint changes. However, this leads to issues such as "restricted receptive fields" and "excessive focus on local regions", which degrade network performance. To address these issues, we propose a lightweight circular convolutional Transformer network, denoted CCTNet, which boosts performance by capturing structural information in point clouds and facilitating cross-dimensional interaction of spatial and channel information. First, a Circular Convolution Module (CCM) is introduced, which enlarges the network's receptive field while maintaining feature consistency across varying LiDAR perspectives. Then, a Range Transformer Module (RTM) is proposed, which improves place recognition accuracy in scenarios with movable objects by combining channel and spatial attention mechanisms. Furthermore, we propose an Overlap-based loss function that transforms the place recognition task from a binary loop closure classification into a regression problem linked to the overlap between LiDAR frames. In extensive experiments on the KITTI and Ford Campus datasets, CCTNet surpasses comparable methods, achieving Recall@1 of 0.924 and 0.965 and Recall@1% of 0.990 and 0.993, respectively, on the test sets. Results on a self-collected dataset further demonstrate the proposed method's potential for practical deployment in complex scenarios with movable objects and indicate improved generalization across datasets.
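The shift-consistency property that motivates circular convolution can be illustrated with a small, self-contained sketch. This is not the CCM itself: the function names, the toy 2 x 8 range image, and the width-3 kernel are all assumptions made for illustration. It only shows that a horizontal kernel wider than one column, applied with wrap-around padding, commutes with a cyclic column shift (the effect of a LiDAR yaw rotation on a range image), so a descriptor pooled over columns is unchanged.

```python
def circular_conv_row(row, kernel):
    """Cross-correlate one image row with `kernel`, wrapping around the row ends."""
    width, k = len(row), len(kernel)
    half = k // 2
    return [
        sum(kernel[j] * row[(col + j - half) % width] for j in range(k))
        for col in range(width)
    ]


def circular_conv(image, kernel):
    """Apply the same horizontal circular kernel to every row of a range image."""
    return [circular_conv_row(row, kernel) for row in image]


def shift_columns(image, shift):
    """Cyclically shift image columns, mimicking a yaw rotation of the LiDAR."""
    return [row[-shift:] + row[:-shift] for row in image]


def column_max_descriptor(features):
    """Max-pool each row over all columns; the result ignores column order."""
    return [max(row) for row in features]


# Hypothetical toy range image (rows: laser beams, columns: azimuth bins).
image = [
    [1.0, 2.0, 4.0, 3.0, 0.0, 5.0, 7.0, 6.0],
    [2.0, 0.0, 1.0, 5.0, 3.0, 4.0, 6.0, 8.0],
]
kernel = [1.0, -2.0, 1.0]  # assumed width-3 kernel, wider than a single column
shift = 3

conv_then_shift = shift_columns(circular_conv(image, kernel), shift)
shift_then_conv = circular_conv(shift_columns(image, shift), kernel)

# Equivariance: convolving a shifted image equals shifting the convolved image.
equivariant = conv_then_shift == shift_then_conv
# Invariance after pooling: the column-pooled descriptor does not change.
same_descriptor = (
    column_max_descriptor(circular_conv(image, kernel))
    == column_max_descriptor(shift_then_conv)
)
print(equivariant, same_descriptor)
```

With zero padding instead of wrap-around indexing, the outputs near the left and right image borders would differ after a shift, which is why wider kernels need circular padding to keep this property.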
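For readers unfamiliar with the reported metrics, the following minimal sketch shows one common way to compute Recall@N for retrieval-based place recognition. It is an assumption about the evaluation convention, not the authors' evaluation code: a query counts as a hit if any of its top-N ranked database candidates is a true match, and Recall@1% is taken here to mean N equal to 1% of the database size, rounded up and at least 1. The function names and toy data are hypothetical.

```python
import math


def recall_at_n(rankings, true_matches, n):
    """Fraction of queries with at least one true match among their top-n candidates."""
    hits = 0
    for query, ranking in rankings.items():
        if any(candidate in true_matches[query] for candidate in ranking[:n]):
            hits += 1
    return hits / len(rankings)


def top_one_percent(database_size):
    """Assumed convention for Recall@1%: 1% of the database, rounded up, at least 1."""
    return max(1, math.ceil(0.01 * database_size))


# Hypothetical toy data: each query maps to database entries ranked by similarity.
rankings = {
    "q0": ["a", "b", "c", "d"],
    "q1": ["c", "a", "d", "b"],
    "q2": ["d", "b", "a", "c"],
    "q3": ["b", "d", "c", "a"],
}
true_matches = {"q0": {"a"}, "q1": {"a"}, "q2": {"c"}, "q3": {"b"}}

recall_1 = recall_at_n(rankings, true_matches, 1)  # q0 and q3 hit
recall_2 = recall_at_n(rankings, true_matches, 2)  # q1 also hits at rank 2
print(recall_1, recall_2)
```

Some evaluation protocols only count queries that have at least one true match in the database; this sketch assumes every query does.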