Place recognition is an important technique for autonomous cars to achieve full autonomy, since it can provide an initial guess to online localization algorithms. Although current methods based on images or point clouds have achieved satisfactory performance, localizing images on a large-scale point cloud map remains a largely unexplored problem. This cross-modal matching task is challenging because it is difficult to extract consistent descriptors from images and point clouds. In this paper, we propose the I2P-Rec method, which addresses the problem by transforming the cross-modal data into the same modality. Specifically, we leverage the recent success of depth estimation networks to recover point clouds from images. We then project the point clouds into Bird's Eye View (BEV) images. Using the BEV image as an intermediate representation, we extract global features with a Convolutional Neural Network followed by a NetVLAD layer to perform matching. Experimental results on the KITTI dataset show that, with only a small set of training data, I2P-Rec achieves Top-1\% recall rates above 80\% and 90\% when localizing monocular and stereo images, respectively, on point cloud maps. We further evaluate I2P-Rec on a 1 km trajectory dataset collected by an autonomous logistics car and show that I2P-Rec generalizes well to previously unseen environments.
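The two geometric steps described above, recovering a point cloud from an estimated depth map and projecting it into a BEV image, can be illustrated with a minimal sketch. This is not the I2P-Rec implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the depth cutoff, the grid extent, the 0.5 m cell size, and the use of a per-cell point count as the BEV pixel value are all assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: Sequence[Sequence[float]],
                    fx: float, fy: float, cx: float, cy: float,
                    max_depth: float = 50.0) -> List[Point]:
    """Back-project a metric depth map to camera-frame 3D points.

    Assumes a pinhole camera; pixels with non-positive depth or depth
    beyond max_depth (an assumed cutoff) are discarded.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if 0.0 < z <= max_depth:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


def points_to_bev(points: Sequence[Point],
                  x_range: Tuple[float, float] = (-25.0, 25.0),
                  z_range: Tuple[float, float] = (0.0, 50.0),
                  cell: float = 0.5) -> List[List[int]]:
    """Project points onto the ground plane and count points per cell.

    Camera frame convention assumed here: x right, y down, z forward,
    so the BEV grid spans x (columns) and z (rows) and ignores y.
    """
    width = int(round((x_range[1] - x_range[0]) / cell))
    height = int(round((z_range[1] - z_range[0]) / cell))
    bev = [[0] * width for _ in range(height)]
    for x, _y, z in points:
        col = int((x - x_range[0]) // cell)
        row = int((z - z_range[0]) // cell)
        if 0 <= col < width and 0 <= row < height:
            bev[row][col] += 1
    return bev
```

In a pipeline of this kind, the resulting grid would be treated as a single-channel image and passed to the feature extractor; the same `points_to_bev` projection could in principle be applied to a LiDAR map submap so that both modalities share one representation.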