LiDARs are widely used for mapping and localization in dynamic environments, but their high cost limits widespread adoption. Monocular localization in LiDAR maps using inexpensive cameras is a cost-effective alternative for large-scale deployment. However, most existing approaches struggle to generalize to new sensor setups and environments, and require retraining or fine-tuning. In this paper, we present CMRNext, a novel approach for camera-LiDAR matching that is independent of sensor-specific parameters, generalizes to new setups, and can be used in the wild for monocular localization in LiDAR maps and for camera-LiDAR extrinsic calibration. CMRNext combines recent advances in deep neural networks for matching cross-modal data with standard geometric techniques for robust pose estimation. We reformulate the point-pixel matching problem as an optical flow estimation problem, then solve the Perspective-n-Point problem on the resulting correspondences to find the relative pose between the camera and the LiDAR point cloud. We extensively evaluate CMRNext on six different robotic platforms, including three publicly available datasets and three in-house robots. Our experimental evaluations demonstrate that CMRNext outperforms existing approaches on both tasks and generalizes effectively to previously unseen environments and sensor setups in a zero-shot manner. We make the code and pre-trained models publicly available at http://cmrnext.cs.uni-freiburg.de.
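The two-stage idea in the abstract (per-point optical flow, then Perspective-n-Point) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the CMRNext code: the helper names `project`, `pnp_dlt`, and `rot_y` are hypothetical, the flow field is synthesized from a known ground-truth pose in place of a network prediction, and a plain linear (DLT) solver stands in for whatever robust PnP solver is used in practice.

```python
import numpy as np


def project(points, R, t, K):
    """Project Nx3 map points to pixels for camera pose (R, t) and intrinsics K."""
    cam = points @ R.T + t           # map frame -> camera frame
    uvw = cam @ K.T                  # pinhole projection, homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]


def pnp_dlt(points, pixels, K):
    """Linear (DLT) PnP: estimate R, t from >= 6 non-coplanar 3D-2D matches."""
    ones = np.ones((len(pixels), 1))
    # Normalized image coordinates remove the dependence on K.
    xy = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T[:, :2]
    X = np.hstack([points, ones])    # homogeneous 3D points, Nx4
    zeros = np.zeros_like(X)
    # Each match contributes two linear equations in the 12 entries of [R | t].
    A = np.vstack([
        np.hstack([X, zeros, -xy[:, :1] * X]),
        np.hstack([zeros, X, -xy[:, 1:2] * X]),
    ])
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)   # null vector of A, up to scale
    # Project the left 3x3 block onto a rotation and undo the unknown scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:         # the null vector may carry a negative scale
        R, t = -R, -t
    return R, t


def rot_y(angle):
    """Rotation matrix about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Synthetic scene: random map points in front of the camera (assumed values).
rng = np.random.default_rng(0)
points = rng.uniform([-5.0, -2.0, 4.0], [5.0, 2.0, 20.0], size=(50, 3))
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
R_true, t_true = rot_y(0.05), np.array([0.3, -0.1, 0.5])
R_init, t_init = np.eye(3), np.zeros(3)   # rough initial pose guess

# Step 1: project the map points into the image with the initial pose.
uv_init = project(points, R_init, t_init, K)

# Step 2: per-point flow from the projected position to the true pixel.
# A matching network would predict this; here it is synthesized from R_true.
flow = project(points, R_true, t_true, K) - uv_init

# Step 3: the displaced pixels paired with the 3D points are the matches.
uv_matched = uv_init + flow

# Step 4: recover the camera pose from the 2D-3D correspondences.
R_est, t_est = pnp_dlt(points, uv_matched, K)
```

With noise-free flow the estimated pose should agree with the ground truth up to numerical precision. Real flow predictions contain outliers, so a practical pipeline would more plausibly wrap a minimal PnP solver in RANSAC than rely on a single least-squares DLT fit.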