Most learning-based camera-LiDAR calibration methods treat each camera-LiDAR pair independently, ignoring the rigid geometric coupling between cameras on a multi-camera platform. As a result, per-camera estimates may be individually accurate yet inconsistent at the system level. We present a two-stage framework for joint multi-camera LiDAR extrinsic calibration that combines learned pairwise matching with geometric refinement. First, CMRNext is applied independently to each camera to produce initial extrinsic estimates and dense 2D-3D correspondences. These predictions are then jointly refined through a multi-frame bundle adjustment with reprojection, per-camera prior, and relative-pose prior terms. This approach converts pairwise predictions into a globally consistent multi-camera calibration. Experiments on the KITTI (in-domain for CMRNext) and Walkley (out-of-domain) datasets show improved per-camera accuracy and inter-camera consistency. On KITTI, the method achieves 0.89 cm translation error and 0.038° rotation error. On Walkley, it reduces translation error from 108.6 cm to 3.1 cm, highlighting the benefit of explicit multi-camera coupling when single-camera predictions are less reliable.
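To make the structure of the joint refinement concrete, the following is a minimal sketch of a residual vector that stacks the three kinds of terms named in the abstract: reprojection, per-camera prior, and relative-pose prior. It is an illustration under stated assumptions, not the authors' implementation. The pose parameterization (axis-angle plus translation), the weights `w_prior` and `w_rel`, the chordal pose discrepancy, and all function names are hypothetical choices made for brevity.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def to_matrix(pose):
    """6-vector [axis-angle, translation] -> 4x4 LiDAR-to-camera transform."""
    T = np.eye(4)
    T[:3, :3] = rodrigues(pose[:3])
    T[:3, 3] = pose[3:]
    return T

def project(K, T, pts):
    """Project LiDAR points (N, 3) to pixels (N, 2) with pinhole intrinsics K."""
    cam = pts @ T[:3, :3].T + T[:3, 3]
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def joint_residuals(poses, intrinsics, corr, priors, rel_priors,
                    w_prior=1.0, w_rel=1.0):
    """Stack all residuals for a joint refinement (assumed form).

    poses:      dict cam -> 6-vector being optimized
    corr:       list of (cam, pts3d, uv), one entry per camera and frame
    priors:     dict cam -> 4x4 initial estimate for that camera
    rel_priors: dict (i, j) -> 4x4 expected camera-j-to-camera-i transform
    """
    T = {c: to_matrix(p) for c, p in poses.items()}
    res = []
    # Reprojection term: projected 3D points versus matched pixels.
    for cam, pts, uv in corr:
        res.append((project(intrinsics[cam], T[cam], pts) - uv).ravel())
    # Per-camera prior: stay close to the initial per-camera estimate.
    # A chordal difference of the top 3x4 block is used here for brevity.
    for cam, T0 in priors.items():
        res.append(w_prior * (T[cam] - T0)[:3].ravel())
    # Relative-pose prior: couple cameras i and j through the shared LiDAR.
    for (i, j), T_ij in rel_priors.items():
        res.append(w_rel * (T[i] @ np.linalg.inv(T[j]) - T_ij)[:3].ravel())
    return np.concatenate(res)

# Synthetic two-camera check: the cost is zero at the true poses.
rng = np.random.default_rng(0)
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
intrinsics = {0: K, 1: K}
true = {0: np.array([0.01, -0.02, 0.03, 0.1, 0.0, -0.05]),
        1: np.array([0.0, 0.3, 0.0, -0.4, 0.0, 0.02])}
pts = rng.uniform([-2.0, -1.0, 5.0], [2.0, 1.0, 20.0], size=(50, 3))
corr = [(c, pts, project(K, to_matrix(true[c]), pts)) for c in true]
priors = {c: to_matrix(true[c]) for c in true}
rel_priors = {(0, 1): to_matrix(true[0]) @ np.linalg.inv(to_matrix(true[1]))}

def cost(poses):
    r = joint_residuals(poses, intrinsics, corr, priors, rel_priors)
    return 0.5 * float(r @ r)

noisy = {c: p + 0.01 for c, p in true.items()}
cost_true, cost_noisy = cost(true), cost(noisy)
```

In practice the poses would be flattened into a single parameter vector and the residual function handed to a nonlinear least-squares solver (for example `scipy.optimize.least_squares`), typically with a robust loss on the reprojection term. The relative-pose term is what ties the cameras together: without it, the problem separates into independent per-camera refinements.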