LiDAR-camera extrinsic calibration is essential for multi-modal data fusion in robotic perception systems. However, existing approaches typically rely on handcrafted calibration targets (e.g., checkerboards) or specific, static scene types, limiting their adaptability and deployment in real-world autonomous and robotic applications. This article presents the first self-supervised LiDAR-camera extrinsic calibration network that operates in an online fashion and eliminates the need for specific calibration targets. We first identify a significant generalization degradation problem in prior methods, caused by the conventional single-sided data augmentation strategy. To overcome this limitation, we propose a novel double-sided data augmentation technique that generates multi-perspective camera views using estimated depth maps, thereby enhancing robustness and diversity during training. Built upon this augmentation strategy, we design a dual-path, self-supervised calibration framework that reduces the dependence on high-precision ground truth labels and supports fully adaptive online calibration. Furthermore, to improve cross-modal feature association, we replace the traditional dual-branch feature extraction design with a difference map construction process that explicitly correlates LiDAR and camera features. This not only enhances calibration accuracy but also reduces model complexity. Extensive experiments conducted on five public benchmark datasets, as well as our own recorded dataset, demonstrate that the proposed method significantly outperforms existing approaches in terms of generalizability.
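To make the "double-sided" augmentation idea concrete, below is a minimal, illustrative sketch (not the authors' code) of synthesizing a virtual camera view from an image and its estimated depth map under a small random rigid perturbation; the conventional single-sided strategy would perturb only the LiDAR side. The function names (`warp_to_new_view`, `random_se3`), the pinhole intrinsics `K`, and the naive hole handling are assumptions made here purely for illustration.

```python
# Conceptual sketch of depth-based view synthesis for double-sided augmentation.
# All names and parameters below are hypothetical, not taken from the paper.
import numpy as np


def warp_to_new_view(image, depth, K, T_perturb):
    """Forward-warp `image` (H x W x 3) into a virtual camera displaced by the
    4x4 rigid transform `T_perturb`, using the per-pixel estimated `depth` map.
    Holes left by forward warping are simply kept black in this sketch."""
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Back-project pixels to 3-D camera coordinates using the estimated depth.
    K_inv = np.linalg.inv(K)
    pts_cam = (K_inv @ pix.T) * depth.reshape(1, -1)          # 3 x (H*W)

    # Move the points into the virtual (perturbed) camera frame.
    pts_h = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    pts_new = (T_perturb @ pts_h)[:3]

    # Re-project into the virtual image plane and scatter source colors.
    proj = K @ pts_new
    z = proj[2]
    valid = z > 1e-3
    u_new = np.round(proj[0, valid] / z[valid]).astype(int)
    v_new = np.round(proj[1, valid] / z[valid]).astype(int)
    inside = (u_new >= 0) & (u_new < W) & (v_new >= 0) & (v_new < H)

    out = np.zeros_like(image)
    out[v_new[inside], u_new[inside]] = image.reshape(-1, 3)[valid][inside]
    return out


def random_se3(max_deg=5.0, max_trans=0.1, rng=None):
    """Draw a small random SE(3) perturbation (axis-angle rotation + translation)."""
    rng = np.random.default_rng() if rng is None else rng
    angle = np.deg2rad(rng.uniform(-max_deg, max_deg))
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    kx, ky, kz = axis
    Kx = np.array([[0, -kz, ky], [kz, 0, -kx], [-ky, kx, 0]])
    R = np.eye(3) + np.sin(angle) * Kx + (1 - np.cos(angle)) * (Kx @ Kx)  # Rodrigues
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = rng.uniform(-max_trans, max_trans, size=3)
    return T


if __name__ == "__main__":
    # Toy example with a synthetic image and a flat stand-in for estimated depth.
    rng = np.random.default_rng(0)
    image = rng.uniform(0, 255, size=(120, 160, 3)).astype(np.uint8)
    depth = np.full((120, 160), 10.0)                  # metres
    K = np.array([[100.0, 0.0, 80.0],
                  [0.0, 100.0, 60.0],
                  [0.0, 0.0, 1.0]])
    augmented_view = warp_to_new_view(image, depth, K, random_se3(rng=rng))
    print(augmented_view.shape)                        # (120, 160, 3)
```

In this reading, each training pair can be perturbed on both modalities: the LiDAR points are transformed by one sampled pose while the camera image is re-rendered from another, which is what gives the augmentation its multi-perspective character; the exact rendering and hole-filling used in the paper may differ.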