Accurate multi-sensor calibration is essential for deploying robust perception systems in applications such as autonomous driving, robotics, and intelligent transportation. Existing LiDAR-camera calibration methods often rely on manually placed targets, preliminary parameter estimates, or intensive data preprocessing, limiting their scalability and adaptability in real-world settings. In this work, we propose a fully automatic, targetless, and online calibration framework, CalibRefine, which directly processes raw LiDAR point clouds and camera images. Our approach is divided into four stages: (1) a Common Feature Discriminator that trains on automatically detected objects (using relative positions, appearance embeddings, and semantic classes) to generate reliable LiDAR-camera correspondences, (2) a coarse homography-based calibration, (3) an iterative refinement that incrementally improves alignment as additional data frames become available, and (4) an attention-based refinement that addresses non-planar distortions by leveraging a Vision Transformer and cross-attention mechanisms. Through extensive experiments on two urban traffic datasets, we show that CalibRefine delivers high-precision calibration results with minimal human involvement, outperforming state-of-the-art targetless methods and remaining competitive with, or surpassing, manually tuned baselines. Our findings highlight how robust object-level feature matching, together with iterative and self-supervised attention-based adjustments, enables consistent sensor fusion in complex, real-world conditions without requiring ground-truth calibration matrices or elaborate data preprocessing.
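The coarse calibration stage fits a planar homography to the matched LiDAR-camera correspondences. As a rough illustration of what such a fit involves (this is a minimal Direct Linear Transform sketch, not the paper's implementation; all function names are hypothetical):

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H (dst ~ H @ src, up to scale) from
    N >= 4 point correspondences via the Direct Linear Transform (DLT)."""
    assert src.shape == dst.shape and src.shape[0] >= 4
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the
        # nine entries of H (stacked row-wise as a 9-vector h): A h = 0.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # h is the right singular vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity

def project(H, pts):
    """Map 2D points through H with homogeneous normalization."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In practice a robust estimator (e.g., RANSAC over the candidate correspondences) would wrap this linear fit to reject outlier matches, and the later refinement stages would correct the residual non-planar error that a single homography cannot model.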