Accurate multi-sensor calibration is essential for deploying robust perception systems in applications such as autonomous driving, robotics, and intelligent transportation. Existing LiDAR-camera calibration methods often rely on manually placed targets, preliminary parameter estimates, or intensive data preprocessing, limiting their scalability and adaptability in real-world settings. In this work, we propose a fully automatic, targetless, and online calibration framework, CalibRefine, which directly processes raw LiDAR point clouds and camera images. Our approach is divided into four stages: (1) a Common Feature Discriminator that trains on automatically detected objects (using relative positions, appearance embeddings, and semantic classes) to generate reliable LiDAR-camera correspondences, (2) a coarse homography-based calibration, (3) an iterative refinement that incrementally improves alignment as additional data frames become available, and (4) an attention-based refinement that addresses non-planar distortions by leveraging a Vision Transformer and cross-attention mechanisms. Through extensive experiments on two urban traffic datasets, we show that CalibRefine delivers high-precision calibration results with minimal human involvement, outperforming state-of-the-art targetless methods and remaining competitive with, or surpassing, manually tuned baselines. Our findings highlight how robust object-level feature matching, together with iterative and self-supervised attention-based adjustments, enables consistent sensor fusion in complex, real-world conditions without requiring ground-truth calibration matrices or elaborate data preprocessing.
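The coarse calibration stage fits a planar homography to the matched LiDAR-camera correspondences. As a rough illustration of what such a fit involves (this is a minimal Direct Linear Transform sketch, not the paper's implementation; all function names are hypothetical):

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H (dst ~ H @ src, up to scale) from
    N >= 4 point correspondences via the Direct Linear Transform (DLT)."""
    assert src.shape == dst.shape and src.shape[0] >= 4
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the
        # nine entries of H (stacked row-wise as a 9-vector h): A h = 0.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # h is the right singular vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity

def project(H, pts):
    """Map 2D points through H with homogeneous normalization."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In practice a robust estimator (e.g., RANSAC over the candidate correspondences) would wrap this linear fit to reject outlier matches, and the later refinement stages would correct the residual non-planar error that a single homography cannot model.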