A prior point cloud provides 3D environmental context, which enhances the capabilities of a monocular camera in downstream vision tasks, such as 3D object detection, via data fusion. However, the absence of accurate and automated registration methods for estimating camera extrinsic parameters within roadside scene point clouds notably constrains the potential applications of roadside cameras. This paper proposes a novel approach for the automatic registration of prior point clouds with images of roadside scenes. The main idea is to render photorealistic grayscale views from specific perspectives of the prior point cloud, using point features such as RGB or intensity values. These generated views reduce the modality gap between images and prior point clouds, thereby improving the robustness and accuracy of the registration results. In particular, we present an efficient algorithm, named neighbor rendering, for the rendering process. We then introduce a method for automatically estimating the initial guess using only a rough estimate of the camera's position. Finally, we propose a procedure for iteratively refining the extrinsic parameters by minimizing the reprojection error of line features extracted from both generated and camera images using the Segment Anything Model (SAM). We evaluate our method on a self-collected dataset comprising eight cameras strategically positioned across a university campus. Experiments demonstrate our method's ability to automatically align a prior point cloud with roadside camera images, achieving a rotation accuracy of 0.202 degrees and a translation accuracy of 0.079 m. Furthermore, we validate the approach's effectiveness in visual applications by substantially improving monocular 3D object detection performance.
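To make the rendering idea concrete, the sketch below projects a point cloud with per-point grayscale values into an image using a standard pinhole model with a simple z-buffer. This is a minimal illustration, not the paper's neighbor rendering algorithm; the intrinsics `K`, extrinsics `R`/`t`, and image size are all assumed inputs.

```python
import numpy as np

def render_grayscale_view(points, intensities, K, R, t, width, height):
    """Project a point cloud into a grayscale image with a pinhole model.

    points:      (N, 3) world coordinates
    intensities: (N,)   per-point grayscale values in [0, 1]
    K:           (3, 3) camera intrinsic matrix
    R, t:        world-to-camera rotation (3, 3) and translation (3,)
    """
    # Transform points into the camera frame; keep only points in front.
    cam = points @ R.T + t
    front = cam[:, 2] > 1e-6
    cam, vals = cam[front], intensities[front]

    # Pinhole projection to pixel coordinates.
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    px = np.round(uv).astype(int)

    # Discard projections outside the image bounds.
    inside = ((px[:, 0] >= 0) & (px[:, 0] < width) &
              (px[:, 1] >= 0) & (px[:, 1] < height))
    px, vals, depth = px[inside], vals[inside], cam[inside, 2]

    # Z-buffer: write far-to-near so the nearest point wins each pixel.
    order = np.argsort(-depth)
    img = np.zeros((height, width), dtype=float)
    img[px[order, 1], px[order, 0]] = vals[order]
    return img
```

A rendering like this yields an image in the camera's own modality, which is what allows line features extracted from it (e.g. via SAM) to be matched against those from the real camera image during refinement.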