Real-time multi-camera 3D reconstruction is crucial for 3D perception, immersive interaction, and robotics. Existing methods struggle with multi-view fusion, camera extrinsic uncertainty, and scalability to large camera setups. We propose SPARK, a self-calibrating real-time multi-camera point cloud reconstruction framework that jointly handles point cloud fusion and extrinsic uncertainty. SPARK consists of: (1) a geometry-aware online extrinsic estimation module that leverages multi-view priors and enforces cross-view and temporal consistency for stable self-calibration, and (2) a confidence-driven point cloud fusion strategy that models depth reliability and visibility at the pixel and point levels to suppress noise and view-dependent inconsistencies. By performing frame-wise fusion without accumulation, SPARK produces stable point clouds in dynamic scenes while scaling linearly with the number of cameras. Extensive experiments on real-world multi-camera systems show that SPARK outperforms existing approaches in extrinsic accuracy, geometric consistency, temporal stability, and real-time performance, demonstrating its effectiveness and scalability for large-scale multi-camera 3D reconstruction.