Real-time multi-camera 3D reconstruction is a foundation for immersive media, remote interaction, and spatial computing. Although synchronized camera arrays are widely adopted, geometrically consistent and scalable real-time reconstruction remains challenging. A key difficulty is the tight coupling among extrinsic calibration, multi-view fusion, and global optimization, which leads to unstable reconstruction results, accumulated errors, and poor scalability. We propose FUSE-Flow, a framework that decouples calibration from stateless real-time multi-view point cloud fusion. It has two cooperating components: geometry-aligned multi-view extrinsic calibration (GMAC) and reliability-guided multi-view point cloud fusion (FUSE). Separating them avoids conflicting optimization objectives and lets each component be improved independently. GMAC refines camera extrinsics using geometric constraints and multi-view reconstruction transformers, enabling accurate sparse-view calibration without calibration targets, dense images, or global bundle adjustment. FUSE combines confidence weighting with adaptive spatial hashing for stateless fusion, keeping time and memory consumption linear. The two modules reinforce each other: accurate camera poses improve fusion accuracy, and confidence-aware fusion corrects calibration biases. Validated on public datasets and real camera setups, FUSE-Flow outperforms mainstream real-time reconstruction methods in visual quality, dynamic stability, and scalability, providing a practical solution for large-scale real-time 3D reconstruction.
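The idea of stateless, confidence-weighted fusion over a spatial hash can be illustrated with a minimal sketch. This is not the FUSE implementation: the function name `fuse_frame`, its parameters, and the fixed `voxel_size` are assumptions made for illustration, and the adaptive part of the hashing is not modeled. Each frame is fused from scratch (no state carried between frames), and every point is touched once, so expected time and memory grow linearly with the number of input points.

```python
import numpy as np


def fuse_frame(clouds, confidences, extrinsics, voxel_size=0.01):
    """Fuse one frame of per-camera point clouds into a single cloud.

    Hypothetical interface, for illustration only:
      clouds:      list of (N_i, 3) arrays, points in each camera's frame
      confidences: list of (N_i,) arrays, non-negative per-point weights
      extrinsics:  list of (4, 4) camera-to-world transforms
      voxel_size:  fixed cell size (assumption; no adaptive resolution here)
    """
    # Hash table: integer voxel index -> [weighted xyz sum, weight sum].
    # It is rebuilt for every frame, so nothing persists across frames.
    cells = {}
    for pts, conf, T in zip(clouds, confidences, extrinsics):
        # Move points into the shared world frame using the camera pose.
        world = pts @ T[:3, :3].T + T[:3, 3]
        keys = np.floor(world / voxel_size).astype(np.int64)
        for key, p, w in zip(map(tuple, keys), world, conf):
            if w <= 0.0:
                continue  # zero-confidence points contribute nothing
            acc = cells.get(key)
            if acc is None:
                cells[key] = [w * p, w]
            else:
                acc[0] += w * p
                acc[1] += w
    if not cells:
        return np.empty((0, 3))
    # One output point per occupied voxel: the confidence-weighted mean.
    return np.array([s / w for s, w in cells.values()])
```

In this sketch, points from different cameras that fall into the same voxel are merged into their confidence-weighted mean, so a high-confidence observation dominates a low-confidence one. The extrinsics enter only through the camera-to-world transform, which is why pose errors show up directly as misaligned voxels.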