Structure-from-Motion (SfM) estimates camera poses and reconstructs point clouds, forming a foundation for various tasks. However, applying SfM to driving scenes captured by multi-camera systems presents significant difficulties, including unreliable pose estimation, excessive outliers in road surface reconstruction, and low reconstruction efficiency. To address these limitations, we propose a Multi-camera Reconstruction and Aggregation Structure-from-Motion (MRASfM) framework specifically designed for driving scenes. MRASfM enhances the reliability of camera pose estimation by leveraging the fixed spatial relationships within the multi-camera system during the registration process. To improve the quality of road surface reconstruction, our framework employs a plane model to remove erroneous points from the triangulated road surface. Moreover, treating the multi-camera set as a single unit in Bundle Adjustment (BA) reduces the number of optimization variables and improves efficiency. In addition, MRASfM achieves multi-scene aggregation through scene association and assembly modules in a coarse-to-fine fashion. We deployed multi-camera systems on actual vehicles and, through real-world applications, validated the generalizability of MRASfM across various scenes and its robustness in challenging conditions. Furthermore, large-scale validation on public datasets shows that MRASfM achieves state-of-the-art performance, with an absolute pose error of 0.124 on the nuScenes dataset.
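To make the plane-based road filtering concrete, the following is a minimal sketch of one way such a step could look. The abstract does not state how the plane model is fitted or applied, so the use of RANSAC, the function name `fit_plane_ransac`, the distance threshold, and the synthetic data are all assumptions made for illustration and are not taken from MRASfM itself.

```python
import numpy as np


def fit_plane_ransac(points, threshold=0.05, iterations=200, seed=0):
    """Fit a plane n.x + d = 0 to 3D points with RANSAC (illustrative helper).

    Returns the unit normal n, the offset d, and a boolean inlier mask.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(iterations):
        # Draw three distinct points and build the plane through them.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # nearly collinear sample, no unique plane
        normal = normal / norm
        d = -normal.dot(sample[0])
        # Points closer to the plane than the threshold count as inliers.
        mask = np.abs(points @ normal + d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    if best_model is None:
        raise ValueError("no non-degenerate sample found")
    return best_model[0], best_model[1], best_mask


# Synthetic stand-in for triangulated road points (assumed data, not from the paper):
# 200 points near the plane z = 0 plus 20 erroneous points floating above it.
data_rng = np.random.default_rng(1)
road = np.column_stack([
    data_rng.uniform(-10, 10, 200),
    data_rng.uniform(-10, 10, 200),
    data_rng.normal(0.0, 0.01, 200),
])
floating = np.column_stack([
    data_rng.uniform(-10, 10, 20),
    data_rng.uniform(-10, 10, 20),
    data_rng.uniform(0.5, 2.0, 20),
])
points = np.vstack([road, floating])

normal, d, mask = fit_plane_ransac(points)
filtered_road = points[mask]  # road points kept after plane-based filtering
```

In this sketch the threshold is in the same units as the point coordinates; in a real pipeline it would presumably need tuning to the reconstruction scale, and a single global plane may be too coarse for sloped or curved roads.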