Structure-from-Motion (SfM), the process of simultaneously estimating camera poses and 3D scene structure from a collection of images, remains a central challenge in computer vision with many open problems. Recent advances in feedforward 3D reconstruction have made significant progress on persistent failure cases of classical SfM methods, particularly in scenes with low texture, limited overlap, and symmetries. However, while feedforward approaches excel in these challenging conditions, they often face limitations in scalability, accuracy, or robustness, and typically fall short of classical methods in standard reconstruction settings. In this work, we systematically analyze these limitations and propose a new Structure-from-Motion pipeline that combines the respective strengths of classical and feedforward methods. Extensive experiments across multiple datasets demonstrate the benefits of our approach, which achieves state-of-the-art results across a wide range of scenarios. We share our system as an open-source implementation at https://github.com/colmap/gluemap.