We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they repeat triangulation of sparse 3D points and registration of more camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of a visual relocalizer, that is, of a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Unlike other learning-based reconstruction methods, we require neither pose priors nor sequential inputs, and we optimize efficiently over thousands of images. In many cases, our method, ACE0, estimates camera poses with an accuracy close to feature-based SfM, as demonstrated by novel view synthesis. Project page: https://nianticlabs.github.io/acezero/