A critical obstacle preventing NeRF models from being deployed broadly in the wild is their reliance on accurate camera poses. Consequently, there is growing interest in extending NeRF models to jointly optimize camera poses and scene representation, offering an alternative to off-the-shelf SfM pipelines, which have well-understood failure modes. Existing approaches to unposed NeRF operate under limiting assumptions, such as a prior pose distribution or a coarse pose initialization, which makes them less effective in a general setting. In this work, we propose a novel approach, LU-NeRF, that jointly estimates camera poses and neural radiance fields with relaxed assumptions on pose configuration. Our approach operates in a local-to-global manner: we first optimize over local subsets of the data, dubbed mini-scenes, and LU-NeRF estimates local pose and geometry for this challenging few-shot task. The mini-scene poses are then brought into a global reference frame through a robust pose synchronization step, after which a final global optimization of pose and scene can be performed. We show that our LU-NeRF pipeline outperforms prior attempts at unposed NeRF without making restrictive assumptions on the pose prior. This allows us to operate in the general SE(3) pose setting, unlike the baselines. Our results also indicate that our model can be complementary to feature-based SfM pipelines, as it compares favorably to COLMAP on low-texture and low-resolution images.
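To make the local-to-global idea concrete, the sketch below shows one naive way to bring pairwise relative poses into a single reference frame by chaining them along a breadth-first spanning tree. It is an illustration only and not the robust synchronization step the abstract refers to: it has no outlier handling or averaging over redundant edges. The function names `make_pose` and `chain_relative_poses`, the camera-to-world convention, and the definition of a relative pose as `T_ij = inv(T_i) @ T_j` are all assumptions introduced here.

```python
from collections import deque

import numpy as np


def make_pose(yaw, translation):
    """Hypothetical helper: 4x4 SE(3) matrix from a rotation about z and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    pose = np.eye(4)
    pose[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pose[:3, 3] = translation
    return pose


def chain_relative_poses(num_cams, rel_poses, root=0):
    """Naively place cameras in the root camera's frame.

    rel_poses maps (i, j) to a 4x4 matrix T_ij = inv(T_i) @ T_j (assumed convention).
    Returns a dict {camera index: 4x4 pose}; cameras unreachable from root are omitted.
    """
    # Store every edge in both directions so the graph can be walked either way.
    neighbors = {i: [] for i in range(num_cams)}
    for (i, j), t_ij in rel_poses.items():
        neighbors[i].append((j, t_ij))
        neighbors[j].append((i, np.linalg.inv(t_ij)))

    # The root defines the global frame (this fixes the gauge freedom).
    global_poses = {root: np.eye(4)}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j, t_ij in neighbors[i]:
            if j not in global_poses:
                # T_j = T_i @ T_ij follows directly from the definition of T_ij.
                global_poses[j] = global_poses[i] @ t_ij
                queue.append(j)
    return global_poses
```

Because each camera is placed using a single path from the root, any error in one relative pose propagates to every camera downstream of it, which is why a robust synchronization method that uses all available edges is preferable in practice.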