Pose-free neural radiance fields (NeRF) aim to train NeRF from unposed multi-view images and have achieved impressive results in recent years. Most existing methods share a two-stage pipeline: they first train a coarse pose estimator on rendered images, then jointly optimize the estimated poses and the neural radiance field. However, because the pose estimator is trained only on rendered images, its estimates for real images are often biased or inaccurate owing to the domain gap between real and rendered images. This weakens the robustness of pose estimation on real images and can trap the subsequent joint optimization in local minima. We design IR-NeRF, a pose-free NeRF that introduces implicit pose regularization to refine the pose estimator with unposed real images and improve the robustness of pose estimation for real images. Given a collection of 2D images of a specific scene, IR-NeRF constructs a scene codebook that stores scene features and implicitly captures the scene-specific pose distribution as a prior. The scene prior improves the robustness of pose estimation based on the rationale that a 2D real image can be reconstructed well from the scene codebook only when its estimated pose lies within the pose distribution. Extensive experiments show that IR-NeRF achieves superior novel view synthesis and consistently outperforms the state of the art across multiple synthetic and real datasets.
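The codebook rationale can be illustrated with a toy sketch. It is not IR-NeRF's architecture: the 2D tuples standing in for scene features, the nearest-neighbor lookup, and the names `nearest_code`, `reconstruction_error`, and `jitter` are all assumptions made for illustration. The sketch only shows that features drawn from the distribution a codebook covers reconstruct with low error, while features from outside it do not, which is the kind of signal a regularizer on estimated poses could plausibly exploit.

```python
import math
import random


def nearest_code(feature, codebook):
    """Return the codebook entry closest to `feature` (Euclidean distance)."""
    return min(codebook, key=lambda code: math.dist(feature, code))


def reconstruction_error(features, codebook):
    """Mean squared distance between each feature and its nearest code."""
    total = 0.0
    for f in features:
        total += math.dist(f, nearest_code(f, codebook)) ** 2
    return total / len(features)


def jitter(point, scale=0.05):
    """Perturb each coordinate by uniform noise in [-scale, scale]."""
    return tuple(x + random.uniform(-scale, scale) for x in point)


random.seed(0)

# Hypothetical codebook: 2D points standing in for learned scene features.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Features near codebook entries play the role of images whose estimated
# pose lies within the scene's pose distribution.
in_dist = [jitter(random.choice(codebook)) for _ in range(32)]

# Features far from every entry play the role of images rendered from a
# badly estimated pose.
out_dist = [jitter((5.0, 5.0)) for _ in range(32)]

err_in = reconstruction_error(in_dist, codebook)
err_out = reconstruction_error(out_dist, codebook)

print(f"in-distribution error:     {err_in:.4f}")
print(f"out-of-distribution error: {err_out:.4f}")
```

In this toy setting the gap between the two errors is what makes the codebook usable as an implicit prior: no pose labels are needed, only a measure of how well an input can be rebuilt from stored scene features.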