Neural radiance fields (NeRF) have achieved impressive results in high-quality 3D scene reconstruction, but they rely heavily on precise camera poses. Although recent works such as BARF have introduced camera pose optimization within NeRF, their applicability is limited to scenes with simple trajectories, and existing methods struggle with complex trajectories involving large rotations. To address this limitation, we propose CT-NeRF, an incremental reconstruction optimization pipeline that takes only RGB images as input, without poses or depth. In this pipeline, we first propose a local-global bundle adjustment under a pose graph connecting neighboring frames. It enforces consistency between poses, which helps escape the local minima that arise when poses are constrained only by their consistency with the scene structure. We then instantiate the consistency between poses as a reprojected geometric image distance constraint derived from pixel-level correspondences between input image pairs. Through incremental reconstruction, CT-NeRF recovers both camera poses and scene structure and can handle scenes with complex trajectories. We evaluate CT-NeRF on two real-world datasets featuring complex trajectories, NeRFBuster and Free-Dataset. Results show that CT-NeRF outperforms existing methods in both novel view synthesis and pose estimation accuracy.
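To make the reprojected geometric image distance concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a pinhole camera with known intrinsics `K`, camera-to-world poses `T_i` and `T_j`, and a per-pixel depth for frame i (for example, rendered from the radiance field); the function name `reprojection_distance` and all argument names are hypothetical. A pixel in frame i is back-projected to 3D, moved into frame j with the relative pose, projected, and compared against its matched pixel in frame j.

```python
import numpy as np


def reprojection_distance(p_i, depth_i, q_j, K, T_i, T_j):
    """Pixel distance between p_i reprojected into frame j and its match q_j.

    Assumed conventions (not taken from the abstract):
      p_i, q_j : (N, 2) matched pixel coordinates in frames i and j
      depth_i  : (N,) depth along the camera z-axis at p_i
      K        : (3, 3) pinhole intrinsics shared by both frames
      T_i, T_j : (4, 4) camera-to-world poses
    """
    ones = np.ones((p_i.shape[0], 1))

    # Back-project the pixels of frame i to 3D points in camera-i coordinates.
    rays = (np.linalg.inv(K) @ np.hstack([p_i, ones]).T).T  # (N, 3), z == 1
    pts_cam_i = rays * depth_i[:, None]

    # Relative transform: camera i -> world -> camera j.
    T_ij = np.linalg.inv(T_j) @ T_i
    pts_cam_j = (T_ij @ np.hstack([pts_cam_i, ones]).T).T[:, :3]

    # Project into the image plane of frame j (perspective division).
    proj = (K @ pts_cam_j.T).T
    p_proj = proj[:, :2] / proj[:, 2:3]

    # Euclidean distance in pixels to the matched location.
    return np.linalg.norm(p_proj - q_j, axis=1)
```

In a pose optimization loop, a robust mean of these distances over all correspondences of a frame pair could serve as the pose-to-pose term; when the two poses and the depth are consistent with the matches, the distance approaches zero.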