The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. Existing work introduces monocular depth priors to jointly optimize the camera poses and NeRF, but fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses by jointly optimizing the learnable parameters of the radiance field and the camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose an inter-frame point constraint that is more robust to depth noise during training. Experimental results on three datasets demonstrate that TD-NeRF outperforms prior works in the joint optimization of camera pose and NeRF and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.
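To make the idea of depth-based ray sampling from a truncated normal distribution concrete, the following is a minimal sketch and not the released TD-NeRF implementation. It assumes each ray has a monocular depth prior `depth_prior`, and draws sample depths from a normal distribution centered on that prior and truncated to the ray bounds `[near, far]`. The function name, its parameters, and the choice of `sigma` are hypothetical and serve only as illustration; how the spread is actually set or scheduled during training is not specified here.

```python
import random
from statistics import NormalDist


def sample_truncated_normal_depths(depth_prior, sigma, near, far, n_samples, rng=None):
    """Draw sorted sample depths along one ray.

    Samples follow N(depth_prior, sigma^2) truncated to [near, far],
    obtained by inverse-CDF sampling. All names are illustrative assumptions.
    """
    if not near < far:
        raise ValueError("near must be smaller than far")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    rng = rng or random.Random()

    dist = NormalDist(mu=depth_prior, sigma=sigma)
    # Probability mass of the untruncated normal below each bound.
    cdf_near, cdf_far = dist.cdf(near), dist.cdf(far)

    samples = []
    for _ in range(n_samples):
        # Uniform draw restricted to the CDF interval covered by [near, far].
        u = cdf_near + rng.random() * (cdf_far - cdf_near)
        # inv_cdf requires 0 < u < 1, so keep u strictly inside that range.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        t = dist.inv_cdf(u)
        # Guard against floating-point drift outside the ray bounds.
        samples.append(min(max(t, near), far))
    return sorted(samples)


# Example: 64 samples concentrated around a prior depth of 2.0.
depths = sample_truncated_normal_depths(
    depth_prior=2.0, sigma=0.1, near=0.5, far=6.0, n_samples=64, rng=random.Random(0)
)
```

Compared with uniform or stratified sampling over `[near, far]`, this concentrates samples near the surface suggested by the depth prior, while the truncation keeps every sample inside the valid ray segment. A smaller `sigma` expresses more trust in the prior; a larger one hedges against depth noise.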