Neural Radiance Fields (NeRF) have shown promise in generating realistic novel views from sparse scene images. However, existing NeRF approaches often struggle because they lack explicit 3D supervision and depend on imprecise camera poses, which degrades the rendered results. To address these issues, we propose AltNeRF, a novel framework that builds resilient NeRF representations from monocular videos using self-supervised monocular depth estimation (SMDE), without relying on known camera poses. SMDE in AltNeRF learns depth and pose priors that regularize NeRF training. The depth prior improves NeRF's ability to represent scene geometry precisely, while the pose prior provides a robust starting point for subsequent pose refinement. Moreover, we introduce an alternating algorithm that feeds NeRF outputs back into SMDE through a consistency-driven mechanism, thereby enhancing the integrity of the depth priors. This alternation allows AltNeRF to progressively refine its NeRF representation and synthesize realistic novel views. Extensive experiments show that AltNeRF generates high-fidelity, robust novel views that closely resemble reality.
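To make the role of the depth prior more concrete, the following is a minimal sketch of how a per-ray depth prior might regularize NeRF training. The volume-rendering step follows the standard NeRF quadrature; everything else is an assumption made for illustration and is not taken from AltNeRF itself: the function names `render_ray` and `depth_regularized_loss`, the L1 form of the depth term, and the weight `lam` are all hypothetical.

```python
import numpy as np


def render_ray(sigmas, colors, t_vals):
    """Standard NeRF volume rendering along a single ray.

    sigmas: (N,) densities, colors: (N, 3) RGB values, t_vals: (N,) sample depths.
    Returns the rendered colour and the expected ray-termination depth.
    """
    # Distance between adjacent samples; the last interval is treated as very long.
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    depth = (weights * t_vals).sum()
    return rgb, depth


def depth_regularized_loss(rgb, depth, target_rgb, prior_depth, lam=0.1):
    """Photometric loss plus a depth-prior penalty.

    Assumption: an L1 depth term weighted by `lam`; the actual AltNeRF
    objective is not specified in the abstract and may differ.
    """
    photometric = np.mean((rgb - target_rgb) ** 2)
    depth_term = np.abs(depth - prior_depth)
    return photometric + lam * depth_term


# Toy ray with one opaque surface at depth 2.0 (values chosen for illustration).
t_vals = np.array([1.0, 1.5, 2.0, 2.5])
sigmas = np.array([0.0, 0.0, 1e6, 0.0])
colors = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [1.0, 0.5, 0.25],
                   [0.0, 0.0, 0.0]])
rgb, depth = render_ray(sigmas, colors, t_vals)
loss = depth_regularized_loss(rgb, depth, np.array([1.0, 0.5, 0.25]), prior_depth=2.5)
```

In this toy setup the rendered colour matches the target, so the loss comes entirely from the disagreement between the rendered depth and the prior. In an alternating scheme of the kind the abstract describes, the prior depth would itself be updated as the depth estimator is refined with NeRF outputs; that step is not sketched here.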