Neural radiance fields (NeRF) show strong performance in novel view synthesis and 3D geometry reconstruction, but they suffer severe performance degradation when the number of known viewpoints is drastically reduced. Existing works attempt to overcome this problem by employing external priors, but their success is limited to certain types of scenes or datasets. Monocular depth estimation (MDE) networks, which are pretrained on large-scale RGB-D datasets and generalize well, could be key to solving this problem; however, using MDE in conjunction with NeRF introduces a new set of challenges because of the ambiguities inherent in monocular depths. In this light, we propose a novel framework, dubbed DäRF, that achieves robust NeRF reconstruction from a handful of real-world images by combining the strengths of NeRF and monocular depth estimation through online complementary training. Our framework imposes the MDE network's powerful geometry prior on the NeRF representation at both seen and unseen viewpoints to enhance its robustness and coherence. In addition, we overcome the ambiguity problems of monocular depths through patch-wise scale-shift fitting and geometry distillation, which adapts the MDE network to produce depths accurately aligned with the NeRF geometry. Experiments show that our framework achieves state-of-the-art results both quantitatively and qualitatively, demonstrating consistent and reliable performance on both indoor and outdoor real-world datasets. The project page is available at https://ku-cvlab.github.io/DaRF/.
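The patch-wise scale-shift fitting mentioned above can be illustrated with a minimal sketch. The abstract does not specify the fitting procedure, so this assumes an ordinary least-squares fit of one scale and one shift per patch, mapping the monocular depth onto the NeRF-rendered depth. The function names `fit_scale_shift` and `align_patchwise`, the non-overlapping square patches, and the patch size are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def fit_scale_shift(mde_patch, nerf_patch):
    """Return (s, t) minimizing ||s * mde_patch + t - nerf_patch||^2.

    Assumption: plain least squares; the paper may use a different
    objective or weighting.
    """
    x = np.asarray(mde_patch, dtype=np.float64).ravel()
    y = np.asarray(nerf_patch, dtype=np.float64).ravel()
    # Design matrix [x, 1] so the solution vector is (scale, shift).
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)


def align_patchwise(mde_depth, nerf_depth, patch=8):
    """Align a monocular depth map to a NeRF depth map patch by patch.

    Assumption: non-overlapping square patches of a hypothetical size.
    """
    mde_depth = np.asarray(mde_depth, dtype=np.float64)
    nerf_depth = np.asarray(nerf_depth, dtype=np.float64)
    aligned = np.empty_like(mde_depth)
    height, width = mde_depth.shape
    for i in range(0, height, patch):
        for j in range(0, width, patch):
            window = (slice(i, i + patch), slice(j, j + patch))
            # Each patch gets its own scale and shift.
            s, t = fit_scale_shift(mde_depth[window], nerf_depth[window])
            aligned[window] = s * mde_depth[window] + t
    return aligned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mde = rng.uniform(0.1, 1.0, size=(16, 16))
    nerf = 2.5 * mde + 0.3  # synthetic target with a known scale and shift
    print(fit_scale_shift(mde, nerf))
```

Fitting per patch rather than per image lets the scale and shift vary across the image, which is one plausible way to handle monocular depths that are only locally consistent with the scene geometry.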