Monocular depth estimation has applications in many fields, such as autonomous navigation and extended reality, making it an essential computer vision task. However, current methods often produce smooth depth maps that lack the fine geometric detail needed for accurate scene understanding. We propose MDENeRF, an iterative framework that refines monocular depth estimates using depth information from Neural Radiance Fields (NeRFs). MDENeRF consists of three components: (1) an initial monocular depth estimate that provides global structure, (2) a NeRF trained on perturbed viewpoints that provides per-pixel depth uncertainty, and (3) Bayesian fusion of the noisy monocular and NeRF depths. Deriving the NeRF uncertainty from the volume rendering process lets us iteratively inject high-frequency detail while the monocular prior maintains global structure. We demonstrate superior performance on key metrics in experiments on indoor scenes from the SUN RGB-D dataset.
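To make the two technical ingredients of the abstract concrete, the following is a minimal sketch of (a) extracting a per-pixel depth and variance from NeRF volume-rendering weights and (b) precision-weighted Bayesian fusion of two noisy depth maps. It assumes independent Gaussian noise on both depth sources; the function names and the exact uncertainty derivation here are generic illustrations, not the paper's precise formulation.

```python
import numpy as np

def nerf_depth_and_variance(weights, z_vals):
    """Per-ray depth and variance from NeRF volume-rendering weights.

    weights: (num_rays, num_samples) compositing weights w_i along each ray
    z_vals:  (num_rays, num_samples) sample depths t_i along each ray

    The expected depth is the weight-averaged sample depth; the variance of
    that same distribution serves as a per-pixel uncertainty estimate
    (a common choice, assumed here rather than taken from the paper).
    """
    w = weights / (weights.sum(axis=-1, keepdims=True) + 1e-10)  # normalize
    depth = (w * z_vals).sum(axis=-1)
    var = (w * (z_vals - depth[..., None]) ** 2).sum(axis=-1)
    return depth, var

def bayesian_fuse(mono_depth, mono_var, nerf_depth, nerf_var):
    """Inverse-variance (precision-weighted) fusion of two noisy depth maps.

    Under independent Gaussian noise, the posterior mean is the
    precision-weighted average and the posterior variance is the
    inverse of the summed precisions.
    """
    precision = 1.0 / mono_var + 1.0 / nerf_var
    fused = (mono_depth / mono_var + nerf_depth / nerf_var) / precision
    return fused, 1.0 / precision

# Toy usage: fuse a (hypothetical) monocular estimate with a NeRF estimate.
rng = np.random.default_rng(0)
rays, samples = 4, 64
weights = rng.random((rays, samples))
z_vals = np.tile(np.linspace(0.5, 5.0, samples), (rays, 1))
nerf_d, nerf_v = nerf_depth_and_variance(weights, z_vals)
mono_d = nerf_d + 0.1                     # stand-in monocular depth
mono_v = np.full_like(mono_d, 0.05)       # stand-in monocular variance
fused_d, fused_v = bayesian_fuse(mono_d, mono_v, nerf_d, nerf_v)
```

Iterating this fusion, with the NeRF retrained on perturbed viewpoints each round, is the loop the abstract describes: low-variance NeRF pixels inject detail, while high-variance pixels fall back on the monocular prior.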