Neural Radiance Fields (NeRF) have garnered considerable attention as a paradigm for novel view synthesis that learns scene representations from discrete observations. Nevertheless, NeRF suffers pronounced performance degradation when given sparse input views, which limits its broader applicability. In this work, we introduce Hierarchical Geometric, Semantic, and Photometric Guided NeRF (HG3-NeRF), a method that addresses this limitation and enhances the consistency of geometry, semantic content, and appearance across views. We propose Hierarchical Geometric Guidance (HGG) to incorporate the by-product of Structure from Motion (SfM), namely the sparse depth prior, into the scene representation. Unlike direct depth supervision, HGG samples volume points from local-to-global geometric regions, mitigating the misalignment caused by the inherent bias of the depth prior. Furthermore, motivated by the notable variation in semantic consistency across images of different resolutions, we propose Hierarchical Semantic Guidance (HSG) to learn coarse-to-fine semantic content that corresponds to the coarse-to-fine scene representations. Experimental results demonstrate that HG3-NeRF outperforms other state-of-the-art methods on several standard benchmarks and achieves high-fidelity synthesis from sparse input views.
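The abstract does not say how HGG defines its local-to-global regions. As a rough illustration only, the sketch below shows one generic way to sample depths along a ray in nested intervals around a sparse SfM depth: a narrow window trusts the prior most, wider windows hedge against its bias, and a final global level covers the whole ray. The function name `hierarchical_depth_samples`, the radii, and the per-level sample count are hypothetical choices, not the authors' method.

```python
import random
from typing import List, Optional, Sequence


def hierarchical_depth_samples(
    depth_prior: float,
    near: float,
    far: float,
    radii: Sequence[Optional[float]] = (0.05, 0.25, None),
    n_per_level: int = 16,
    seed: int = 0,
) -> List[float]:
    """Sample ray depths from local-to-global intervals around a sparse depth prior.

    Hypothetical illustration (not HG3-NeRF's actual scheme): each entry of
    `radii` is one level. A finite radius r samples uniformly in
    [prior - r, prior + r], clipped to [near, far]; None means the whole
    global interval [near, far].
    """
    if not near < far:
        raise ValueError("near must be smaller than far")
    rng = random.Random(seed)
    # Clamp the prior so every local window stays inside the ray segment.
    d = min(max(depth_prior, near), far)
    samples: List[float] = []
    for r in radii:
        if r is None:
            lo, hi = near, far  # global level: the entire ray segment
        else:
            lo = max(near, d - r)  # local level: window around the prior
            hi = min(far, d + r)
        samples.extend(rng.uniform(lo, hi) for _ in range(n_per_level))
    # Volume rendering expects sample depths ordered along the ray.
    return sorted(samples)


ts = hierarchical_depth_samples(depth_prior=2.0, near=0.5, far=6.0)
print(len(ts))  # 3 levels x 16 samples = 48
```

Because the narrowest window contributes as many samples as the global level, sample density peaks near the SfM depth while still covering the full ray. A biased prior therefore shifts, but does not eliminate, coverage of the true surface.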