Neural Radiance Fields (NeRF) show impressive performance in photo-realistic free-view rendering of scenes. Recent improvements on NeRF, such as TensoRF and ZipNeRF, employ explicit models for faster optimization and rendering, whereas the original NeRF employs an implicit representation. However, both implicit and explicit radiance fields require dense sampling of images in the given scene, and their performance degrades significantly when only a sparse set of views is available. Prior work has found that supervising the depth estimated by a radiance field helps train it effectively with fewer views. This depth supervision is obtained using either classical approaches or neural networks pre-trained on a large dataset. While the former may provide only sparse supervision, the latter may suffer from generalization issues. In contrast to these earlier approaches, we seek to learn the depth supervision by designing augmented models and training them along with the main radiance field. Further, we aim to design a framework of regularizations that can work across different implicit and explicit radiance fields. We observe that certain features of these radiance field models overfit to the observed images in the sparse-input scenario. Our key finding is that reducing the capability of the radiance fields with respect to the positional encoding, the number of decomposed tensor components, or the size of the hash table constrains the model to learn simpler solutions, which estimate better depth in certain regions. By designing augmented models based on such reduced capabilities, we obtain better depth supervision for the main radiance field. By employing the above regularizations, we achieve state-of-the-art view-synthesis performance with sparse input views on popular datasets containing forward-facing and 360$^\circ$ scenes.
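As a rough illustration of what "reducing the capability with respect to positional encoding" could look like for an implicit model, the sketch below builds a NeRF-style sinusoidal encoding with a configurable number of frequency bands. The function name `positional_encoding`, the sample point, and the choice of 3 bands for the augmented model are hypothetical and chosen only for illustration; 10 bands matches the position encoding of the original NeRF. It does not reproduce the authors' implementation.

```python
import math

def positional_encoding(x, num_freqs):
    """NeRF-style encoding: x followed by sin/cos(2^k * pi * x) for k < num_freqs."""
    features = list(x)
    for k in range(num_freqs):
        scale = (2.0 ** k) * math.pi
        features.extend(math.sin(scale * v) for v in x)
        features.extend(math.cos(scale * v) for v in x)
    return features

# Hypothetical 3D sample point along a camera ray.
point = (0.25, -0.5, 0.75)

# Main model: full set of frequency bands (10, as in the original NeRF).
main_features = positional_encoding(point, num_freqs=10)

# Augmented model: fewer bands (3 is an assumed value), so its input
# carries only low-frequency information and favors smoother solutions.
augmented_features = positional_encoding(point, num_freqs=3)

print(len(main_features), len(augmented_features))
```

With this layout, the augmented model's input is simply a prefix of the main model's input with the high-frequency bands dropped, which is one simple way to restrict a model to smoother solutions.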