The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for large-scale sparse-view scenes, such as ETH3D. Density-based approaches tend to be under-constrained, while surface-based approaches tend to miss details. In this paper, we take a density-based approach, sampling patches instead of individual rays to better incorporate monocular depth and normal estimates, as well as patch-based photometric consistency constraints between training views and sampled virtual views. Loosely constraining densities based on estimated depth aligned to sparse points further improves geometric accuracy. While maintaining similar view synthesis quality, our approach significantly improves geometric accuracy on the ETH3D benchmark, e.g., increasing the F1@2cm score by 4x-8x compared to other regularized density-based approaches, while requiring far less training and inference time.
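The abstract mentions aligning estimated (monocular) depth to sparse points before using it to constrain densities. A common way to do this, sketched below under the assumption of a standard least-squares scale-and-shift fit (the paper's exact alignment procedure may differ; `align_depth` and its arguments are hypothetical names for illustration):

```python
import numpy as np

def align_depth(mono_depth, sparse_depth, mask):
    """Align a monocular depth map to sparse metric depth values.

    Solves for scale s and shift t minimizing
        || s * d_mono + t - d_sparse ||^2
    over the pixels where sparse depth is available (mask == True),
    then applies the recovered (s, t) to the whole depth map.
    """
    d = mono_depth[mask]                       # monocular depths at sparse points
    z = sparse_depth[mask]                     # metric depths from sparse points
    A = np.stack([d, np.ones_like(d)], axis=1) # design matrix [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, z, rcond=None)
    return s * mono_depth + t

# Usage: sparse points (e.g., from structure-from-motion) cover only a few
# pixels; the aligned map can then loosely constrain densities along rays.
mono = np.linspace(1.0, 5.0, 100).reshape(10, 10)   # relative monocular depth
sparse = 2.0 * mono + 0.5                           # simulated metric depth
mask = np.zeros((10, 10), dtype=bool)
mask[::3, ::3] = True                               # sparse coverage only
aligned = align_depth(mono, sparse, mask)
```

Because monocular depth networks typically predict depth only up to an unknown scale and shift, this two-parameter fit is the minimal correction that makes such estimates comparable to metric sparse geometry.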