Neural radiance fields (NeRF) have achieved impressive performance in view synthesis by encoding neural representations of a scene. However, NeRFs require hundreds of images per scene to synthesize photo-realistic novel views. Training them on sparse input views leads to overfitting and incorrect estimation of scene depth, resulting in artifacts in the rendered novel views. Sparse-input NeRFs were recently regularized by supervising them with dense depth estimated by pre-trained networks, which improved performance over sparse depth constraints. However, we find that such depth priors may be inaccurate due to generalization issues. Instead, we hypothesize that the visibility of pixels in different input views can be estimated more reliably to provide dense supervision. To this end, we compute a visibility prior using plane sweep volumes, which does not require any pre-training. By regularizing NeRF training with the visibility prior, we successfully train the NeRF with few input views. We also reformulate the NeRF to directly output the visibility of a 3D point from a given viewpoint, which reduces the training time with the visibility constraint. On multiple datasets, our model outperforms competing sparse-input NeRF models, including those that use learned priors. The source code for our model can be found on our project page: https://nagabhushansn95.github.io/publications/2023/ViP-NeRF.html.
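The two visibility notions above can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not specify the exact formulation. The sketch assumes a rectified grayscale image pair stored as nested lists. For such a pair, sweeping fronto-parallel depth planes reduces to shifting the secondary image horizontally by integer disparities. It also assumes the secondary camera is displaced so that a point at column `x` in the primary view appears at column `x - d`. The per-pixel minimum matching error over all planes is then binarized with a hypothetical `threshold`: if some plane explains the pixel well, it is treated as visible in the secondary view. The function `transmittance` shows the standard NeRF volume-rendering quantity `T_i = exp(-sum_{j<i} sigma_j * delta_j)`, which is the visibility of a sample point along a ray. Computing this quantity requires marching along the ray, and the reformulated NeRF is described as predicting visibility directly instead. That direct prediction is not modeled here. All function names are illustrative.

```python
import math


def plane_sweep_min_error(primary, secondary, max_disparity):
    """Per-pixel minimum absolute matching error over a plane sweep.

    Assumes a rectified pair, so each fronto-parallel plane corresponds to an
    integer horizontal disparity d in [0, max_disparity].
    """
    h, w = len(primary), len(primary[0])
    min_err = [[math.inf] * w for _ in range(h)]
    for d in range(max_disparity + 1):
        for y in range(h):
            for x in range(w):
                xs = x - d  # column of the candidate match in the secondary view
                if 0 <= xs < w:
                    err = abs(primary[y][x] - secondary[y][xs])
                    if err < min_err[y][x]:
                        min_err[y][x] = err
    return min_err


def visibility_prior(min_err, threshold):
    """Hypothetical binarization: 1.0 if some plane matches within threshold."""
    return [[1.0 if e <= threshold else 0.0 for e in row] for row in min_err]


def transmittance(sigmas, deltas):
    """Standard NeRF transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j)."""
    out, acc = [], 0.0
    for sigma, delta in zip(sigmas, deltas):
        out.append(math.exp(-acc))  # visibility of sample i from the ray origin
        acc += sigma * delta
    return out
```

For example, `plane_sweep_min_error([[10, 20, 30, 40]], [[20, 30, 40, 99]], 1)` returns `[[10, 0, 0, 0]]`. The leftmost pixel has no good match because its correspondence falls outside the secondary image, so a threshold of 5 marks only that pixel as not visible.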