Neural Radiance Fields (NeRF) have been proposed for photorealistic novel view rendering. However, NeRF requires many views of a scene for training. Moreover, it generalizes poorly to new scenes and must be retrained or fine-tuned on each one. In this paper, we develop a new NeRF model for novel view synthesis that takes only a single image as input. We propose to combine (coarse) planar rendering with (fine) volume rendering to achieve higher rendering quality and better generalization. We also design a depth teacher net that predicts dense pseudo depth maps to supervise the joint rendering mechanism and boost the learning of consistent 3D geometry. We evaluate our method on three challenging datasets. It outperforms state-of-the-art single-view NeRFs, improving PSNR by 5$\sim$20\% and reducing depth rendering errors by 20$\sim$50\%. It also generalizes well to unseen data without the need to fine-tune on each new scene.
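The abstract does not specify how the planar and volume branches are combined, so the sketch below covers only two generic ingredients it mentions, under stated assumptions. The first is the standard NeRF volume-rendering quadrature for one ray, which yields a color and an expected depth. The second is a depth loss against a teacher's pseudo depth. The function names `volume_render_ray` and `pseudo_depth_l1_loss`, the 1e10 spacing for the last sample, and the choice of an L1 loss are all illustrative assumptions, not the paper's formulation.

```python
import math
from typing import List, Tuple


def volume_render_ray(
    densities: List[float],
    colors: List[Tuple[float, float, float]],
    depths: List[float],
) -> Tuple[Tuple[float, float, float], float, float]:
    """Composite samples along one ray with the standard NeRF quadrature.

    densities: sigma_i >= 0 at each sample
    colors:    RGB c_i at each sample
    depths:    sample distances t_i along the ray, strictly increasing
    Returns (rgb, expected_depth, accumulated_opacity).
    """
    n = len(depths)
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    transmittance = 1.0  # T_i = exp(-sum_{j<i} sigma_j * delta_j)
    for i in range(n):
        # delta_i: spacing to the next sample; the last sample gets a large
        # spacing (assumption: 1e10, a common convention in NeRF code).
        delta = depths[i + 1] - depths[i] if i + 1 < n else 1e10
        alpha = 1.0 - math.exp(-densities[i] * delta)
        weight = transmittance * alpha  # w_i = T_i * alpha_i
        for k in range(3):
            rgb[k] += weight * colors[i][k]
        depth += weight * depths[i]  # expected ray-termination depth
        transmittance *= 1.0 - alpha
    # Sum of weights equals 1 - final transmittance.
    return (rgb[0], rgb[1], rgb[2]), depth, 1.0 - transmittance


def pseudo_depth_l1_loss(rendered: List[float], pseudo: List[float]) -> float:
    """Mean absolute error between rendered depths and teacher pseudo depths.

    Hypothetical loss choice; the abstract does not state which loss is used.
    """
    if not rendered or len(rendered) != len(pseudo):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(rendered, pseudo)) / len(rendered)


# Usage: a ray whose only dense sample sits at t = 2.0.
rgb, d, acc = volume_render_ray(
    [0.0, 1000.0, 0.0],
    [(0.0, 0.0, 0.0), (1.0, 0.5, 0.25), (0.0, 0.0, 0.0)],
    [1.0, 2.0, 3.0],
)
loss = pseudo_depth_l1_loss([d], [2.5])
```

In this example, nearly all the weight falls on the dense sample. The rendered depth is therefore about 2.0, and the L1 loss against a pseudo depth of 2.5 is about 0.5. Per-ray losses of this kind would typically be averaged over a batch of rays.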