Neural Radiance Fields (NeRF) have been proposed for photorealistic novel view rendering. However, they require many different views of a scene for training. Moreover, they generalize poorly to new scenes and require retraining or fine-tuning on each one. In this paper, we develop a new NeRF model for novel view synthesis using only a single image as input. We propose combining (coarse) planar rendering with (fine) volume rendering to achieve higher rendering quality and better generalization. We also design a depth teacher network that predicts dense pseudo depth maps to supervise the joint rendering mechanism and boost the learning of consistent 3D geometry. We evaluate our method on three challenging datasets. It outperforms state-of-the-art single-view NeRFs, improving PSNR by 5$\sim$20\% and reducing depth rendering errors by 20$\sim$50\%. It also shows excellent generalization to unseen data without the need to fine-tune on each new scene.
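For readers unfamiliar with the quantities the abstract refers to, the following is a minimal sketch of the standard NeRF-style compositing along a single ray, which yields both a rendered color and a rendered depth, together with the usual PSNR definition. It is not the joint planar and volume rendering mechanism proposed here. The function names `composite_ray` and `psnr`, the array shapes, and the assumption that pixel values lie in [0, 1] are illustrative choices rather than details taken from the paper.

```python
import numpy as np


def composite_ray(sigmas, colors, t_vals):
    """Composite samples along one ray (standard volume rendering quadrature).

    sigmas: (N,) non-negative densities at the samples
    colors: (N, 3) RGB values at the samples
    t_vals: (N,) sample depths along the ray, sorted in increasing order
    Returns (rgb, depth, weights).
    """
    sigmas = np.asarray(sigmas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    t_vals = np.asarray(t_vals, dtype=np.float64)

    # Distance between adjacent samples; the last interval is treated as very large.
    deltas = np.concatenate((np.diff(t_vals), [1e10]))

    # Opacity of each interval: alpha_i = 1 - exp(-sigma_i * delta_i).
    alphas = 1.0 - np.exp(-sigmas * deltas)

    # Transmittance reaching sample i: T_i = prod_{j < i} (1 - alpha_j).
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))

    # Contribution of each sample to the pixel.
    weights = trans * alphas

    rgb = (weights[:, None] * colors).sum(axis=0)  # rendered color
    depth = (weights * t_vals).sum()               # expected (rendered) depth
    return rgb, depth, weights


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; assumes pred != target somewhere."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

In a pseudo-depth supervision setup of the kind the abstract describes, the `depth` value returned here is the sort of quantity that could be compared against a teacher network's prediction. The exact loss used by the authors is not stated in this section.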