Neural radiance fields (NeRFs) synthesize realistic novel views from multi-view images captured from distinct positions and perspectives. In a NeRF rendering pipeline, a neural network either represents the scene on its own or transforms the learnable feature vector queried at a point into that point's color or density. With geometric guidance from occupancy grids or proposal networks, the number of neural network evaluations per ray in the standard volume rendering framework can be reduced from hundreds to dozens. Instead of rendering the colors produced by per-sample network evaluations, we propose to first render the feature vectors queried along a ray and then transform the rendered feature vector into the final pixel color with a neural network. This fundamental change to the standard volume rendering framework requires only a single neural network evaluation to render a pixel, which substantially lowers the computational cost that the framework incurs from its large number of network evaluations. Consequently, we can use a comparatively larger neural network to achieve better rendering quality at the same training and rendering time cost. Our model achieves state-of-the-art rendering quality on both synthetic and real-world datasets while requiring only several minutes of training time.
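The difference between the two rendering orders can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-sample densities `sigma` are assumed to be available without a network evaluation (for example, read from a grid), and the names `volume_weights`, `tiny_mlp`, `render_standard`, and `render_feature_first`, as well as the two-layer network, are hypothetical.

```python
import numpy as np

def volume_weights(sigma, delta):
    """Standard volume rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance T_i is the product of (1 - alpha_j) over all samples j < i.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha[:-1])])
    return trans * alpha

def tiny_mlp(x, params):
    """Hypothetical two-layer network mapping features to RGB in [0, 1]."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(x @ w1 + b1, 0.0)                # ReLU
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))     # sigmoid

def render_standard(features, sigma, delta, params):
    """Standard order: evaluate the network at every sample, then composite colors."""
    colors = tiny_mlp(features, params)                  # N network evaluations
    return volume_weights(sigma, delta) @ colors         # (3,)

def render_feature_first(features, sigma, delta, params):
    """Feature-first order: composite the features, then evaluate the network once."""
    ray_feature = volume_weights(sigma, delta) @ features  # (F,)
    return tiny_mlp(ray_feature, params)                 # 1 network evaluation
```

Because the network is nonlinear, the two functions generally return different colors for the same weights; the feature-first order is a different model that has to be trained in that form, not a faster way to evaluate an existing one. Its cost per pixel is one network evaluation regardless of the number of samples along the ray, which is what leaves room for a larger network.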