Novel view synthesis is a long-standing problem that centers on rendering frames of a scene from novel camera viewpoints. Volumetric approaches model occlusions by explicitly representing the camera frustum in 3D. Multi-plane Images (MPI) are volumetric methods that represent the scene with fronto-parallel planes at distinct depths, but their depth discretization restricts them to a 2.5D scene representation. Another line of work relies on implicit 3D scene representations. Neural Radiance Fields (NeRF) use neural networks to encode the continuous 3D structure of a scene in the network weights and achieve photorealistic synthesis results. However, these methods are constrained to per-scene optimization, which is inefficient in practice. Multi-plane Neural Radiance Fields (MINE) open the door to combining implicit and explicit scene representations: MINE enables a continuous 3D scene representation, especially along the depth dimension, while using input image features to avoid per-scene optimization. The main drawback of existing work in this domain is its restriction to single-view input, which limits synthesis to a narrow range of viewpoints. In this work, we thoroughly examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields. In addition, we propose a new multi-plane NeRF architecture that accepts multiple views to improve synthesis results and expand the viewing range. Features from the input source frames are fused through a proposed attention-aware fusion module that highlights important information from different viewpoints. Experiments demonstrate the effectiveness of attention-based fusion and the promising results of our method compared with multi-view NeRF and MPI techniques.
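The two mechanisms the abstract relies on can be illustrated with a minimal sketch. The first function uses generic scaled dot-product attention over source views to fuse per-point features. This is a hypothetical stand-in, not the paper's attention-aware fusion module, and the `query` vector is an assumed learned parameter. The second function composites per-plane color and density along one ray with the standard NeRF volume-rendering quadrature. It assumes that multi-plane NeRF methods such as MINE render their planes in this way. All function and argument names are illustrative.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_views(features, query):
    """Hypothetical attention-weighted fusion of multi-view features.

    features: (V, N, C) features of N sample points as seen from V source views.
    query:    (C,) assumed learned query vector (illustrative only).
    Returns the fused (N, C) features and the (V, N) per-view weights.
    """
    scores = features @ query / np.sqrt(features.shape[-1])  # (V, N)
    weights = softmax(scores, axis=0)  # weights over views sum to 1 per point
    fused = (weights[..., None] * features).sum(axis=0)  # (N, C)
    return fused, weights


def composite_planes(rgb, sigma, depths):
    """NeRF-style compositing of D fronto-parallel planes along one ray.

    rgb:    (D, 3) per-plane color.
    sigma:  (D,) per-plane volume density.
    depths: (D,) plane depths sorted from near to far.
    Returns the rendered color (3,), expected depth, and per-plane weights (D,).
    """
    # Distance to the next plane; the last interval is treated as effectively infinite.
    deltas = np.diff(depths, append=depths[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * deltas)  # opacity of each plane
    # Exclusive cumulative transmittance: T_i = prod_{j<i} (1 - alpha_j).
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    depth = (weights * depths).sum()
    return color, depth, weights
```

With identical features in every view, the fusion weights are uniform. A nearly opaque first plane absorbs the whole ray, so the rendered color equals that plane's color. The softmax normalizes over the view axis so that each point's fused feature is a convex combination of its per-view features. A learned module would replace the fixed query with network-predicted scores.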