A recent trend among generalizable novel view synthesis methods is to learn a rendering operator acting over single camera rays. This approach is promising because it removes the need for explicit volumetric rendering, but it effectively treats target images as collections of independent pixels. Here, we propose to learn a global rendering operator acting over all camera rays jointly. We show that the right representation to enable such rendering is a 5-dimensional plane sweep volume consisting of the projection of the input images on a set of planes facing the target camera. Based on this understanding, we introduce our Convolutional Global Latent Renderer (ConvGLR), an efficient convolutional architecture that performs the rendering operation globally in a low-resolution latent space. Experiments on various datasets under sparse and generalizable setups show that our approach consistently outperforms existing methods by significant margins.
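To make the plane sweep volume concrete, the following is a minimal NumPy sketch of how the input images could be projected onto a set of planes facing the target camera. It is an illustration under stated assumptions, not the ConvGLR implementation: the function name `plane_sweep_volume`, the axis order (views, depths, channels, height, width), the pose convention (`R`, `t` map target-camera coordinates to source-camera coordinates), and the nearest-neighbour sampling are all choices made here for brevity.

```python
import numpy as np


def plane_sweep_volume(images, K_src, R, t, K_tgt, depths, out_hw):
    """Warp source images onto fronto-parallel planes of the target camera.

    Hypothetical helper for illustration only.
    images: (V, C, Hs, Ws) source images
    K_src:  (V, 3, 3) source intrinsics
    R, t:   (V, 3, 3), (V, 3) with X_src = R @ X_tgt + t (assumed convention)
    K_tgt:  (3, 3) target intrinsics
    depths: (D,) plane depths along the target optical axis
    out_hw: (Ht, Wt) target resolution
    Returns an array of shape (V, D, C, Ht, Wt).
    """
    V, C, Hs, Ws = images.shape
    Ht, Wt = out_hw

    # Homogeneous pixel coordinates (x, y, 1) of every target pixel.
    ys, xs = np.meshgrid(np.arange(Ht), np.arange(Wt), indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    pix = pix.astype(np.float64)

    n = np.array([0.0, 0.0, 1.0])  # plane normal: planes face the target camera
    K_tgt_inv = np.linalg.inv(K_tgt)
    psv = np.zeros((V, len(depths), C, Ht, Wt), dtype=images.dtype)

    for v in range(V):
        for j, d in enumerate(depths):
            # Plane-induced homography from target pixels to source pixels.
            H = K_src[v] @ (R[v] + np.outer(t[v], n) / d) @ K_tgt_inv
            q = H @ pix
            z = q[2]
            valid = z > 1e-8  # keep points in front of the source camera

            u = np.zeros_like(z)
            w = np.zeros_like(z)
            u[valid] = q[0, valid] / z[valid]
            w[valid] = q[1, valid] / z[valid]

            # Nearest-neighbour lookup; pixels falling outside stay zero.
            ui = np.rint(u).astype(int)
            wi = np.rint(w).astype(int)
            valid &= (ui >= 0) & (ui < Ws) & (wi >= 0) & (wi < Hs)

            out = np.zeros((C, Ht * Wt), dtype=images.dtype)
            out[:, valid] = images[v][:, wi[valid], ui[valid]]
            psv[v, j] = out.reshape(C, Ht, Wt)

    return psv
```

With identity relative pose and shared intrinsics, every plane reproduces the source image, which is a convenient sanity check. A practical version would likely use bilinear sampling on the GPU, and a global renderer would then consume the whole volume at once rather than one ray at a time.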