A recent trend among generalizable novel view synthesis methods is to learn a rendering operator acting over single camera rays. This approach is promising because it removes the need for explicit volumetric rendering, but it effectively treats target images as collections of independent pixels. Here, we propose to learn a global rendering operator acting over all camera rays jointly. We show that the right representation to enable such rendering is the 5-dimensional plane sweep volume, consisting of the projection of the input images on a set of planes facing the target camera. Based on this understanding, we introduce our Convolutional Global Latent Renderer (ConvGLR), an efficient convolutional architecture that performs the rendering operation globally in a low-resolution latent space. Experiments on various datasets under sparse and generalizable setups show that our approach consistently outperforms existing methods by significant margins.
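To make the plane sweep volume concrete, the following is a minimal NumPy sketch of one way to build it. It is not the ConvGLR implementation. It assumes pinhole cameras, fronto-parallel planes at chosen depths in the target camera frame, nearest-neighbour sampling, and a (views, depths, height, width, channels) axis order. The function names `plane_homography` and `plane_sweep_volume` and the pose convention `X_src = R @ X_tgt + t` are illustrative choices rather than anything specified in the abstract.

```python
import numpy as np


def plane_homography(K_src, K_tgt, R, t, depth):
    """Homography mapping target pixels to source pixels via the plane z = depth.

    Assumed convention: R, t map target-camera coordinates to source-camera
    coordinates (X_src = R @ X_tgt + t). The plane is fronto-parallel to the
    target camera, i.e. it "faces" the target view.
    """
    n = np.array([0.0, 0.0, 1.0])  # plane normal in the target camera frame
    return K_src @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K_tgt)


def plane_sweep_volume(images, K_srcs, Rs, ts, K_tgt, depths, out_hw):
    """Warp V source images onto D planes, giving a (V, D, H, W, C) array."""
    H, W = out_hw
    V, C = len(images), images[0].shape[-1]
    psv = np.zeros((V, len(depths), H, W, C), dtype=np.float64)

    # Homogeneous pixel grid of the target view, shape (3, H * W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(np.float64)

    for i in range(V):
        h_src, w_src = images[i].shape[:2]
        for j, d in enumerate(depths):
            p = plane_homography(K_srcs[i], K_tgt, Rs[i], ts[i], d) @ pix
            valid = p[2] > 1e-8  # keep points in front of the source camera
            z = np.where(valid, p[2], 1.0)  # avoid dividing by ~0
            x = np.rint(p[0] / z).astype(int)  # nearest-neighbour lookup
            y = np.rint(p[1] / z).astype(int)
            valid &= (x >= 0) & (x < w_src) & (y >= 0) & (y < h_src)

            # Pixels that project outside the source image stay zero.
            warped = np.zeros((H * W, C), dtype=np.float64)
            warped[valid] = images[i][y[valid], x[valid]]
            psv[i, j] = warped.reshape(H, W, C)
    return psv
```

Because every plane is defined in the target camera frame, each slice `psv[i, j]` is already aligned with the target pixel grid. This alignment is what would let a convolutional network operate over all target rays jointly rather than one ray at a time. A practical version would likely use bilinear sampling and batched GPU operations instead of the Python loops shown here.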