Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding. Recent efforts have tackled unsupervised discovery of object-centric neural scene representations. However, the high cost of ray-marching, exacerbated by the fact that each object representation has to be ray-marched separately, leads to insufficiently sampled radiance fields and thus to noisy renderings, poor framerates, and high memory and time complexity during training and rendering. Here, we propose to represent the objects in an object-centric, compositional scene representation as light fields. We propose a novel light field compositor module that reconstructs the global light field from a set of object-centric light fields. Dubbed Compositional Object Light Fields (COLF), our method enables unsupervised learning of object-centric neural scene representations, achieves state-of-the-art reconstruction and novel view synthesis performance on standard datasets, and renders and trains orders of magnitude faster than existing 3D approaches.