Neural radiance fields (NeRFs) have achieved great success in novel view synthesis and 3D representation of static scenes. Existing dynamic NeRFs usually exploit a locally dense grid to fit the deformation field; however, they fail to capture global dynamics and, at the same time, yield heavily parameterized models. We observe that the 4D space is inherently sparse. First, the deformation field is sparse in space but dense in time owing to the continuity of motion. Second, the radiance field is only valid on the surface of the underlying scene, which usually occupies a small fraction of the whole space. We thus propose to represent the 4D scene using a learnable sparse latent space, dubbed SLS4D. Specifically, SLS4D first uses dense learnable time slot features to depict the temporal space, from which the deformation field is fitted with linear multi-layer perceptrons (MLPs) to predict the displacement of a 3D position at any time. It then learns the spatial features of a 3D position using another sparse latent space, by learning adaptive weights for each latent code with an attention mechanism. Extensive experiments demonstrate the effectiveness of SLS4D: it achieves the best 4D novel view synthesis while using only about $6\%$ of the parameters of the most recent work.
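To make the two components concrete, the following is a minimal, untrained sketch in pure Python of the data flow described above. Everything beyond the abstract is an assumption: the sizes (`NUM_SLOTS`, `TIME_DIM`, `HIDDEN`, `NUM_CODES`, `CODE_DIM`), linear interpolation between neighbouring time slots, a one-hidden-layer ReLU deformation MLP, and scaled dot-product attention between a query projected from the displaced position and per-code keys. All function names are hypothetical, weights are random rather than learned, and the density/color heads of a full NeRF are omitted. It illustrates the idea, not the authors' implementation.

```python
import math
import random

random.seed(0)

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes; the abstract does not specify them.
NUM_SLOTS, TIME_DIM, HIDDEN, NUM_CODES, CODE_DIM = 16, 8, 32, 64, 16

# Dense time-slot features (random here; learnable parameters in a real model).
time_slots = rand_matrix(NUM_SLOTS, TIME_DIM, scale=1.0)

def time_feature(t):
    """Assumed scheme: linearly interpolate the two slots bracketing t in [0, 1]."""
    s = t * (NUM_SLOTS - 1)
    i = min(int(s), NUM_SLOTS - 2)
    a = s - i
    return [(1 - a) * f0 + a * f1 for f0, f1 in zip(time_slots[i], time_slots[i + 1])]

# Deformation MLP: [x, y, z, time feature] -> displacement (dx, dy, dz).
W1 = rand_matrix(HIDDEN, 3 + TIME_DIM)
W2 = rand_matrix(3, HIDDEN)

def deform(p, t):
    """Return the position p displaced by the predicted offset at time t."""
    h = [max(0.0, v) for v in matvec(W1, list(p) + time_feature(t))]  # ReLU
    d = matvec(W2, h)
    return [pi + di for pi, di in zip(p, d)]

# Sparse latent space: NUM_CODES latent codes, each with a key and a value.
keys = rand_matrix(NUM_CODES, CODE_DIM, scale=1.0)
values = rand_matrix(NUM_CODES, CODE_DIM, scale=1.0)
W_q = rand_matrix(CODE_DIM, 3, scale=1.0)  # projects a 3D position to a query

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def spatial_feature(x):
    """Attention gives adaptive weights over latent codes; feature = weighted sum of values."""
    q = matvec(W_q, x)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(CODE_DIM) for k in keys]
    w = softmax(scores)
    feat = [sum(w[j] * values[j][c] for j in range(NUM_CODES)) for c in range(CODE_DIM)]
    return feat, w

def query(p, t):
    """Deform the point at time t, then look up its spatial feature."""
    x = deform(p, t)
    feat, w = spatial_feature(x)
    return x, feat, w

x, feat, w = query((0.1, -0.2, 0.3), t=0.5)
print(len(feat), round(sum(w), 6))  # 16 1.0
```

In this sketch the spatial parameters are just the `NUM_CODES` keys and values plus `W_q`, independent of any grid resolution, which is the kind of compactness a sparse latent space is meant to provide.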