Temporal interpolation often plays a crucial role in learning meaningful representations of dynamic scenes. In this paper, we propose a novel method to train spatiotemporal neural radiance fields of dynamic scenes based on temporal interpolation of feature vectors. We suggest two feature interpolation methods, one for each underlying representation: neural networks or grids. In the neural representation, we extract features from space-time inputs via multiple neural network modules and interpolate them based on time frames. The proposed multi-level feature interpolation network effectively captures features over both short-term and long-term time ranges. In the grid representation, space-time features are learned via four-dimensional hash grids, which substantially reduces training time. The grid representation trains more than 100 times faster than previous neural-network-based methods while maintaining rendering quality. Concatenating static and dynamic features and adding a simple smoothness term further improve the performance of our proposed models. Despite the simplicity of the model architectures, our method achieves state-of-the-art performance both in rendering quality for the neural representation and in training speed for the grid representation.
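To make the idea of temporal feature interpolation concrete, the following is a minimal sketch, not the implementation described above. It assumes evenly spaced keyframes, plain linear interpolation between the two keyframes that bracket a query time, and a squared-difference smoothness term; the abstract specifies none of these details. The names `interpolate_features`, `multi_level_features`, and `smoothness_penalty` are hypothetical and chosen only for illustration.

```python
import numpy as np


def interpolate_features(keyframes: np.ndarray, t: float) -> np.ndarray:
    """Linearly interpolate between the two keyframe features that bracket t.

    keyframes: (K, D) array with one D-dim feature per keyframe, K >= 2.
               Keyframes are ASSUMED to be evenly spaced over [0, 1].
    t:         normalized query time; values outside [0, 1] are clamped.
    """
    num_keys = keyframes.shape[0]
    pos = float(np.clip(t, 0.0, 1.0)) * (num_keys - 1)
    lo = min(int(np.floor(pos)), num_keys - 2)  # index of the left keyframe
    w = pos - lo                                # weight of the right keyframe
    return (1.0 - w) * keyframes[lo] + w * keyframes[lo + 1]


def multi_level_features(levels: list, t: float) -> np.ndarray:
    """Concatenate interpolated features from several temporal resolutions.

    ASSUMPTION: each level is a (K_i, D_i) keyframe table; tables with few
    keyframes stand in for long-term features, dense ones for short-term.
    """
    return np.concatenate([interpolate_features(k, t) for k in levels])


def smoothness_penalty(keyframes: np.ndarray) -> float:
    """Mean squared difference between temporally adjacent keyframe features.

    ASSUMPTION: one simple choice of smoothness term, used for illustration.
    """
    diffs = keyframes[1:] - keyframes[:-1]
    return float(np.mean(diffs ** 2))


if __name__ == "__main__":
    fine = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 0.0]])  # 3 keyframes, D=2
    coarse = np.array([[0.0], [10.0]])                      # 2 keyframes, D=1
    print(multi_level_features([fine, coarse], t=0.25))
    print(smoothness_penalty(fine))
```

In a trainable model the keyframe tables would be learnable parameters (or the outputs of small networks), and the penalty would be added to the rendering loss with a small weight. Here they are fixed arrays so the arithmetic is easy to follow.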