Freely and visually exploring a real-world 4D spatiotemporal space in VR has been a long-term quest. The task is especially appealing when only a few RGB cameras, or even a single one, are used to capture the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a feature streaming scheme based on hybrid representations for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays. It achieves rendering quality and speed comparable or superior to recent state-of-the-art methods, with reconstruction in 10 seconds per frame and interactive rendering.
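The decomposition described above can be illustrated with a minimal sketch. It assumes that each 4D point carries three category scores, that a softmax turns them into probabilities, and that the outputs of the three fields are combined by a probability-weighted sum. The abstract does not specify any of these choices. All names here (`blend_fields`, `static_field`, `deforming_field`, `new_field`) are hypothetical, and the constant-valued fields are toy stand-ins for neural fields.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def blend_fields(points_xyzt, category_logits, fields):
    """Blend per-category field outputs by per-point category probabilities.

    points_xyzt:     (N, 4) array of (x, y, z, t) samples.
    category_logits: (N, 3) hypothetical scores for static / deforming / new.
    fields:          three callables, each mapping (N, 4) -> (N, C).
    """
    probs = softmax(category_logits)                              # (N, 3)
    outputs = np.stack([f(points_xyzt) for f in fields], axis=1)  # (N, 3, C)
    blended = (probs[..., None] * outputs).sum(axis=1)            # (N, C)
    return probs, blended

# Toy stand-ins for the three fields (assumptions, not the paper's networks).
def static_field(p):
    return np.full((len(p), 2), 1.0)

def deforming_field(p):
    return np.full((len(p), 2), 2.0)

def new_field(p):
    return np.full((len(p), 2), 3.0)

points = np.zeros((2, 4))
logits = np.array([[0.0, 0.0, 0.0],    # undecided point: uniform probabilities
                   [50.0, 0.0, 0.0]])  # point that is almost certainly static
probs, blended = blend_fields(
    points, logits, [static_field, deforming_field, new_field]
)
```

In this toy setup the undecided point receives the average of the three field outputs, while the confidently static point receives essentially the static field's output alone.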