We focus on reconstructing high-fidelity radiance fields of human heads, capturing their animations over time, and synthesizing re-renderings from novel viewpoints at arbitrary time steps. To this end, we propose a new multi-view capture setup composed of 16 calibrated machine vision cameras that record time-synchronized images at 7.1 MP resolution and 73 frames per second. With our setup, we collect a new dataset of over 4700 high-resolution, high-framerate sequences of more than 220 human heads, from which we introduce a new human head reconstruction benchmark. The recorded sequences cover a wide range of facial dynamics, including head motions, natural expressions, emotions, and spoken language. In order to reconstruct high-fidelity human heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles (NeRSemble). We represent scene dynamics by combining a deformation field and an ensemble of 3D multi-resolution hash encodings. The deformation field allows for precise modeling of simple scene movements, while the ensemble of hash encodings helps to represent complex dynamics. As a result, we obtain radiance field representations of human heads that capture motion over time and facilitate re-rendering of arbitrary novel viewpoints. In a series of experiments, we explore the design choices of our method and demonstrate that our approach outperforms state-of-the-art dynamic radiance field approaches by a significant margin.
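To make the combination of a deformation field and an ensemble of encodings concrete, the following is a minimal sketch, not the authors' implementation. It makes several assumptions that the abstract does not state: small dense feature grids stand in for the 3D multi-resolution hash encodings, a rigid time-dependent shift stands in for the learned deformation field, the ensemble is combined by a per-frame weighted sum, and points are warped before the lookup. All names and sizes (`NUM_GRIDS`, `RES`, `FEAT_DIM`, `deform`, `lookup`, `dynamic_features`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_GRIDS = 4    # ensemble size (hypothetical)
RES = 8          # grid resolution per axis (hypothetical)
FEAT_DIM = 2     # feature channels per grid cell (hypothetical)
NUM_FRAMES = 5   # number of time steps (hypothetical)

# Assumption: dense feature grids stand in for multi-resolution hash encodings.
grids = rng.normal(size=(NUM_GRIDS, RES, RES, RES, FEAT_DIM))
# Assumption: one blend-weight vector per time step (random here, learned in practice).
blend_weights = rng.normal(size=(NUM_FRAMES, NUM_GRIDS))


def deform(points, t):
    """Toy deformation field: a time-dependent rigid shift along x."""
    offset = np.array([0.05 * t, 0.0, 0.0])
    return points + offset


def lookup(grid, points):
    """Nearest-cell feature lookup for points in the unit cube."""
    idx = np.clip(np.floor(points * RES).astype(int), 0, RES - 1)
    return grid[idx[:, 0], idx[:, 1], idx[:, 2]]


def dynamic_features(points, t):
    """Warp points with the deformation field, then blend the ensemble."""
    warped = deform(points, t)
    per_grid = np.stack([lookup(g, warped) for g in grids])  # (G, N, F)
    # Weighted sum over the ensemble axis with the weights of frame t.
    return np.einsum("g,gnf->nf", blend_weights[t], per_grid)


points = rng.uniform(0.0, 0.5, size=(3, 3))
feats = dynamic_features(points, t=2)
print(feats.shape)
```

In this sketch the shift handles simple motion shared by the whole scene, while the time-varying blend weights let the feature content itself change from frame to frame, which mirrors the division of labor described in the abstract.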