The creation of high-fidelity, digital versions of human heads is an important stepping stone toward further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to the high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we propose per-Gaussian latent features that condition each primitive's dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.
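The Laplacian regularization on per-Gaussian latent features mentioned above can be illustrated with a minimal sketch. The abstract does not specify the exact formulation, so the following assumes a k-nearest-neighbor graph over Gaussian centers and a simple umbrella-style Laplacian penalty that pulls each latent feature toward the mean of its neighbors' features; the function name `knn_laplacian_reg` and the brute-force kNN are illustrative choices, not the paper's implementation.

```python
import numpy as np

def knn_laplacian_reg(positions, features, k=4):
    """Hypothetical sketch of a Laplacian smoothness term on
    per-Gaussian latent features: penalize the squared deviation of
    each feature from the mean feature of its k nearest neighbors
    (brute-force kNN over Gaussian centers, for clarity)."""
    # pairwise squared distances between Gaussian centers, (n, n)
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude each point from its own neighborhood
    nbrs = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors, (n, k)
    neighbor_mean = features[nbrs].mean(axis=1)
    return float(((features - neighbor_mean) ** 2).mean())

# tiny usage example with random Gaussians and 8-dim latent features
rng = np.random.default_rng(0)
pos = rng.normal(size=(32, 3))
feat = rng.normal(size=(32, 8))
loss = knn_laplacian_reg(pos, feat)
```

Spatially constant features incur zero penalty, so such a term discourages abrupt changes in dynamic behavior between neighboring Gaussians without restricting the overall expressivity.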