Recently, the reconstruction of high-fidelity 3D head models from static portrait images has made great progress. However, most methods require multi-view or multi-illumination information, which places high demands on data acquisition. In this paper, we study the reconstruction of high-fidelity 3D head models from arbitrary monocular videos. Non-rigid structure from motion (NRSFM) methods have been widely used to solve such problems by relying on two-dimensional correspondences between frames. However, inaccurate correspondences caused by highly complex hair structures and varied facial expressions severely degrade the reconstruction accuracy. To tackle these problems, we propose a prior-guided dynamic implicit neural network. Specifically, we design a two-part dynamic deformation field that transforms the current frame space into the canonical one. We further model the head geometry in the canonical space with a learnable signed distance field (SDF) and optimize it through volumetric rendering, guided by two main head priors, to improve the reconstruction accuracy and robustness. Extensive ablation studies and comparisons with state-of-the-art methods demonstrate the effectiveness and robustness of our proposed method.
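To make the pipeline described above concrete, the following is a minimal sketch, not the proposed network. It assumes a toy setup: the learned two-part deformation field is replaced by a hypothetical per-frame translation, the learnable head SDF by a unit sphere, and the SDF is converted to volume density with a Laplace-CDF mapping (one common choice in SDF-based volumetric rendering; the abstract does not state which mapping is used). All function names and parameters are illustrative, and the head priors are not modelled.

```python
import math

def deform_to_canonical(point, offset):
    """Toy deformation field (assumption): undo a per-frame translation.
    The learned two-part field is not reproduced here."""
    return tuple(p - o for p, o in zip(point, offset))

def canonical_sdf(point, radius=1.0):
    """Toy canonical geometry (assumption): a sphere SDF standing in
    for the learnable head SDF. Negative inside, positive outside."""
    return math.sqrt(sum(c * c for c in point)) - radius

def sdf_to_density(sdf, beta=0.05):
    """Map signed distance to volume density with a Laplace CDF
    (assumption): density is high inside the surface, near zero outside."""
    alpha = 1.0 / beta
    if sdf >= 0.0:
        return alpha * 0.5 * math.exp(-sdf / beta)
    return alpha * (1.0 - 0.5 * math.exp(sdf / beta))

def render_depth(origin, direction, offset, near=0.0, far=4.0, n_samples=256):
    """Accumulate expected depth along one ray by volume rendering.
    Each sample is warped to canonical space before the SDF is queried."""
    step = (far - near) / n_samples
    transmittance = 1.0
    depth = 0.0
    weight_sum = 0.0
    for i in range(n_samples):
        t = near + (i + 0.5) * step          # midpoint of the i-th interval
        x = tuple(o + t * d for o, d in zip(origin, direction))
        x_canonical = deform_to_canonical(x, offset)
        sigma = sdf_to_density(canonical_sdf(x_canonical))
        opacity = 1.0 - math.exp(-sigma * step)
        weight = transmittance * opacity     # contribution of this sample
        depth += weight * t
        weight_sum += weight
        transmittance *= 1.0 - opacity
    return depth, weight_sum

# One ray looking down +z at the sphere, in a "frame" shifted by 0.5 along z.
depth, weight_sum = render_depth((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.5))
```

In this toy setting the rendered depth lands close to the ray's first intersection with the shifted sphere, which illustrates why gradients from rendered pixels can supervise both the canonical geometry and the per-frame deformation.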