High-quality reconstruction of controllable 3D head avatars from 2D videos is highly desirable for virtual human applications in movies, games, and telepresence. Neural implicit fields provide a powerful representation for modeling 3D head avatars with personalized shape, expressions, and facial parts (e.g., hair and mouth interior) that go beyond the linear 3D morphable model (3DMM). However, existing methods neither capture fine-scale facial features nor offer local control of facial parts that extrapolates to asymmetric expressions from monocular videos. Further, most condition only on 3DMM parameters, which have poor locality, and resolve local features with a single global neural field. We build on part-based implicit shape models that decompose a global deformation field into local ones. Our novel formulation models multiple implicit deformation fields with local, semantic, rig-like control via 3DMM-based parameters and representative facial landmarks. Further, we propose a local control loss and an attention mask mechanism that promote sparsity of each learned deformation field. Our formulation renders sharper, locally controllable nonlinear deformations than previous implicit monocular approaches, especially for the mouth interior, asymmetric expressions, and facial details.
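As a rough illustration of decomposing a global deformation into landmark-local ones, the sketch below blends per-landmark offsets with soft attention masks and penalizes offsets that act far from their own landmark. It is a toy example under stated assumptions, not the formulation of this work: the Gaussian mask, the `sigma` bandwidth, the penalty form, and all function names (`gaussian_masks`, `blend_local_offsets`, `locality_penalty`) are hypothetical, and the fixed offsets stand in for what would be learned neural deformation fields.

```python
import math

# Toy sketch (all names and formulas here are illustrative assumptions).
# Each landmark k owns a local deformation field; here the field output at
# a query point is given directly as a 3D offset instead of a network.

def gaussian_masks(point, landmarks, sigma=0.1):
    """Soft attention weight of each landmark at `point`, in (0, 1]."""
    masks = []
    for lm in landmarks:
        sq_dist = sum((p - c) ** 2 for p, c in zip(point, lm))
        masks.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return masks

def blend_local_offsets(point, landmarks, local_offsets, sigma=0.1):
    """Global deformation at `point` as the mask-weighted sum of local offsets."""
    masks = gaussian_masks(point, landmarks, sigma)
    return [
        sum(m * off[axis] for m, off in zip(masks, local_offsets))
        for axis in range(3)
    ]

def locality_penalty(point, landmarks, local_offsets, sigma=0.1):
    """Mean offset magnitude weighted by (1 - mask): large when a local field
    moves points far from its own landmark, which encourages sparsity."""
    masks = gaussian_masks(point, landmarks, sigma)
    norms = [math.sqrt(sum(v * v for v in off)) for off in local_offsets]
    return sum((1.0 - m) * n for m, n in zip(masks, norms)) / len(landmarks)

# Two landmarks one unit apart; the query point sits on the first landmark.
landmarks = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
point = (0.0, 0.0, 0.0)
local_offsets = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

blended = blend_local_offsets(point, landmarks, local_offsets)
penalty = locality_penalty(point, landmarks, local_offsets)
```

With this setup the blended offset is dominated by the first landmark's field, while the penalty comes almost entirely from the second field acting far from its landmark.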