Recent work on implicit representations, such as Neural Radiance Fields (NeRF), has advanced the generation of realistic and animatable head avatars from video sequences. These implicit methods still suffer from visual artifacts and jitter, because the lack of explicit geometric constraints makes it fundamentally difficult to model complex facial deformations accurately. In this paper, we introduce Dynamic Tetrahedra (DynTet), a novel hybrid representation that encodes explicit dynamic meshes with neural networks to ensure geometric consistency across various motions and viewpoints. DynTet is parameterized by coordinate-based networks that learn signed distance, deformation, and material texture, anchoring the training data to a predefined tetrahedral grid. Leveraging Marching Tetrahedra, DynTet efficiently decodes textured meshes with a consistent topology, enabling fast rendering through a differentiable rasterizer and supervision via a pixel loss. To enhance training efficiency, we incorporate classical 3D Morphable Models to facilitate geometry learning and define a canonical space to simplify texture learning. These advantages are readily achievable owing to the effective geometric representation employed in DynTet. Compared with prior works, DynTet demonstrates significant improvements in fidelity, lip synchronization, and real-time performance according to various metrics. Beyond producing stable and visually appealing synthesized videos, our method also outputs dynamic meshes, which promise to enable many emerging applications.
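To illustrate the decoding step, the following is a minimal sketch of how Marching Tetrahedra extracts surface points from a single tetrahedron given signed distances at its four vertices. It is not the DynTet implementation: the function names `interpolate_zero_crossing` and `march_tetrahedron` are hypothetical, the signed distances are hard-coded rather than predicted by a network, and the triangulation lookup table and vertex ordering used in practice are omitted.

```python
from itertools import combinations


def interpolate_zero_crossing(p_a, p_b, s_a, s_b):
    """Linearly interpolate the point on edge (p_a, p_b) where the SDF is zero."""
    # s_a and s_b lie on opposite sides of the surface, so s_a - s_b != 0.
    t = s_a / (s_a - s_b)
    return tuple(a + t * (b - a) for a, b in zip(p_a, p_b))


def march_tetrahedron(vertices, sdf):
    """Return the surface points cut from one tetrahedron.

    vertices: four (x, y, z) tuples; sdf: four signed distances.
    The result has 0 points (no crossing), 3 points (one triangle),
    or 4 points (a quad that would be split into two triangles).
    Illustrative only: face ordering and winding are not handled.
    """
    points = []
    for i, j in combinations(range(4), 2):
        # An edge is crossed when its endpoints are on opposite sides.
        if (sdf[i] < 0) != (sdf[j] < 0):
            points.append(
                interpolate_zero_crossing(vertices[i], vertices[j], sdf[i], sdf[j])
            )
    return points


# Unit tetrahedron with one vertex inside the surface.
tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
triangle = march_tetrahedron(tet, [-1.0, 1.0, 1.0, 1.0])
```

Because each surface point is a differentiable function of the signed distances and vertex positions, a pixel loss on the rendered mesh can in principle propagate gradients back to whatever produces those values, which in the setting described above would be the coordinate-based networks.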