Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery. Towards fine-grained control over facial attributes, recent efforts incorporate the 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields, either explicitly or implicitly. Explicit methods provide fine-grained expression control but cannot handle topological changes caused by hair and accessories, while implicit ones can model varied topologies but generalize poorly because their deformation fields are unconstrained. We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images. To achieve both deformation accuracy and topological flexibility, we propose a 3D representation called Generative Texture-Rasterized Tri-planes. The proposed representation learns Generative Neural Textures on top of parametric mesh templates and then projects them, through rasterization, onto three feature planes rendered from orthogonal views, forming a tri-plane feature representation for volume rendering. In this way, we combine the fine-grained expression control of mesh-guided explicit deformation with the flexibility of an implicit volumetric representation. We further propose dedicated modules for modeling the mouth interior, which is not taken into account by 3DMM. Extensive experiments demonstrate that our method achieves state-of-the-art 3D-aware synthesis quality and animation ability. Furthermore, serving as a 3D prior, our animatable 3D representation benefits multiple applications, including one-shot facial avatars and 3D-aware stylization.
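To illustrate how a tri-plane feature representation can be queried during volume rendering, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes three axis-aligned feature planes of shape (C, H, W), points normalized to [-1, 1]^3, bilinear interpolation, and aggregation of the three sampled features by summation. The names `bilinear_sample` and `query_triplane`, the plane keys, and the choice of summation are all illustrative assumptions that the abstract does not specify. The texture rasterization step that produces the planes is not shown.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at (u, v) in [-1, 1]."""
    _, H, W = plane.shape
    # Clamp to the valid range, then map [-1, 1] to pixel coordinates.
    u = float(np.clip(u, -1.0, 1.0))
    v = float(np.clip(v, -1.0, 1.0))
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Top-left corner of the interpolation cell (requires H, W >= 2).
    x0 = int(np.clip(np.floor(x), 0, W - 2))
    y0 = int(np.clip(np.floor(y), 0, H - 2))
    tx, ty = x - x0, y - y0
    top = (1 - tx) * plane[:, y0, x0] + tx * plane[:, y0, x0 + 1]
    bottom = (1 - tx) * plane[:, y0 + 1, x0] + tx * plane[:, y0 + 1, x0 + 1]
    return (1 - ty) * top + ty * bottom


def query_triplane(planes, point):
    """Project a 3D point onto the three planes and sum the sampled features.

    `planes` is a dict with hypothetical keys "xy", "xz", "yz", each mapping
    to a (C, H, W) array. Summation is an assumed aggregation rule.
    """
    x, y, z = point
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return f_xy + f_xz + f_yz


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    planes = {k: rng.standard_normal((32, 64, 64)) for k in ("xy", "xz", "yz")}
    feature = query_triplane(planes, (0.1, -0.4, 0.7))
    print(feature.shape)  # one C-dimensional feature vector per queried point
```

In a full pipeline, the feature returned for each sample point along a camera ray would typically be decoded into density and color by a small network before volume rendering. That decoder is omitted here.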
