Current text-to-avatar methods often rely on implicit representations (e.g., NeRF, SDF, and DMTet), producing 3D content that artists cannot easily edit or animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, leveraging locally learnable mesh deformation and 2D diffusion priors to produce high-quality digital assets that support attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, which better expresses identity and geometric details. We employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated avatars across multiple views without relying on any specific shape prior. Our framework generates realistic shapes and textures that can be further edited via text, while supporting seamless manipulation through attributes preserved from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework generates diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in graphics software, facilitating downstream applications such as efficient asset creation and animation with preserved attributes.
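The rotation-preserving anisotropic scaling described above can be illustrated with a minimal numpy sketch. This is an assumption-laden illustration, not the paper's implementation: it assumes the per-face Jacobian is modulated via a polar decomposition (J = R·S), where the learnable vector field supplies a per-axis scale applied only to the stretch factor S, leaving the rotation R untouched. The helper name `modulate_jacobian` is hypothetical.

```python
import numpy as np

def modulate_jacobian(J, scale):
    """Anisotropically scale a 3x3 per-face Jacobian while keeping its rotation.

    Sketch only: assumes polar decomposition J = R @ S obtained via SVD,
    with `scale` (length-3) standing in for the learnable vector field.
    """
    U, sigma, Vt = np.linalg.svd(J)
    R = U @ Vt                               # rotation part, preserved as-is
    S = Vt.T @ np.diag(sigma * scale) @ Vt   # stretch part, scaled per axis
    return R @ S
```

For the identity Jacobian, `modulate_jacobian(np.eye(3), [2, 1, 1])` yields a pure anisotropic stretch along the first axis; for a pure rotation with unit scales, the input is returned unchanged, showing that rotation is preserved.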