DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance

From arXiv. Project page: https://sites.google.com/view/dreamface. Video: https://youtu.be/yCuvzgGMvPM. Online demo: https://hyperhuman.top

Emerging Metaverse applications demand accessible, accurate, and easy-to-use tools for 3D digital human creation in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired human facial traits. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables lay users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures, and fine-grained animation capabilities. Given a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space, and subsequently optimize both the detailed displacements and normals using Score Distillation Sampling (SDS) from a generic Latent Diffusion Model (LDM). Then, for neutral appearance generation, we introduce a dual-path mechanism, which combines the generic LDM with a novel texture LDM to ensure both diversity and textural specification in the UV space. We also employ a two-stage optimization that performs SDS in both the latent and image spaces, which provides compact priors for fine-grained synthesis. Our generated neutral assets naturally support blendshape-based facial animations. We further improve the animation ability with personalized deformation characteristics by learning a universal expression prior using a cross-identity hypernetwork. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.
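
The selection step in the CLIP embedding space can be illustrated with a small sketch. The abstract does not give the scoring rule, so cosine similarity between the text embedding and each candidate embedding is an assumption here, and the names `cosine_similarity` and `select_candidate` are hypothetical. The short vectors stand in for real CLIP embeddings, which in practice would come from a CLIP text encoder and from an image encoder applied to renderings of the candidate geometries.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_candidate(text_embedding, candidate_embeddings):
    """Return (index of best candidate, list of all scores).

    Assumption: the best candidate is the one whose embedding has the
    highest cosine similarity to the text embedding.
    """
    scores = [cosine_similarity(text_embedding, c) for c in candidate_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy stand-ins for CLIP embeddings (real ones have hundreds of dimensions).
text_embedding = [1.0, 0.0]
candidate_embeddings = [
    [0.0, 1.0],   # orthogonal to the text
    [2.0, 0.1],   # nearly parallel to the text
    [-1.0, 0.0],  # opposite to the text
]
best_index, scores = select_candidate(text_embedding, candidate_embeddings)
```

In this toy run the second candidate is selected, since it points in nearly the same direction as the text embedding. In the pipeline described above, the selected coarse geometry would then be refined by the SDS-based optimization of displacements and normals.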
