Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising results that recent works obtain with 2D diffusion priors, current methods struggle to produce avatars that are both high-quality and animatable. In this paper, we present $\textbf{HeadStudio}$, a novel framework that uses 3D Gaussian splatting to generate realistic, animatable avatars from text prompts. Our method drives 3D Gaussians semantically through an intermediate FLAME representation, yielding an appearance that is both flexible and attainable. Specifically, we incorporate FLAME into both the 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, which drives the 3D Gaussian points by rigging each point to a FLAME mesh; and 2) FLAME-based score distillation sampling, which uses a FLAME-based fine-grained control signal to guide score distillation from the text prompt. Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable, visually appealing avatars from textual prompts. The avatars render high-quality novel views in real time ($\geq 40$ fps) at a resolution of 1024, and they can be smoothly controlled by real-world speech and video. We hope that HeadStudio can advance digital avatar creation and that the method can be widely applied across various domains.
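The abstract does not spell out how a Gaussian point is rigged to the FLAME mesh. As a rough illustration only, the sketch below assumes one common parametrization: each Gaussian stores its mean in the local frame of a single mesh triangle, and its world-space position is recomputed from that triangle's current pose whenever the mesh deforms. The function names `triangle_frame` and `gaussian_to_world`, the choice of frame axes, and the use of one edge length as an isotropic scale are all assumptions made for this example, not details taken from HeadStudio.

```python
import numpy as np

def triangle_frame(tri):
    """Build a local frame (R, t, s) for one triangle.

    tri: (3, 3) array, one vertex per row.
    The axis and scale conventions here are illustrative assumptions.
    """
    v0, v1, v2 = tri
    t = tri.mean(axis=0)                 # origin: triangle centroid
    e1 = v1 - v0
    x = e1 / np.linalg.norm(e1)          # first axis: along the edge v0 -> v1
    n = np.cross(e1, v2 - v0)
    z = n / np.linalg.norm(n)            # third axis: unit face normal
    y = np.cross(z, x)                   # second axis: completes a right-handed frame
    R = np.stack([x, y, z], axis=1)      # columns are the frame axes
    s = np.linalg.norm(e1)               # isotropic scale: length of edge v0 -> v1
    return R, t, s

def gaussian_to_world(mu_local, tri):
    """Map a Gaussian mean from triangle-local to world coordinates."""
    R, t, s = triangle_frame(tri)
    return s * (R @ mu_local) + t

# A Gaussian sitting slightly above its triangle, in local coordinates.
mu_local = np.array([0.0, 0.0, 0.1])

# Rest-pose triangle, then the same triangle scaled by 2 and lifted along z.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
posed = 2.0 * rest + np.array([0.0, 0.0, 2.0])

mu_rest = gaussian_to_world(mu_local, rest)
mu_posed = gaussian_to_world(mu_local, posed)
```

Because the Gaussian's local coordinates stay fixed while the frame is rebuilt from the deformed triangle, the point follows the mesh through translation, rotation, and uniform scaling, which is the sense in which expression or pose parameters of the mesh can drive the Gaussians in a sketch like this.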