This work presents HeadArtist, a method for generating 3D heads from text descriptions. Using a landmark-guided ControlNet as the generative prior, we build an efficient pipeline that optimizes a parameterized 3D head model under supervision distilled from that same prior. We call this process self score distillation (SSD). Specifically, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model, and then add noise at a given level to the image. The noisy image, the landmarks, and the text condition are then fed into the frozen ControlNet twice for noise prediction. A different classifier-free guidance (CFG) weight is applied in each of the two predictions, and the difference between the predictions indicates how the rendered image should change to better match the target text. Experimental results suggest that our approach produces high-quality 3D head sculptures with sufficient geometric detail and photorealistic appearance, significantly outperforming state-of-the-art methods. We also show that the same pipeline readily supports editing the generated heads, including both geometry deformation and appearance change.
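The SSD step described above (noise the rendered image, run two guided noise predictions with different CFG weights, take their difference) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard DDPM forward process and the standard CFG combination, and it assumes both CFG predictions share one conditional and one unconditional network evaluation. `noise_pred` is a hypothetical stand-in for the frozen ControlNet, `toy_noise_pred` is a toy linear predictor used only to exercise the code, and the weights `w_high=7.5` and `w_low=1.0` are illustrative rather than taken from the paper.

```python
import numpy as np

def add_noise(x0, eps, alpha_bar_t):
    """DDPM forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def cfg(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def ssd_direction(noise_pred, x_t, t, landmarks, text_emb, w_high, w_low):
    """Difference of two CFG-guided noise predictions (hypothetical SSD sketch).

    noise_pred(x_t, t, landmarks, text_emb) is a hypothetical stand-in for the
    frozen landmark-guided ControlNet; text_emb=None means "no text condition".
    """
    eps_cond = noise_pred(x_t, t, landmarks, text_emb)
    eps_uncond = noise_pred(x_t, t, landmarks, None)
    eps_high = cfg(eps_cond, eps_uncond, w_high)
    eps_low = cfg(eps_cond, eps_uncond, w_low)
    # Under the shared-evaluation assumption this equals
    # (w_high - w_low) * (eps_cond - eps_uncond).
    return eps_high - eps_low

def toy_noise_pred(x_t, t, landmarks, text_emb):
    """Toy linear predictor (ignores t); used only to exercise the sketch."""
    out = 0.1 * x_t + landmarks
    return out if text_emb is None else out + text_emb

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))      # toy rendered image
landmarks = rng.standard_normal((8, 8))  # toy rendered landmark map
text_emb = rng.standard_normal((8, 8))   # toy text-conditioning signal
x_t = add_noise(image, rng.standard_normal((8, 8)), alpha_bar_t=0.5)
d = ssd_direction(toy_noise_pred, x_t, 500, landmarks, text_emb,
                  w_high=7.5, w_low=1.0)
# Stepping against d moves the image toward the high-guidance denoised
# estimate relative to the low-guidance one. In the full pipeline, d would
# instead be backpropagated through the renderer to the head-model parameters.
updated = image - 0.01 * d
```

Under these assumptions, the two-weight difference reduces to a scaled conditional-minus-unconditional direction. The paper's exact weighting, timestep schedule, and any per-timestep scaling may differ.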