This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we design an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call this process self score distillation (SSD). Specifically, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model and add noise of a given level to the image. The noisy image, the landmarks, and the text condition are then fed into the frozen ControlNet twice for noise prediction. A different classifier-free guidance (CFG) weight is applied in each of the two predictions, and the difference between the predictions gives a direction in which the rendered image can be updated to better match the text of interest. Experimental results suggest that our approach produces high-quality 3D head sculptures with adequate geometry and photorealistic appearance, significantly outperforming state-of-the-art methods. We also show that the same pipeline supports editing the generated heads, including both geometry deformation and appearance change.
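The SSD step can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the HeadArtist implementation: `toy_eps` is a hypothetical closed-form stand-in for the frozen ControlNet (the exact noise predictor for a point-mass data distribution), the CFG combination is assumed to be the standard ε_u + w(ε_c − ε_u), the weights 7.5 and 1.0 and the (1 − ᾱ_t) timestep weighting are illustrative choices, and the rendered image is updated directly instead of back-propagating through a renderer into the head-model parameters.

```python
import numpy as np


def toy_eps(x_t, alpha_bar_t, landmarks, text_offset=None):
    """Hypothetical stand-in for the frozen landmark-guided ControlNet.

    Exact noise predictor for a point-mass "data distribution" located at
    `landmarks` (unconditional branch) or `landmarks + text_offset`
    (text-conditioned branch). A real ControlNet is learned, not closed-form.
    """
    mode = landmarks if text_offset is None else landmarks + text_offset
    return (x_t - np.sqrt(alpha_bar_t) * mode) / np.sqrt(1.0 - alpha_bar_t)


def cfg_eps(x_t, alpha_bar_t, landmarks, text_offset, w):
    """Assumed standard classifier-free guidance: eps_u + w * (eps_c - eps_u)."""
    eps_u = toy_eps(x_t, alpha_bar_t, landmarks)
    eps_c = toy_eps(x_t, alpha_bar_t, landmarks, text_offset)
    return eps_u + w * (eps_c - eps_u)


def ssd_gradient(rendered, landmarks, text_offset, alpha_bar_t,
                 w_high=7.5, w_low=1.0, rng=None):
    """Sketch of an SSD-style gradient w.r.t. the rendered image."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(rendered.shape)
    # Forward diffusion: noise the rendering at level alpha_bar_t.
    x_t = np.sqrt(alpha_bar_t) * rendered + np.sqrt(1.0 - alpha_bar_t) * noise
    # Same noisy image, landmarks and text; only the CFG weight differs.
    eps_high = cfg_eps(x_t, alpha_bar_t, landmarks, text_offset, w_high)
    eps_low = cfg_eps(x_t, alpha_bar_t, landmarks, text_offset, w_low)
    weight = 1.0 - alpha_bar_t  # SDS-style timestep weighting (assumption)
    return weight * (eps_high - eps_low)


# Usage: one gradient step on a 4x4 "rendering" (hypothetical toy data).
rng = np.random.default_rng(0)
landmarks = np.zeros((4, 4))
text_offset = np.full((4, 4), 0.5)
image = np.zeros((4, 4))  # stands in for the rendering of the head model
grad = ssd_gradient(image, landmarks, text_offset, alpha_bar_t=0.5, rng=rng)
image_new = image - 0.1 * grad  # step against the gradient, toward the text mode
```

Under the standard CFG combination assumed here, the difference of the two predictions reduces algebraically to (w_high − w_low)(ε_c − ε_u), so the update direction is proportional to the guidance term itself. In this toy the text term does not depend on x_t, so the sampled noise cancels and every entry of `grad` is -1.625.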