We present a new method for multimodal conditional 3D face geometry generation that gives users intuitive control over the output identity and expression through several different conditioning signals. Within a single model, we demonstrate 3D faces generated from artistic sketches, 2D face landmarks, Canny edges, FLAME face model parameters, portrait photos, or text prompts. Our approach is based on a diffusion process that generates 3D geometry in a 2D parameterized UV domain. During geometry generation, each user-defined conditioning signal is passed through its own set of cross-attention layers (IP-Adapter). The result is an easy-to-use 3D face generation tool that produces high-resolution geometry with fine-grained user control.
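The per-signal conditioning described above can be sketched as decoupled cross-attention in the IP-Adapter style: the UV-domain geometry features act as queries, and each conditioning signal contributes its own cross-attention branch whose output is added back to the base features. The sketch below is a minimal illustration, not the authors' implementation; the function names, the per-signal scalar weights, and the use of plain NumPy (rather than a trained diffusion UNet) are all assumptions for clarity.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention with a numerically stable softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def multi_signal_cross_attention(x, signals, weights):
    """Hypothetical sketch of decoupled per-signal cross-attention.

    x:       (N, d) UV-domain geometry features (queries).
    signals: dict mapping signal name -> (M, d) embedded condition tokens
             (e.g. from a sketch, landmarks, or text encoder).
    weights: dict mapping signal name -> scalar strength for that branch.
    Each signal gets its own cross-attention branch; branch outputs are
    summed into the base features, so signals can be mixed or disabled.
    """
    out = x.copy()
    for name, tokens in signals.items():
        out = out + weights[name] * attention(x, tokens, tokens)
    return out
```

Setting a signal's weight to zero disables that branch entirely, which is one way such a design can expose independent user control over each conditioning modality.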