We present DreamHuman, a method to generate realistic animatable 3D human avatar models solely from textual descriptions. Recent text-to-3D methods have made considerable strides in generation, but are still lacking in important aspects. Control and, often, spatial resolution remain limited; existing methods produce fixed rather than animated 3D human models; and anthropometric consistency for complex structures like people remains a challenge. DreamHuman connects large text-to-image synthesis models, neural radiance fields, and statistical human body models in a novel modeling and optimization framework. This makes it possible to generate dynamic 3D human avatars with high-quality textures and learned, instance-specific surface deformations. We demonstrate that our method can generate a wide variety of animatable, realistic 3D human models from text. Our 3D models have diverse appearance, clothing, skin tones, and body shapes, and significantly outperform both generic text-to-3D approaches and previous text-based 3D avatar generators in visual fidelity. For more results and animations, please see our website at https://dream-human.github.io.