We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. Although recent methods have reported encouraging results on text-guided generation of common 3D objects, generating high-quality human avatars remains an open challenge because of the complexity of the human body's shape, pose, and appearance. To tackle this challenge, DreamAvatar uses a trainable NeRF to predict density and color for 3D points and pretrained text-to-image diffusion models to provide 2D self-supervision. Specifically, we leverage the SMPL model to provide shape and pose guidance for the generation. We introduce a dual-observation-space design that jointly optimizes a canonical space and a posed space, which are related by a learnable deformation field. This facilitates the generation of more complete textures and of geometry faithful to the target pose. We also jointly optimize losses computed from the full body and from a zoomed-in 3D head to alleviate the common multi-face "Janus" problem and to improve facial details in the generated avatars. Extensive evaluations demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state of the art for text-and-shape guided 3D human avatar generation.
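To make the joint optimization concrete, the following is a minimal sketch of how losses from the two observation spaces (canonical and posed) and the two renderings (full body and zoomed-in head) might be combined into one objective. It is an illustration under stated assumptions, not the paper's implementation: the names `joint_loss`, `dummy_guidance`, and the weight dictionary are hypothetical, and the per-render guidance loss is replaced by placeholder numbers rather than a real NeRF render scored by a diffusion model.

```python
from typing import Callable, Dict, Tuple

SPACES = ("canonical", "posed")   # the two observation spaces
VIEWS = ("full_body", "head")     # full-body and zoomed-in head renderings


def joint_loss(
    guidance_loss: Callable[[str, str], float],
    weights: Dict[Tuple[str, str], float],
) -> float:
    """Weighted sum of a per-render guidance loss over every (space, view) pair.

    guidance_loss(space, view) is a hypothetical stand-in for rendering the
    NeRF in that space and view and scoring the image with the diffusion model.
    """
    total = 0.0
    for space in SPACES:
        for view in VIEWS:
            total += weights[(space, view)] * guidance_loss(space, view)
    return total


def dummy_guidance(space: str, view: str) -> float:
    # Placeholder values only (assumption); no rendering or diffusion model here.
    base = 1.0 if space == "canonical" else 2.0
    return base + (0.5 if view == "head" else 0.0)


# Equal weighting of all four terms is an assumption made for illustration.
uniform = {(s, v): 1.0 for s in SPACES for v in VIEWS}
total = joint_loss(dummy_guidance, uniform)
```

In an actual system, the four terms would share the same NeRF parameters, so gradients from the head renderings and from both spaces would update one model. The relative weights are a design choice that the abstract does not specify.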