We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. Although recent methods have produced encouraging results on text-guided generation of common 3D objects, generating high-quality human avatars remains an open challenge because of the complexity of the human body's shape, pose, and appearance. To tackle this challenge, DreamAvatar uses a trainable NeRF to predict density and color features for 3D points and a pre-trained text-to-image diffusion model to provide 2D self-supervision. Specifically, we leverage SMPL models to provide rough pose and shape guidance for the generation. We introduce a dual-space design comprising a canonical space and an observation space, which are related by a learnable deformation field through the NeRF; this allows well-optimized texture and geometry to be transferred from the canonical space to the target posed avatar. Additionally, we exploit a normal-consistency regularization to produce more vivid avatars with detailed geometry and texture. Through extensive evaluations, we demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state of the art for text-and-shape guided 3D human generation.
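To make the dual-space idea concrete, the following is a minimal sketch of how a point sampled in the observation (posed) space might be mapped back to the canonical space. It is an illustration under stated assumptions, not DreamAvatar's actual deformation field: it assumes each posed body vertex carries a known rigid transform (a rotation `R` and translation `t`, with `x_obs = R x_can + t`), and that a query point simply borrows the transform of its nearest posed vertex. The function names `nearest_index` and `to_canonical` are hypothetical, and any learnable correction on top of this mapping is omitted.

```python
import math


def nearest_index(p, verts):
    """Return the index of the vertex in `verts` closest to point `p`."""
    return min(range(len(verts)), key=lambda i: math.dist(p, verts[i]))


def to_canonical(p_obs, verts_obs, rotations, translations):
    """Map an observation-space point to canonical space (illustrative only).

    Assumption: vertex i moves from canonical to observation space by
    x_obs = rotations[i] @ x_can + translations[i], and the query point
    follows the rigid motion of its nearest posed vertex.
    """
    i = nearest_index(p_obs, verts_obs)
    R, t = rotations[i], translations[i]
    # Undo the translation first.
    d = [p_obs[k] - t[k] for k in range(3)]
    # The inverse of a rotation is its transpose: x_can = R^T (x_obs - t).
    return [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]


# Two posed vertices, each with its own (hypothetical) rigid transform.
VERTS_OBS = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
ROTATIONS = [
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # 90 degrees about z
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # identity
]
TRANSLATIONS = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]

if __name__ == "__main__":
    print(to_canonical([1.0, 0.1, 0.0], VERTS_OBS, ROTATIONS, TRANSLATIONS))
    print(to_canonical([-0.9, 0.0, 0.0], VERTS_OBS, ROTATIONS, TRANSLATIONS))
```

In a setup like this, density and color would be queried at the canonical-space point, so geometry and texture optimized in the canonical pose carry over to the posed avatar.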