Generating 3D human models directly from text helps reduce the cost and time of character modeling. However, multi-attribute controllable and realistic 3D human avatar generation remains challenging because of feature coupling and the scarcity of realistic 3D human avatar datasets. To address these issues, we propose Text2Avatar, which generates realistic-style 3D avatars from coupled text prompts. Text2Avatar uses a discrete codebook as an intermediate feature that connects text to avatars and enables feature disentanglement. Furthermore, to alleviate the scarcity of realistic-style 3D human avatar data, we use a pre-trained unconditional 3D human avatar generation model to produce a large amount of pseudo 3D avatar data, which allows Text2Avatar to achieve realistic-style generation. Experimental results demonstrate that our method can generate realistic 3D avatars from coupled text inputs, a task that remains challenging for existing methods in this field.
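As a rough illustration of how a discrete codebook can sit between a coupled text prompt and per-attribute control, the sketch below maps one prompt to one code index per attribute. It is a toy under stated assumptions, not the Text2Avatar implementation: the attribute names, the labels in `CODEBOOK`, and the word-overlap `score` function are all hypothetical stand-ins for whatever codebook and text encoder the method actually uses.

```python
# Toy illustration only: the attributes, labels, and word-overlap score are
# hypothetical stand-ins, not the codebook or text encoder used by the paper.
import re

# One list of candidate labels per attribute; a label's position is its code index.
CODEBOOK = {
    "gender": ["man", "woman"],
    "sleeve": ["short sleeves", "long sleeves"],
    "lower": ["shorts", "trousers", "skirt"],
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(prompt_tokens, label):
    """Fraction of the label's words that also appear in the prompt."""
    label_tokens = tokenize(label)
    return len(prompt_tokens & label_tokens) / len(label_tokens)


def text_to_code(prompt):
    """Map a coupled prompt to one discrete index per attribute.

    Each attribute is resolved independently, which is what separates the
    attributes mixed together in the prompt. If no label matches, the first
    label (index 0) is chosen by default.
    """
    tokens = tokenize(prompt)
    code = {}
    for attribute, labels in CODEBOOK.items():
        scores = [score(tokens, label) for label in labels]
        code[attribute] = max(range(len(labels)), key=scores.__getitem__)
    return code


def decode(code):
    """Turn a discrete code back into readable labels for inspection."""
    return {attribute: CODEBOOK[attribute][index] for attribute, index in code.items()}
```

For example, `text_to_code("A woman wearing long sleeves and a skirt")` yields one index per attribute, and `decode` turns those indices back into labels. In a real system the resulting code would condition an avatar generator rather than being decoded to text.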