In this paper, we introduce a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation of the generated avatar. While recent advancements in text-to-avatar generation have yielded diverse human avatars from text prompts, these methods typically combine all elements (clothes, hair, and body) into a single 3D representation. Such an entangled approach poses challenges for downstream tasks like editing or animation. To overcome these limitations, we propose a novel disentangled 3D avatar representation named Sequentially Offset-SMPL (SO-SMPL), building upon the SMPL model. SO-SMPL represents the human body and clothes with two separate meshes but associates them through offsets to ensure physical alignment between the body and the clothes. We then design a Score Distillation Sampling (SDS)-based distillation framework to generate the proposed SO-SMPL representation from text prompts. Our approach not only achieves higher texture and geometry quality and better semantic alignment with text prompts, but also significantly improves the visual quality of character animation, virtual try-on, and avatar editing. Project page: https://shanemankiw.github.io/SO-SMPL/.
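The core idea of representing clothes as offsets from the body mesh can be illustrated with a minimal sketch. This is not the authors' implementation; the vertex counts and pose transform below are hypothetical stand-ins, but the sketch shows why an offset-based clothing mesh stays aligned with the body under re-posing.

```python
# Minimal sketch of the offset idea behind SO-SMPL (illustrative only):
# the clothing mesh reuses the body topology and stores per-vertex
# offsets, so re-posing the body keeps the clothes aligned with it.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for posed body vertices (SMPL has 6890 vertices).
body_vertices = rng.normal(size=(6890, 3))

# Hypothetical learned per-vertex clothing offsets.
cloth_offsets = 0.01 * rng.normal(size=(6890, 3))

# Clothing vertices are defined relative to the body.
cloth_vertices = body_vertices + cloth_offsets

# Re-posing the body (a simple translation here, standing in for SMPL
# articulation) preserves alignment because the offsets are relative.
pose_translation = np.array([0.0, 0.1, 0.0])
new_body = body_vertices + pose_translation
new_cloth = new_body + cloth_offsets

# Body-to-cloth displacement is unchanged after re-posing.
assert np.allclose(new_cloth - new_body, cloth_offsets)
```

Because the clothing is a separate mesh tied to the body only through these offsets, it can be swapped or edited independently, which is what enables the virtual try-on and editing applications described above.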