We introduce SimAvatar, a framework designed to generate simulation-ready clothed 3D human avatars from a text prompt. Current text-driven human avatar generation methods either model hair, clothing, and the human body with a single unified geometry, or produce hair and garments that cannot be easily adapted to existing simulation pipelines. The primary challenge lies in representing hair and garment geometry in a way that both leverages established priors from foundational image diffusion models (e.g., Stable Diffusion) and remains simulation-ready for physics or neural simulators. To address this challenge, we propose a two-stage framework that combines the flexibility of 3D Gaussians with simulation-ready hair strands and garment meshes. Specifically, we first employ three text-conditioned 3D generative models to generate a garment mesh, body shape, and hair strands from the given text prompt. To exploit the priors of foundational diffusion models, we attach 3D Gaussians to the body mesh, garment mesh, and hair strands, and learn the avatar's appearance through optimization. To drive the avatar with a pose sequence, we first run physics simulators on the garment mesh and hair strands, and then transfer the resulting motion to the 3D Gaussians through mechanisms carefully designed for each body part. As a result, our synthesized avatars exhibit vivid texture and realistic dynamic motion. To the best of our knowledge, our method is the first to produce highly realistic, fully simulation-ready 3D avatars, surpassing the capabilities of current approaches.
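The Gaussian-attachment step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes each Gaussian is bound to one mesh triangle via barycentric coordinates, so that when a simulator deforms the mesh vertices, re-evaluating the binding transfers the motion to the Gaussian centers. The function name `bind_gaussians` and the toy mesh are hypothetical.

```python
import numpy as np

def bind_gaussians(vertices, faces, face_ids, barys):
    """Compute Gaussian centers from barycentric coordinates on mesh triangles.

    vertices: (V, 3) mesh vertex positions
    faces:    (F, 3) vertex indices per triangle
    face_ids: (N,)   triangle each Gaussian is attached to
    barys:    (N, 3) barycentric weights per Gaussian
    """
    tris = vertices[faces[face_ids]]             # (N, 3, 3) triangle corners
    return np.einsum('nij,ni->nj', tris, barys)  # barycentric interpolation

# Toy garment "mesh": a single triangle in the z = 0 plane.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])

# Two Gaussians bound to that triangle.
face_ids = np.array([0, 0])
barys = np.array([[1/3, 1/3, 1/3],   # triangle centroid
                  [0.5, 0.5, 0.0]])  # midpoint of an edge

centers = bind_gaussians(verts, faces, face_ids, barys)

# Pretend a physics simulator displaced the mesh (here, a rigid lift along z);
# re-evaluating the binding carries the Gaussians along with the surface.
deformed = verts + np.array([0.0, 0.0, 1.0])
moved = bind_gaussians(deformed, faces, face_ids, barys)
```

In the full method this binding would be maintained per body part (body, garment, hair strand), with Gaussian orientation and scale also updated from the local surface frame rather than position alone.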