We introduce SimAvatar, a framework for generating simulation-ready clothed 3D human avatars from a text prompt. Current text-driven human avatar generation methods either model hair, clothing, and the human body with a single unified geometry or produce hair and garments that cannot be easily adapted to existing simulation pipelines. The primary challenge lies in representing hair and garment geometry in a way that leverages the prior knowledge of foundational image diffusion models (e.g., Stable Diffusion) while remaining simulation-ready for either physics-based or neural simulators. To address this challenge, we propose a two-stage framework that combines the flexibility of 3D Gaussians with simulation-ready hair strands and garment meshes. Specifically, we first employ three text-conditioned 3D generative models to generate a garment mesh, a body shape, and hair strands from the given text prompt. To leverage prior knowledge from foundational diffusion models, we attach 3D Gaussians to the body mesh, the garment mesh, and the hair strands, and learn the avatar's appearance through optimization. To drive the avatar with a given pose sequence, we first apply physics simulators to the garment meshes and hair strands, then transfer the resulting motion onto the 3D Gaussians through mechanisms designed separately for each body part. As a result, our synthesized avatars exhibit vivid texture and realistic dynamic motion. To the best of our knowledge, our method is the first to produce highly realistic, fully simulation-ready 3D avatars, surpassing the capabilities of current approaches.
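The abstract states that 3D Gaussians are attached to the meshes and that simulated motion is transferred onto them, but it does not specify the mechanism. As a minimal sketch of one common way such a binding could work, and not a description of SimAvatar's actual design, the Python below assumes each Gaussian center is stored as barycentric coordinates on a triangle plus an offset along the face normal, so that it follows the mesh when a simulator moves the vertices. All names here (`gaussian_center`, `transfer_motion`, the `bindings` layout) are hypothetical.

```python
import math

# Illustrative assumption: a Gaussian is bound to one triangle through
# barycentric coordinates and a signed offset along the face normal.
# This is NOT taken from the SimAvatar paper.


def sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def face_normal(v0, v1, v2):
    """Unit normal of the triangle (v0, v1, v2)."""
    n = cross(sub(v1, v0), sub(v2, v0))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate triangle has no normal")
    return tuple(c / length for c in n)


def gaussian_center(vertices, face, bary, offset):
    """Gaussian center for the current vertex positions of its bound face."""
    v0, v1, v2 = (vertices[i] for i in face)
    n = face_normal(v0, v1, v2)
    return tuple(
        bary[0] * a + bary[1] * b + bary[2] * c + offset * nk
        for a, b, c, nk in zip(v0, v1, v2, n)
    )


def transfer_motion(vertices, faces, bindings):
    """Recompute every Gaussian center from (possibly simulated) vertices.

    bindings: list of (face_index, barycentric_coords, normal_offset).
    """
    return [
        gaussian_center(vertices, faces[f], bary, off)
        for f, bary, off in bindings
    ]


# One triangle in the rest pose, with one Gaussian bound to it.
faces = [(0, 1, 2)]
rest_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
bindings = [(0, (0.25, 0.25, 0.5), 0.1)]

# Stand-in for a simulator step: the whole triangle moves up by 2 units.
simulated_vertices = [(x, y, z + 2.0) for x, y, z in rest_vertices]

rest_centers = transfer_motion(rest_vertices, faces, bindings)
moved_centers = transfer_motion(simulated_vertices, faces, bindings)
```

In this sketch the binding is computed once in the rest pose and reused for every simulated frame, so only the mesh or strand geometry needs to pass through the simulator. A full system would also need to update each Gaussian's rotation and scale, which is omitted here.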