Text-driven avatar generation has gained significant attention owing to its convenience. However, existing methods typically model the human body together with all garments as a single 3D model, which limits their usability in tasks such as clothing replacement and reduces user control over the generation process. To overcome these limitations, we propose DAGSM, a novel pipeline that generates disentangled human bodies and garments from given text prompts. Specifically, we model each part of the clothed human (e.g., body, upper/lower clothes) as one GS-enhanced mesh (GSM): a traditional mesh with 2D Gaussians attached, which better handles complicated textures (e.g., woolen or translucent clothes) and produces realistic cloth animations. During generation, we first create the unclothed body and then generate each garment in sequence on top of it, introducing a semantic-based algorithm to achieve better human-cloth and garment-garment separation. To improve texture quality, we propose a view-consistent texture refinement module, comprising a cross-view attention mechanism for texture style consistency and an incident-angle-weighted denoising (IAW-DE) strategy to update the appearance. Extensive experiments demonstrate that DAGSM generates high-quality disentangled avatars, supports clothing replacement and realistic animation, and outperforms the baselines in visual quality.
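To make the GSM representation concrete, the sketch below attaches 2D Gaussian splats to a triangle mesh via barycentric coordinates, so the splats follow the surface when the mesh deforms (e.g., during cloth animation). This is a minimal illustrative sketch under our own assumptions; the class name, parameterization, and all default values are hypothetical, not the paper's implementation.

```python
# Minimal sketch of a "GS-enhanced mesh" (GSM): a triangle mesh whose faces
# carry 2D Gaussian splats anchored in barycentric coordinates, so the splats
# move with the surface under deformation. All names and parameters here are
# illustrative assumptions, not the paper's actual implementation.
import numpy as np

class GSM:
    def __init__(self, vertices, faces, splats_per_face=4, rng=None):
        rng = rng or np.random.default_rng(0)
        self.vertices = np.asarray(vertices, dtype=np.float64)  # (V, 3)
        self.faces = np.asarray(faces, dtype=np.int64)          # (F, 3)
        F = len(self.faces)
        n = F * splats_per_face
        # Barycentric anchor of each splat on its parent face.
        self.face_id = np.repeat(np.arange(F), splats_per_face)  # (n,)
        self.bary = rng.dirichlet(np.ones(3), size=n)            # (n, 3)
        # 2D Gaussian parameters in each face's tangent plane (assumed layout).
        self.scale = np.full((n, 2), 0.05)   # tangent-plane extents
        self.rotation = np.zeros(n)          # in-plane angle (radians)
        self.opacity = np.full(n, 0.8)
        self.color = np.full((n, 3), 0.5)    # RGB albedo

    def splat_positions(self):
        """World-space splat centers; re-evaluating after moving the
        vertices makes the splats track the deforming mesh."""
        tri = self.vertices[self.faces[self.face_id]]  # (n, 3, 3)
        return np.einsum('nk,nkd->nd', self.bary, tri)

# Toy example: a unit square split into two triangles.
verts = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
faces = [[0, 1, 2], [0, 2, 3]]
gsm = GSM(verts, faces)
centers = gsm.splat_positions()
print(centers.shape)  # (8, 3): 2 faces x 4 splats each
```

Because each splat is stored relative to its face rather than in world space, animating the underlying mesh (skinning, physics simulation) automatically carries the appearance along, which is what allows the per-garment GSMs to be swapped and animated independently.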