A Text-to-3D Framework for Joint Generation of CG-Ready Humans and Compatible Garments

Creating detailed 3D human avatars with fitted garments traditionally requires specialized expertise and labor-intensive workflows. While recent advances in generative AI have enabled text-to-3D human and clothing synthesis, existing methods fall short of offering accessible, integrated pipelines for generating CG-ready 3D avatars with physically compatible outfits. Here, we use the term CG-ready for models that follow a technical aesthetic common in computer graphics (CG) and adopt standard CG polygonal-mesh and strand representations (rather than neural representations such as NeRF and 3DGS), so that they can be directly integrated into conventional CG pipelines and support downstream tasks such as physical simulation. To bridge this gap, we introduce Tailor, an integrated text-to-3D framework that generates high-fidelity, customizable 3D avatars dressed in simulation-ready garments. Tailor consists of three stages. (1) Semantic Parsing: we employ a large language model to interpret textual descriptions and translate them into parameterized human avatars and semantically matched garment templates. (2) Geometry-Aware Garment Generation: we propose topology-preserving deformation with novel geometric losses to generate body-aligned garments under text control. (3) Consistent Texture Synthesis: we propose a novel multi-view diffusion process optimized for garment texturing, which enforces view consistency, preserves photorealistic details, and optionally supports the symmetric textures common in garments. Through comprehensive quantitative and qualitative evaluations, we demonstrate that Tailor outperforms state-of-the-art methods in fidelity, usability, and diversity. Our code will be released for academic use. Project page: https://human-tailor.github.io
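
To make the phrase "topology-preserving deformation" in stage (2) concrete, the following is a minimal sketch and not Tailor's implementation. It assumes a triangle-mesh garment template whose vertices are moved by per-vertex offsets while the face array is never modified, so connectivity stays fixed. The abstract does not specify the geometric losses, so the edge-length penalty here is only a generic stand-in regularizer, and all function and variable names (`edges_from_faces`, `deform`, `edge_length_penalty`, the toy square template) are hypothetical.

```python
import numpy as np


def edges_from_faces(faces: np.ndarray) -> np.ndarray:
    """Return the unique undirected edges, shape (E, 2), of a triangle mesh."""
    all_edges = np.concatenate(
        [faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]]
    )
    # Sort each pair so (i, j) and (j, i) collapse to the same edge.
    return np.unique(np.sort(all_edges, axis=1), axis=0)


def deform(vertices: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Move vertices by per-vertex offsets; faces are untouched by design."""
    return vertices + offsets


def edge_length_penalty(
    rest: np.ndarray, deformed: np.ndarray, edges: np.ndarray
) -> float:
    """Mean squared change in edge length (assumed stand-in regularizer)."""
    rest_len = np.linalg.norm(rest[edges[:, 0]] - rest[edges[:, 1]], axis=1)
    new_len = np.linalg.norm(deformed[edges[:, 0]] - deformed[edges[:, 1]], axis=1)
    return float(np.mean((new_len - rest_len) ** 2))


# Hypothetical toy template: a unit square made of two triangles.
template_vertices = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
)
template_faces = np.array([[0, 1, 2], [0, 2, 3]])
edges = edges_from_faces(template_faces)

# A rigid translation leaves every edge length unchanged.
translated = deform(template_vertices, np.full((4, 3), 0.5))
# Scaling the template by 1.1 stretches every edge.
stretched = deform(template_vertices, 0.1 * template_vertices)

translation_penalty = edge_length_penalty(template_vertices, translated, edges)
stretch_penalty = edge_length_penalty(template_vertices, stretched, edges)
```

In this sketch only the vertex positions change, so the template's UV layout and sewing structure, which are tied to its connectivity, would carry over to the deformed garment. An optimizer would trade a penalty like this one against whatever text-alignment and body-fit objectives the actual method uses.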
