We present StdGEN, an innovative pipeline for generating semantically decomposed, high-quality 3D characters from a single image, enabling broad applications in virtual reality, gaming, and filmmaking. Unlike previous methods, which struggle with limited decomposability, unsatisfactory quality, and long optimization times, StdGEN features decomposability, effectiveness, and efficiency: it generates intricately detailed 3D characters with separated semantic components, such as the body, clothes, and hair, in three minutes. At the core of StdGEN is our proposed Semantic-aware Large Reconstruction Model (S-LRM), a transformer-based generalizable model that jointly reconstructs geometry, color, and semantics from multi-view images in a feed-forward manner. A differentiable multi-layer semantic surface extraction scheme is introduced to acquire meshes from the hybrid implicit fields reconstructed by our S-LRM. Additionally, a specialized efficient multi-view diffusion model and an iterative multi-layer surface refinement module are integrated into the pipeline to facilitate high-quality, decomposable 3D character generation. Extensive experiments demonstrate state-of-the-art performance in 3D anime character generation, surpassing existing baselines by a significant margin in geometry, texture, and decomposability. StdGEN offers ready-to-use, semantically decomposed 3D characters and enables flexible customization for a wide range of applications. Project page: https://stdgen.github.io