We present StdGEN++, a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. Existing 3D generative methods often produce monolithic meshes that lack the structural flexibility required by industrial pipelines in gaming and animation. To address this gap, StdGEN++ is built upon a Dual-Branch Semantic-aware Large Reconstruction Model (Dual-Branch S-LRM), which jointly reconstructs geometry, color, and per-component semantics in a feed-forward manner. To achieve production-level fidelity, we introduce a semantic surface extraction formalism compatible with hybrid implicit fields. This mechanism is accelerated by a coarse-to-fine proposal scheme, which significantly reduces the memory footprint and enables high-resolution mesh generation. Furthermore, we propose a video-diffusion-based texture decomposition module that disentangles appearance into editable layers (e.g., separate iris and skin layers), resolving semantic confusion in facial regions. Experiments demonstrate that StdGEN++ achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement. Crucially, the resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking, making it a robust solution for automated character asset production.
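The abstract does not detail the coarse-to-fine proposal scheme, so the following is only a minimal sketch of the general idea it alludes to: evaluate the implicit field on a cheap coarse grid, keep only cells whose corner signs disagree (i.e., cells the surface passes through), and spend fine-resolution queries solely inside those cells. The `query_field` function below is a hypothetical stand-in for the actual hybrid implicit field produced by the Dual-Branch S-LRM, not the paper's implementation.

```python
import numpy as np

def query_field(points):
    """Hypothetical stand-in for a hybrid implicit field: returns a signed
    distance and a per-point semantic label. Here a sphere of radius 0.5
    whose upper and lower halves carry different labels, for illustration."""
    sdf = np.linalg.norm(points, axis=-1) - 0.5
    sem = (points[..., 2] > 0.0).astype(np.int32)
    return sdf, sem

def coarse_to_fine_queries(res_coarse=32, factor=8, bound=1.0):
    """Query the field densely only inside coarse cells that straddle the
    surface, instead of over the full fine grid."""
    # --- Coarse pass: evaluate the SDF on a cheap dense grid of cell corners.
    axis = np.linspace(-bound, bound, res_coarse + 1)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    sdf, _ = query_field(np.stack([gx, gy, gz], axis=-1).reshape(-1, 3))
    pos = (sdf > 0).reshape(res_coarse + 1, res_coarse + 1, res_coarse + 1)

    # A cell is a surface proposal if its 8 corner signs disagree.
    corners = np.stack([pos[i:i + res_coarse, j:j + res_coarse, k:k + res_coarse]
                        for i in (0, 1) for j in (0, 1) for k in (0, 1)])
    crossing = corners.any(axis=0) & ~corners.all(axis=0)

    # --- Fine pass: subdivide only the proposed cells.
    cell = 2 * bound / res_coarse
    idx = np.argwhere(crossing)                      # (n_cells, 3) cell indices
    sub = (np.arange(factor) + 0.5) / factor * cell  # sample offsets inside a cell
    ox, oy, oz = np.meshgrid(sub, sub, sub, indexing="ij")
    offsets = np.stack([ox, oy, oz], axis=-1).reshape(-1, 3)
    origins = -bound + idx * cell                    # lower corner of each cell
    fine_pts = (origins[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
    fine_sdf, fine_sem = query_field(fine_pts)

    dense = (res_coarse * factor) ** 3
    print(f"fine queries: {fine_pts.shape[0]:,} vs dense fine grid: {dense:,}")
    return fine_pts, fine_sdf, fine_sem

if __name__ == "__main__":
    coarse_to_fine_queries()
```

In this toy setting the surface-crossing cells account for only a small fraction of the volume, so the fine pass issues roughly an order of magnitude fewer queries than a dense grid at the same effective resolution, which is the memory saving the abstract refers to.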