In digital content creation, generating high-quality 3D characters from a single image is challenging: characters appear in widely varying body poses, and self-occlusion and pose ambiguity make the task harder. In this paper, we present CharacterGen, a framework for efficiently generating 3D characters. CharacterGen introduces a streamlined generation pipeline together with an image-conditioned multi-view diffusion model. This model calibrates the input pose to a canonical form while retaining key attributes of the input image, thereby addressing the challenges posed by diverse poses. The other core component of our approach is a transformer-based, generalizable sparse-view reconstruction model, which creates detailed 3D models from multi-view images. We also adopt a texture back-projection strategy to produce high-quality texture maps. Additionally, we curate a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model. Quantitative and qualitative experiments show that our approach generates 3D characters with high-quality shapes and textures that are ready for downstream applications such as rigging and animation.