In the field of digital content creation, generating high-quality 3D characters from single images is challenging, especially given the complexity of diverse body poses and the issues of self-occlusion and pose ambiguity. In this paper, we present CharacterGen, a framework for efficiently generating 3D characters. CharacterGen introduces a streamlined generation pipeline built around an image-conditioned multi-view diffusion model. This model calibrates input poses to a canonical form while preserving key attributes of the input image, thereby addressing the challenges posed by diverse poses. The other core component of our approach is a transformer-based, generalizable sparse-view reconstruction model, which creates detailed 3D models from the generated multi-view images. We further adopt a texture back-projection strategy to produce high-quality texture maps. In addition, we curate a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model. Extensive quantitative and qualitative experiments demonstrate that our approach generates 3D characters with high-quality shapes and textures, ready for downstream applications such as rigging and animation.
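To make the three-stage pipeline named in the abstract concrete, the following minimal Python sketch traces the data flow: pose canonicalization via multi-view diffusion, transformer-based sparse-view reconstruction, and texture back-projection. All names here (`canonicalize_views`, `reconstruct_mesh`, `back_project`, `character_gen`) are illustrative placeholders under assumed interfaces, not the authors' actual code or API.

```python
# Hypothetical sketch of the CharacterGen pipeline; placeholder stubs stand in
# for the learned models, which are not reproduced here.
from typing import List, NamedTuple


class Mesh(NamedTuple):
    vertices: list  # 3D vertex positions
    faces: list     # triangle indices
    texture: list   # UV texture map (placeholder representation)


def canonicalize_views(image: bytes, num_views: int = 4) -> List[bytes]:
    # Stage 1 (assumed): an image-conditioned multi-view diffusion model
    # re-poses the input character into a canonical pose and renders it
    # from several viewpoints while preserving identity and appearance.
    return [image] * num_views  # placeholder output


def reconstruct_mesh(views: List[bytes]) -> Mesh:
    # Stage 2 (assumed): a transformer-based, generalizable sparse-view
    # reconstruction model regresses detailed geometry from the views.
    return Mesh(vertices=[], faces=[], texture=[])


def back_project(mesh: Mesh, views: List[bytes]) -> Mesh:
    # Stage 3 (assumed): texture back-projection maps colors from each
    # calibrated view onto the mesh's UV space to form the texture map.
    return mesh._replace(texture=views)


def character_gen(image: bytes) -> Mesh:
    # End-to-end flow: single image -> canonical multi-view images ->
    # coarse mesh -> textured mesh ready for rigging and animation.
    views = canonicalize_views(image)
    mesh = reconstruct_mesh(views)
    return back_project(mesh, views)
```

The key design point this sketch highlights is that canonicalization happens in image space before reconstruction, so the downstream reconstruction model only ever sees characters in a consistent pose.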