Powered by large-scale text-to-image generation models, text-to-3D avatar generation has made promising progress. However, most methods fail to produce photorealistic results because of imprecise geometry and low-quality appearance. Towards more practical avatar generation, we present SEEAvatar, a method for generating photorealistic 3D avatars from text with SElf-Evolving constraints for decoupled geometry and appearance. For geometry, we constrain the optimized avatar to a plausible global shape using a template avatar. The template is initialized from a human prior and is periodically updated with the optimized avatar, so that it evolves during optimization and allows more flexible shape generation. In addition, a static human prior constrains the geometry of local parts such as the face and hands to preserve their delicate structures. For appearance, we use a diffusion model enhanced by prompt engineering to guide a physically based rendering pipeline that generates realistic textures. A lightness constraint is applied to the albedo texture to suppress incorrect lighting effects. Experiments show that our method outperforms previous methods by a large margin in both global and local geometry and in appearance quality. Because our method produces high-quality meshes and textures, these assets can be used directly in a classic graphics pipeline for realistic rendering under any lighting condition. Project page: https://yoxu515.github.io/SEEAvatar/.
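To make the evolving-template idea more concrete, the toy sketch below contrasts a static template with one that is periodically replaced by the current optimized shape. It is only an illustration under strong assumptions and is not the SEEAvatar implementation: the avatar is reduced to a small parameter vector, the diffusion guidance is replaced by a quadratic pull towards a hypothetical `guidance_target`, the template constraint is a quadratic penalty, and the name `optimize_shape` and all of its parameters are invented for this example.

```python
import numpy as np


def optimize_shape(prior, guidance_target, steps=300, lr=0.1,
                   weight=1.0, update_every=None):
    """Toy gradient descent with a template constraint (illustrative only).

    Loss (assumed): 0.5 * ||shape - guidance_target||^2
                  + 0.5 * weight * ||shape - template||^2
    If `update_every` is set, the template is replaced by the current
    shape every `update_every` steps, i.e. it "evolves".
    """
    shape = prior.copy()
    template = prior.copy()  # template starts from the (human) prior
    for step in range(1, steps + 1):
        # Gradient of the two quadratic terms above.
        grad = (shape - guidance_target) + weight * (shape - template)
        shape = shape - lr * grad
        # Periodic template update: the optimized shape becomes the template.
        if update_every and step % update_every == 0:
            template = shape.copy()
    return shape


prior = np.zeros(3)                    # stand-in for the initial human prior
goal = np.array([1.0, -2.0, 0.5])      # stand-in for what the text guidance wants

static_result = optimize_shape(prior, goal)                     # fixed template
evolving_result = optimize_shape(prior, goal, update_every=50)  # evolving template

print("static error:  ", np.linalg.norm(static_result - goal))
print("evolving error:", np.linalg.norm(evolving_result - goal))
```

In this toy setting the static template holds the shape halfway between the prior and the guidance target, while the evolving template lets the shape move progressively closer to the target and still regularizes each stage of the optimization. The update period trades off flexibility against stability; the value 50 here is arbitrary.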