Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Existing neural rendering-based methods for text-to-3D-portrait generation typically rely on a human geometry prior and diffusion models for guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a neural rendering-based framework that uses a novel joint geometry-appearance prior for text-to-3D-portrait generation and overcomes these issues. To this end, we train a 3D portrait generator, 3DPortraitGAN-Pyramid, as a robust prior. This generator can produce 360° canonical 3D portraits, which serve as a starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifacts caused by high-frequency information in the feature-map-based 3D representation used by most 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN-Pyramid. To generate a 3D portrait from text, we first project a randomly generated image aligned with the given prompt into the latent space of the pre-trained 3DPortraitGAN-Pyramid. The resulting latent code is then used to synthesize a pyramid tri-grid. Starting from this pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into it. We then use the diffusion model to refine rendered images of the 3D portrait and use these refined images as training data to further optimize the pyramid tri-grid, which effectively eliminates unrealistic colors and unnatural artifacts. Our experimental results show that Portrait3D can produce realistic, high-quality, canonical 3D portraits that align with the prompt.
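
The score distillation sampling (SDS) step mentioned above can be illustrated with a small, self-contained sketch. It is not the Portrait3D implementation: the "rendered image" is a flat list of numbers optimized directly (in the actual pipeline the gradient would be propagated through the renderer into the pyramid tri-grid), and the names `point_mass_noise_predictor`, `TARGET`, and `optimize_toy` are hypothetical. The noise predictor is a toy stand-in for a text-conditioned diffusion model, and the weighting w(t) = 1 - alpha_bar is assumed here as one common choice.

```python
import math
import random

# Hypothetical "image the diffusion prior prefers"; stands in for the prompt.
TARGET = [0.2, 0.4, 0.9]


def point_mass_noise_predictor(x_t, alpha_bar):
    """Toy stand-in for a diffusion model's noise prediction.

    If all data sit at TARGET, the ideal noise estimate for
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps is
    (x_t - sqrt(alpha_bar) * TARGET) / sqrt(1 - alpha_bar).
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(v - a * m) / s for v, m in zip(x_t, TARGET)]


def sds_gradient(x, predict_noise, alpha_bar, weight, rng):
    """One SDS gradient with respect to the rendered image x.

    Noise x to the sampled level, ask the model which noise it sees,
    and return weight * (predicted_noise - injected_noise).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * xi + s * ei for xi, ei in zip(x, eps)]
    eps_hat = predict_noise(x_t, alpha_bar)
    return [weight * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize_toy(steps=200, lr=0.1, seed=0):
    """Run gradient descent on the image using SDS gradients only."""
    rng = random.Random(seed)
    x = [0.0, 1.0, 0.5]  # arbitrary starting "render"
    for _ in range(steps):
        alpha_bar = rng.uniform(0.2, 0.8)  # random noise level per step
        grad = sds_gradient(x, point_mass_noise_predictor,
                            alpha_bar, 1.0 - alpha_bar, rng)
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    print(optimize_toy())
```

With this toy predictor the injected noise cancels exactly, so each update pulls the image toward `TARGET`; with a real diffusion model the gradient is noisy, which is one reason a good starting point such as the GAN-generated pyramid tri-grid is useful.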
