Text-guided domain adaptation and generation of 3D-aware portraits have applications in many fields. However, owing to the lack of training data and the difficulty of handling widely varying geometry and appearance, existing methods for these tasks suffer from inflexibility, instability, and low fidelity. In this paper, we propose DiffusionGAN3D, a novel framework that boosts text-guided 3D domain adaptation and generation by combining 3D GANs and diffusion priors. Specifically, we integrate pre-trained 3D generative models (e.g., EG3D) with text-to-image diffusion models. The former provide a strong foundation for stable, high-quality avatar generation from text. The latter, in turn, offer powerful priors and guide the finetuning of the 3D generator with informative directions, enabling flexible and efficient text-guided domain adaptation. To enhance diversity in domain adaptation and generation capability in text-to-avatar, we introduce a relative distance loss and a case-specific learnable triplane, respectively. In addition, we design a progressive texture refinement module that improves texture quality for both tasks. Extensive experiments demonstrate that the proposed framework achieves excellent results in both domain adaptation and text-to-avatar tasks, outperforming existing methods in generation quality and efficiency. The project homepage is at https://younglbw.github.io/DiffusionGAN3D-homepage/.