Text-guided domain adaptation and generation of 3D-aware portraits have applications in many fields. However, owing to the scarcity of training data and the difficulty of handling highly varied geometry and appearance, existing methods for these tasks suffer from inflexibility, instability, and low fidelity. In this paper, we propose DiffusionGAN3D, a novel framework that boosts text-guided 3D domain adaptation and generation by combining 3D GANs with diffusion priors. Specifically, we integrate pre-trained 3D generative models (e.g., EG3D) with text-to-image diffusion models. The former provide a strong foundation for stable, high-quality avatar generation from text. The latter, in turn, offer powerful priors and guide the fine-tuning of the 3D generator with an informative direction, enabling flexible and efficient text-guided domain adaptation. To enhance diversity in domain adaptation and generation capability in text-to-avatar, we introduce a relative distance loss and a case-specific learnable triplane, respectively. In addition, we design a progressive texture refinement module that improves texture quality for both tasks. Extensive experiments demonstrate that the proposed framework achieves excellent results on both domain adaptation and text-to-avatar tasks, outperforming existing methods in generation quality and efficiency. The project homepage is at https://younglbw.github.io/DiffusionGAN3D-homepage/.