Domain adaptation of 3D portraits has attracted growing attention. However, the transfer mechanism of existing methods relies mainly on either vision or language, which ignores the potential of combined vision-language guidance. In this paper, we propose an image-text coupled framework for 3D portrait domain adaptation, named Image and Text portrait (ITportrait). ITportrait relies on a two-stage alternating training strategy. In the first stage, we employ a 3D Artistic Paired Transfer (APT) method for image-guided style transfer. APT constructs paired photo-realistic portraits to obtain accurate artistic poses, which helps ITportrait achieve high-quality 3D style transfer. In the second stage, we propose a 3D Image-Text Embedding (ITE) approach in the CLIP space. ITE uses a threshold function to adaptively control the optimization direction of the image or the text in the CLIP space. Comprehensive quantitative and qualitative results show that ITportrait achieves state-of-the-art (SOTA) results and benefits downstream tasks. All source code and pre-trained models will be released to the public.
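The abstract does not give the form of the ITE threshold function, so the following is only an illustrative sketch of how a threshold could switch optimization between image and text guidance in an embedding space. The gating rule, the function names (`cosine`, `pick_guidance`), the default `threshold=0.8`, and the use of plain Python lists in place of CLIP embeddings are all assumptions made for illustration, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_guidance(current, image_target, text_target, threshold=0.8):
    """Hypothetical gating rule (an assumption, not taken from the paper).

    While the current embedding is still far from the image target
    (similarity below `threshold`), follow the image direction;
    once it is close enough, switch to the text direction.
    Returns the chosen guidance name and a cosine-distance loss.
    """
    image_sim = cosine(current, image_target)
    if image_sim < threshold:
        return "image", 1.0 - image_sim
    return "text", 1.0 - cosine(current, text_target)


# Toy 2D vectors standing in for CLIP embeddings.
far_from_image = pick_guidance([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
close_to_image = pick_guidance([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(far_from_image[0], close_to_image[0])
```

In this toy setup the first call selects image guidance and the second selects text guidance. In a real pipeline the returned loss would be backpropagated through the generator, and the actual threshold rule would have to be taken from the paper itself.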