Generating artistic portraits is a challenging problem in computer vision. Existing portrait stylization models that produce good-quality results are based on image-to-image translation and require abundant data from both the source and target domains. Without enough data, these methods overfit. In this work, we propose CtlGAN, a new few-shot artistic portrait generation model with a novel contrastive transfer learning strategy. We adapt a StyleGAN pretrained on the source domain to a target artistic domain using no more than 10 artistic faces. To reduce overfitting to the few training examples, we introduce a novel Cross-Domain Triplet loss, which explicitly encourages target instances generated from different latent codes to be distinguishable. We also propose a new encoder that embeds real faces into Z+ space, together with a dual-path training strategy that better copes with the adapted decoder and eliminates artifacts. Extensive qualitative and quantitative comparisons and a user study show that our method significantly outperforms state-of-the-art methods under 10-shot and 1-shot settings and generates high-quality artistic portraits. The code will be made publicly available.
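The abstract does not state the formula of the Cross-Domain Triplet loss, so the following is only a minimal sketch of a generic triplet margin loss applied across two domains, not the authors' formulation. The assignment of roles is an assumption made for illustration: the anchor is a feature of the target-domain image generated from latent code i, the positive is the source-domain feature from the same code, and the negatives are target-domain features from other codes. The function names, the Euclidean distance, and the margin value are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(
        0.0,
        l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin,
    )


def cross_domain_triplet(source_feats, target_feats, margin=1.0):
    """Average triplet loss over all ordered pairs (i, j) with i != j.

    Assumption (not taken from the abstract): source_feats[i] and
    target_feats[i] are features of images generated from the same latent
    code i by the source generator and the adapted generator respectively.
    Anchor = target_feats[i], positive = source_feats[i],
    negative = target_feats[j].
    """
    n = len(target_feats)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            total += triplet_loss(
                target_feats[i], source_feats[i], target_feats[j], margin
            )
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    source = [[0.0, 0.0], [3.0, 4.0]]
    distinct_targets = [[0.0, 0.0], [3.0, 4.0]]   # targets stay apart
    collapsed_targets = [[0.0, 0.0], [0.0, 0.0]]  # targets collapse together
    print(cross_domain_triplet(source, distinct_targets))
    print(cross_domain_triplet(source, collapsed_targets))
```

Under these assumptions, the loss is zero when target features from different latent codes are well separated and grows when they collapse onto each other, which is the behavior the abstract describes as keeping generated instances distinguishable.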