Image-to-image translation is a fundamental task in computer vision. It transforms images from one domain into images in another domain so that they acquire the characteristics specific to the target domain. Most prior works train a generative model to learn the mapping from a source domain to a target domain. However, learning such a mapping is challenging because data from different domains can be highly unbalanced in both quality and quantity. To address this problem, we propose a new approach that extracts image features by learning the similarities and differences of samples within the same data distribution via a novel contrastive learning framework, which we call Auto-Contrastive-Encoder (ACE). ACE learns the content code as the similarity between samples that share the same content information but carry different style perturbations. The design of ACE enables us to achieve, for the first time, zero-shot image-to-image translation with no training on image translation tasks. Moreover, our learning method effectively learns the style features of images in different domains. Consequently, our model also achieves competitive results on multimodal image translation tasks in the zero-shot setting. Additionally, we demonstrate the potential of our method in transfer learning: with fine-tuning, the quality of translated images improves in unseen domains. Even though we use contrastive learning, all of our training can be performed on a single GPU with a batch size of 8.
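To make the contrastive idea concrete, the following is a minimal sketch and not the ACE implementation. It assumes an InfoNCE-style objective, which the abstract does not specify, and it stands in for the encoder with the identity function. The names `info_nce`, `perturb`, and `temperature` are hypothetical, and additive Gaussian noise is only a placeholder for a style perturbation. Two views of the same content are treated as a positive pair, and the other samples in the batch act as negatives.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm so that dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (an assumed objective, not taken from the paper).

    Row i of `positives` is the positive for row i of `anchors`;
    every other row of `positives` serves as a negative.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


rng = random.Random(0)
batch_size, dim = 8, 16  # batch size 8 mirrors the setting quoted in the abstract

# Hypothetical "content" vectors; a real model would obtain codes from an encoder.
content = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch_size)]


def perturb(v, scale=0.05):
    """Placeholder for a style perturbation: small additive Gaussian noise."""
    return [x + rng.gauss(0.0, scale) for x in v]


view_a = [perturb(c) for c in content]
view_b = [perturb(c) for c in content]

# Matching views (same content, different perturbation) versus mismatched pairs.
aligned_loss = info_nce(view_a, view_b)
shuffled_loss = info_nce(view_a, view_b[1:] + view_b[:1])
print(f"aligned: {aligned_loss:.4f}  shuffled: {shuffled_loss:.4f}")
```

In this toy setting the loss is lower when each anchor is paired with the view that shares its content than when the pairs are shifted by one position, which is the behavior a content code learned this way would rely on.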