Recently, image-to-image translation methods based on contrastive learning have achieved state-of-the-art results in many tasks. However, previous work samples the negatives only from the input feature space, which limits their diversity. Moreover, previous methods ignore domain consistency, in the latent embedding space, between the generated image and the real images of the target domain. In this paper, we propose a novel contrastive learning framework for unpaired image-to-image translation, called MCCUT. We use multi-crop views, obtained via center-crop and random-crop, to generate the negatives, which improves both their diversity and their quality. To constrain the embeddings in the deep feature space, we formulate a new domain consistency loss function, which encourages the generated images to be close to the real images in the embedding space of the same domain. Furthermore, we present a dual coordinate channel attention network, called the DCSE module, which embeds positional information into SENet. We employ the DCSE module in the design of the generator, which makes the generator pay more attention to channels with greater weight. On many image-to-image translation tasks, our method achieves state-of-the-art results, and its advantages are demonstrated through extensive comparison experiments and ablation studies.
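To make the multi-crop idea concrete, the following is a minimal sketch of how center-crop and random-crop views could supply negatives for an InfoNCE-style contrastive loss. It is an illustration under stated assumptions and not the MCCUT implementation: the abstract does not specify which images are cropped, how many crops are taken, or how patches are encoded. The helper names (`center_crop`, `random_crop`, `embed`, `info_nce`), the toy flatten-and-normalize "encoder", and the temperature `tau=0.07` are all hypothetical choices made for this example.

```python
import math
import random


def center_crop(img, size):
    """Return the central size x size window of a 2D list of numbers."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def random_crop(img, size, rng):
    """Return a size x size window at a random valid location."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def embed(patch):
    """Toy stand-in for a learned encoder (assumption): flatten, then L2-normalize."""
    flat = [float(v) for row in patch for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE loss: -log softmax of the positive logit over all logits."""
    logits = [dot(query, positive) / tau]
    logits += [dot(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy 8x8 "image" with distinct pixel values (assumed data, for illustration only).
rng = random.Random(0)
image = [[r * 8 + c for c in range(8)] for r in range(8)]

# Multi-crop negatives: one center crop plus a few random crops.
crops = [center_crop(image, 4)] + [random_crop(image, 4, rng) for _ in range(3)]
negatives = [embed(c) for c in crops]

# Query and positive: the same top-left patch, the positive slightly perturbed.
patch = [row[:4] for row in image[:4]]
query = embed(patch)
positive = embed([[v + 0.5 for v in row] for row in patch])

loss = info_nce(query, positive, negatives)
```

In this sketch the crops simply enlarge the pool of negatives that the query is contrasted against; in a real model the embeddings would come from a trained encoder rather than from raw pixel values.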