Recently, unsupervised image-to-image translation methods based on contrastive learning have achieved state-of-the-art results in many tasks. However, in previous works the negatives are sampled only from the input image itself, which motivates us to design a data augmentation method that improves the quality of the selected negatives. Moreover, previous methods preserve content consistency only through patch-wise contrastive learning in the embedding space, ignoring the domain consistency between the generated images and the real images of the target domain. In this paper, we propose MCDUT, a novel unsupervised image-to-image translation framework based on multi-cropping contrastive learning and domain consistency. Specifically, we obtain multi-cropping views via center-cropping and random-cropping in order to generate higher-quality negative examples. To constrain the embeddings in the deep feature space, we formulate a new domain consistency loss, which encourages the generated images to be close to the real images in the embedding space of the same domain. Furthermore, we present a dual coordinate attention network, called DCA, which embeds positional information into channel attention. We employ DCA in the generator, enabling it to capture global dependencies along both the horizontal and vertical directions. Extensive comparison experiments and ablation studies show that our method achieves state-of-the-art results on many image-to-image translation tasks and demonstrate its advantages.
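To make the multi-cropping idea concrete, the following is a minimal NumPy sketch, not the MCDUT implementation. It assumes a CUT-style patch-wise InfoNCE loss in which the positive is the embedding of the patch at the same location and the negatives are other patch embeddings. The helper names (`center_crop`, `random_crop`, `multi_crop_views`, `patch_nce`), the crop size, the number of random crops, and the temperature `tau=0.07` are all hypothetical choices, not taken from this abstract. The abstract also does not specify how patches from the extra views are pooled into the negative set.

```python
import numpy as np


def center_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size window of an H x W (x C) image."""
    h, w = img.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def random_crop(img: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Return a size x size window at a uniformly random position."""
    h, w = img.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]


def multi_crop_views(img: np.ndarray, size: int, num_random: int, seed: int = 0):
    """Hypothetical helper: one center crop followed by `num_random` random crops."""
    rng = np.random.default_rng(seed)
    views = [center_crop(img, size)]
    views += [random_crop(img, size, rng) for _ in range(num_random)]
    return views


def patch_nce(query: np.ndarray, positive: np.ndarray,
              negatives: np.ndarray, tau: float = 0.07) -> float:
    """InfoNCE loss for one query patch embedding (assumed CUT-style form).

    query, positive: shape (D,); negatives: shape (N, D).
    Embeddings are L2-normalized, so dot products are cosine similarities.
    """
    def l2n(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q, p, n = l2n(query), l2n(positive), l2n(negatives)
    # Index 0 holds the positive logit; the rest are negatives.
    logits = np.concatenate([[q @ p], n @ q]) / tau
    logits -= logits.max()  # subtract max for numerical stability
    # Cross-entropy with target index 0: -log softmax(logits)[0].
    return float(-logits[0] + np.log(np.exp(logits).sum()))
```

In this sketch, patches cut from the center and random views would be passed through the encoder and their embeddings appended to `negatives`. This enlarges and diversifies the negative set beyond patches from a single view of the input image.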