Exemplar-based image translation is the task of generating images in a desired style while conditioning on a given input image. Most current methods learn the correspondence between the two input domains but do not exploit the information within each domain. In this paper, we propose a more general learning approach that treats the features of the two domains as a whole and learns both inter-domain correspondence and potential intra-domain information interactions. Specifically, we propose a Cross-domain Feature Fusion Transformer (CFFT) to learn inter- and intra-domain feature fusion. Built on CFFT, the proposed CFFT-GAN performs well on exemplar-based image translation. Moreover, by cascading CFFT modules, CFFT-GAN is able to decouple and fuse features from multiple domains. We conduct extensive quantitative and qualitative experiments on several image translation tasks, and the results demonstrate the superiority of our approach over state-of-the-art methods. Ablation studies show the importance of the proposed CFFT, and experiments on applications illustrate the potential of our method.
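To make the idea of treating two domains' features "as a whole" concrete, the following is a minimal sketch, not the CFFT architecture itself. It assumes that joint fusion can be illustrated by plain single-head self-attention over the concatenated token sequences of both domains; the function name `joint_attention`, the random projection matrices, and the token counts are all hypothetical choices made for illustration. Under that assumption, the diagonal blocks of the attention matrix carry intra-domain interactions and the off-diagonal blocks carry inter-domain correspondence.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(feat_a, feat_b, w_q, w_k, w_v):
    """Single-head self-attention over the tokens of two domains at once.

    feat_a: (n_a, d) tokens from domain A (hypothetical, e.g. the input image)
    feat_b: (n_b, d) tokens from domain B (hypothetical, e.g. the exemplar)
    Returns the fused tokens of each domain and the full attention matrix.
    """
    # Treat both domains as one sequence of n_a + n_b tokens.
    tokens = np.concatenate([feat_a, feat_b], axis=0)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)  # (n_a + n_b, n_a + n_b)
    fused = attn @ v
    n_a = feat_a.shape[0]
    return fused[:n_a], fused[n_a:], attn

# Toy inputs: random features stand in for encoder outputs (assumption).
rng = np.random.default_rng(0)
d, n_a, n_b = 8, 4, 6
feat_a = rng.standard_normal((n_a, d))
feat_b = rng.standard_normal((n_b, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused_a, fused_b, attn = joint_attention(feat_a, feat_b, w_q, w_k, w_v)

# Block structure of the joint attention matrix for domain-A queries:
intra_a = attn[:n_a, :n_a]   # A attends to A (intra-domain)
inter_ab = attn[:n_a, n_a:]  # A attends to B (inter-domain)
```

Because each row of `attn` is normalized over the tokens of both domains, a query token divides its attention between its own domain and the other one, which is one simple way to read "learning inter- and intra-domain fusion jointly". A cross-attention-only design would correspond to keeping just the off-diagonal blocks.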