Rethinking the Paradigm of Content Constraints in GAN-based Unpaired Image-to-Image Translation

In the unpaired setting, image-to-image translation (I2I) lacks sufficient content constraints, so GAN-based approaches are prone to mode collapse. Current solutions fall into two categories: reconstruction-based and Siamese network-based. The former requires that the image, during or after translation, can be perfectly converted back to the original, which is sometimes too strict and limits generative performance. The latter feeds the original and generated images into a feature extractor and matches their outputs, which is inefficient, and a universal feature extractor is not readily available. In this paper, we propose EnCo, a simple but efficient way to preserve content by constraining the representational similarity, in the latent space, of patch-level features from the same stage of the \textbf{En}coder and de\textbf{Co}der of the generator. For the similarity function, we use a simple MSE loss instead of the contrastive loss that is currently widely used in I2I tasks. Thanks to this design, EnCo trains extremely efficiently, and the encoder features have a more positive effect on decoding, leading to more satisfying generations. In addition, we rethink the role of the discriminator in patch sampling and propose a discriminative attention-guided (DAG) patch sampling strategy to replace random sampling. DAG is parameter-free and adds only negligible computational overhead, yet significantly improves model performance. Extensive experiments on multiple datasets demonstrate the effectiveness and advantages of EnCo, which achieves multiple state-of-the-art results compared to previous methods. Our code is available at https://github.com/XiudingCai/EnCo-pytorch.
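
The content constraint described above can be illustrated with a minimal sketch, which is not the authors' implementation. It computes an MSE between encoder and decoder feature maps of the same stage at a set of sampled spatial locations. The function names `sample_patch_indices` and `patch_mse` are hypothetical, the feature maps are assumed to share the shape (C, H, W), and the top-k selection over a score map is only an assumed stand-in for attention-guided sampling, since the abstract does not specify how DAG selects patches. NumPy is used to show the arithmetic only; any projection heads, normalization, or gradient handling that a real training setup would need are omitted.

```python
import numpy as np


def sample_patch_indices(num_locations, num_patches, scores=None, rng=None):
    """Choose flat spatial indices of the patches to compare.

    With scores=None, locations are drawn uniformly without replacement
    (random sampling). Otherwise the highest-scoring locations are kept,
    an ASSUMED stand-in for attention-guided sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    if scores is None:
        return rng.choice(num_locations, size=num_patches, replace=False)
    # Sort ascending, reverse to descending, keep the top num_patches.
    return np.argsort(scores.ravel())[::-1][:num_patches]


def patch_mse(enc_feat, dec_feat, idx):
    """MSE between encoder and decoder features at the same locations.

    enc_feat, dec_feat: arrays of shape (C, H, W) from the same stage.
    idx: flat indices into the H*W spatial grid.
    """
    channels = enc_feat.shape[0]
    enc = enc_feat.reshape(channels, -1)[:, idx]  # (C, P)
    dec = dec_feat.reshape(channels, -1)[:, idx]  # (C, P)
    return float(np.mean((enc - dec) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enc = rng.normal(size=(4, 8, 8))   # hypothetical encoder features
    dec = rng.normal(size=(4, 8, 8))   # hypothetical decoder features
    idx = sample_patch_indices(8 * 8, 16, rng=rng)
    print("patch MSE:", patch_mse(enc, dec, idx))
```

Because the loss compares only matching locations, it needs no negative samples, which is one plausible reason an MSE constraint is cheaper than a contrastive loss over the same patches.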
