Deep generative models have been applied to many image-to-image translation tasks. Generative Adversarial Networks and diffusion models have achieved impressive results, setting a new state of the art on these tasks. Most methods use a symmetric setup across the domains in a dataset: they assume that every domain has either multiple modalities or only one. However, many datasets have a many-to-one relationship between two domains. In this work, we first introduce a Colorized MNIST dataset and a Color-Recall score, which together provide a simple benchmark for evaluating models on many-to-one translation. We then introduce a new asymmetric framework to improve existing deep generative models on many-to-one image-to-image translation. We apply this framework to StarGAN V2 and show that the resulting model improves performance on many-to-one image-to-image translation in both unsupervised and semi-supervised settings.