Semantic Image Synthesis (SIS) is a subclass of image-to-image translation in which a semantic layout is used to generate a photorealistic image. State-of-the-art conditional Generative Adversarial Networks (GANs) need large amounts of paired data to accomplish this task, while generic unpaired image-to-image translation frameworks underperform in comparison because they color-code semantic layouts and learn correspondences in appearance instead of semantic content. Starting from the assumption that a high-quality generated image should be segmented back to its semantic layout, we propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole-image wavelet-based discrimination. Furthermore, to match the high-frequency distribution of real images, we propose a novel generator architecture in the wavelet domain. We test our methodology on three challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
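The assumption that a generated image should be segmented back to its input layout can be illustrated with a small sketch. The abstract does not specify the form of the self-supervised segmentation loss, so the code below assumes a mean per-pixel cross-entropy between a segmenter's class probabilities and the input layout. The function name `segmentation_cycle_loss`, the helper `one_hot`, and the toy 2x2 layout are hypothetical and serve only as illustration.

```python
import math


def segmentation_cycle_loss(pred_probs, layout, eps=1e-12):
    """Mean per-pixel cross-entropy (an assumed loss form, not taken from the paper).

    pred_probs: H x W x C nested lists; class probabilities that a segmenter
                assigns to each pixel of the generated image.
    layout:     H x W nested lists of integer class ids in [0, C); the
                semantic layout the image was generated from.
    """
    total = 0.0
    count = 0
    for prob_row, label_row in zip(pred_probs, layout):
        for probs, label in zip(prob_row, label_row):
            # Penalize low probability on the class given by the input layout.
            total += -math.log(max(probs[label], eps))
            count += 1
    return total / count


def one_hot(label, num_classes=3):
    """Hypothetical helper: a probability vector fully confident in `label`."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


# Toy 2x2 layout with 3 classes (made-up values).
layout = [[0, 1], [2, 1]]

# A segmenter that recovers the layout exactly vs. one that is uninformative.
perfect = [[one_hot(label) for label in row] for row in layout]
uniform = [[[1.0 / 3] * 3 for _ in row] for row in layout]

loss_perfect = segmentation_cycle_loss(perfect, layout)
loss_uniform = segmentation_cycle_loss(uniform, layout)
```

Under this assumed loss, an image whose segmentation matches the layout exactly incurs zero loss, while an uninformative segmentation over three classes incurs a loss of log 3. In training, such a term would push the generator toward images that a segmenter maps back to the conditioning layout without requiring paired photographs.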