Semantic image synthesis (SIS) aims to generate realistic images that match given semantic masks. Although recent advances allow high-quality results and precise spatial control, these methods require a massive semantic segmentation dataset to train the models. Instead, we propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks. The proxy masks are prepared by simple clustering of the feature maps of random samples in the generator. The feature rearranger learns to rearrange the original feature maps to match the shape of proxy masks drawn either from the original sample itself or from random samples. We then introduce a semantic mapper that produces proxy masks from various input conditions, including semantic masks. Our method is versatile across applications such as free-form spatial editing of real images, sketch-to-photo, and even scribble-to-photo. Experiments validate the advantages of our method on a range of datasets: human faces, animal faces, and buildings.
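The proxy-mask preparation described above can be sketched as follows. This is a minimal illustration, assuming k-means as the "simple clustering" over per-location feature vectors of an intermediate generator layer; the function name and cluster count are hypothetical, not part of the paper's specification.

```python
import numpy as np
from sklearn.cluster import KMeans

def proxy_mask(feature_map: np.ndarray, n_clusters: int = 8) -> np.ndarray:
    """Cluster per-location feature vectors of a (C, H, W) feature map
    into a proxy mask of shape (H, W), where each entry is a cluster id
    standing in for a semantic region label."""
    C, H, W = feature_map.shape
    # Treat each spatial location as one C-dimensional feature vector.
    pixels = feature_map.reshape(C, H * W).T  # (H*W, C)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(pixels)
    return labels.reshape(H, W)
```

In practice the feature map would come from an intermediate layer of the pre-trained generator for a random latent sample; here any (C, H, W) array stands in for it.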