We investigate the problem of zero-shot semantic image painting. Instead of painting modifications into an image using only concrete colors or a finite set of semantic concepts, we ask how to create semantic paint based on open full-text descriptions: our goal is to be able to point to a location in a synthesized image and apply an arbitrary new concept such as "rustic" or "opulent" or "happy dog." To do this, our method combines a state-of-the-art generative model of realistic images with a state-of-the-art text-image semantic similarity network. We find that two ingredients are important for making large changes: using non-gradient methods to explore the latent space, and relaxing the computations of the GAN so that changes target a specific region. We conduct user studies to compare our methods to several baselines.
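As a rough illustration of the two ingredients named above, the following minimal sketch runs a gradient-free search over a latent vector while restricting the edit to a masked region. Everything in it is an assumption made for illustration: `toy_generator` and `toy_score` are hypothetical stand-ins for the generative model and the text-image similarity network, the search is a simple random-mutation hill climb rather than the optimizer used in this work, and `masked_blend` mixes outputs in pixel space, which is a simplification and not the relaxation of GAN computations described in the abstract.

```python
import random


def toy_generator(z):
    """Hypothetical stand-in for a GAN generator: latent vector -> tiny 1-D 'image'."""
    n = len(z)
    return [0.5 * z[i % n] + 0.5 * z[(i + 1) % n] for i in range(8)]


def toy_score(image, target):
    """Hypothetical stand-in for a text-image similarity network (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(image, target))


def masked_blend(original, edited, mask):
    """Keep original pixels outside the mask and take edited pixels inside it."""
    return [e if m else o for o, e, m in zip(original, edited, mask)]


def gradient_free_paint(z0, mask, target, steps=200, population=16, sigma=0.3, seed=0):
    """Random-mutation hill climb over the latent vector; no gradients are used."""
    rng = random.Random(seed)
    original = toy_generator(z0)

    def objective(z):
        # Score only the blended result, so changes outside the mask cannot help.
        return toy_score(masked_blend(original, toy_generator(z), mask), target)

    best_z, best_score = list(z0), objective(z0)
    for _ in range(steps):
        for _ in range(population):
            # Propose a Gaussian perturbation of the current best latent vector.
            candidate = [v + rng.gauss(0.0, sigma) for v in best_z]
            score = objective(candidate)
            if score > best_score:
                best_z, best_score = candidate, score
    return best_z, masked_blend(original, toy_generator(best_z), mask)
```

For example, calling `gradient_free_paint([0.0] * 4, [False] * 4 + [True] * 4, [1.0] * 8)` returns the best latent vector found and the painted image. Pixels outside the mask are identical to the original, and the score of the painted image is never lower than the score of the original.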