Recent GAN inversion methods can successfully invert a real input image into a corresponding editable latent code in StyleGAN. Combined with the language-vision model CLIP, several text-driven image manipulation methods have been proposed. However, these methods incur extra cost, because they require optimization for each image or each new attribute editing mode. To make editing more efficient, we propose a new Text-driven image Manipulation framework via Space Alignment (TMSA). The Space Alignment module aims to align the same semantic regions in the CLIP and StyleGAN spaces. The text input can then be mapped directly into the StyleGAN space and used to find the semantic shift that matches the text description. The framework supports arbitrary image editing modes without additional cost. Our work provides the user with an interface to control the attributes of a given image according to text input and to obtain the result in real time. Extensive experiments demonstrate our superior performance over prior works.
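The inference path described in the abstract (map text into the StyleGAN space, derive a semantic shift, apply it to an inverted latent code) can be illustrated with a toy sketch. This is not the TMSA implementation: the abstract does not specify the form of the Space Alignment module, so the linear map `mapping`, the use of a source/target text pair, and the function names `align`, `semantic_shift`, and `edit_latent` are all assumptions made for illustration. Real CLIP embeddings and StyleGAN latent codes would replace the small hand-written vectors.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def align(mapping: Matrix, clip_embedding: Vector) -> Vector:
    """Project a CLIP-space vector into the latent space.

    Assumption: alignment is modeled here as a plain linear map
    (one row per latent dimension). The paper's module may differ.
    """
    return [sum(m * x for m, x in zip(row, clip_embedding)) for row in mapping]


def semantic_shift(mapping: Matrix, source_text: Vector, target_text: Vector) -> Vector:
    """Latent-space direction between two aligned text embeddings (hypothetical)."""
    src = align(mapping, source_text)
    tgt = align(mapping, target_text)
    return [t - s for t, s in zip(tgt, src)]


def edit_latent(latent: Vector, shift: Vector, strength: float = 1.0) -> Vector:
    """Move an inverted latent code along the shift; no per-image optimization."""
    return [w + strength * d for w, d in zip(latent, shift)]
```

Under these assumptions, an edit is a single forward computation: calling `semantic_shift` on the embeddings of a neutral and a target description, then `edit_latent` on the inverted code, which is consistent with the real-time editing the abstract claims.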