CLIPStyler demonstrated image style transfer with realistic textures using only a style text description, without requiring a reference style image. However, the underlying semantics of objects in the stylized output are often lost, either because style spills over between salient and background objects (content mismatch) or because of over-stylization. To address this, we propose Semantic CLIPStyler (Sem-CS), which performs semantic style transfer. Sem-CS first segments the content image into salient and non-salient objects and then transfers artistic style based on a given style text description. The semantic style transfer is achieved using a global foreground loss (for salient objects) and a global background loss (for non-salient objects). Our empirical results, including DISTS, NIMA, and user study scores, show that the proposed framework yields superior qualitative and quantitative performance. Our code is available at github.com/chandagrover/sem-cs.
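The split into a global foreground loss and a global background loss can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a CLIPStyler-style directional loss (one minus the cosine similarity between an image-embedding shift and a text-embedding shift), and it replaces the CLIP image encoder with a hypothetical `toy_encoder` so the example runs with NumPy alone. The names `masked_directional_loss` and `semantic_style_loss`, the weights `w_fg` and `w_bg`, and the use of one text direction for both regions are all assumptions made for illustration.

```python
import numpy as np


def cosine_distance(a, b, eps=1e-8):
    """Return 1 - cosine similarity between two 1-D vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def toy_encoder(image):
    """Hypothetical stand-in for an image encoder (NOT CLIP).

    Maps an H x W x C image to a 2C-vector of per-channel mean and std.
    """
    flat = image.reshape(-1, image.shape[-1])
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])


def masked_directional_loss(content, output, mask, text_direction, encoder=toy_encoder):
    """Directional loss restricted to the region where mask == 1.

    Pixels outside the mask are zeroed in both images before encoding,
    so only the selected region contributes to the embedding shift.
    """
    m = mask[..., None].astype(content.dtype)
    image_direction = encoder(output * m) - encoder(content * m)
    return cosine_distance(image_direction, text_direction)


def semantic_style_loss(content, output, saliency_mask, text_direction,
                        w_fg=1.0, w_bg=1.0):
    """Weighted sum of a foreground term (salient) and a background term (non-salient)."""
    fg = masked_directional_loss(content, output, saliency_mask, text_direction)
    bg = masked_directional_loss(content, output, 1.0 - saliency_mask, text_direction)
    return w_fg * fg + w_bg * bg, fg, bg


# Usage: a 4 x 4 RGB image whose top half is treated as the salient region.
content = np.zeros((4, 4, 3))
saliency_mask = np.zeros((4, 4))
saliency_mask[:2, :] = 1.0
text_direction = np.ones(6)  # assumed text-embedding shift, same size as the toy embedding

output = content.copy()
output[:2, :, :] = 1.0  # only the salient region changes

total, fg, bg = semantic_style_loss(content, output, saliency_mask, text_direction)
```

Because the two terms are computed on disjoint regions, a change confined to the salient region affects only the foreground term, which is the property that keeps style from spilling across the segmentation boundary. In a real pipeline the mask would come from a segmentation step and the encoder would be a pretrained vision-language model.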