Recently, text-guided image editing has achieved significant success. However, when changing the texture of an object, existing methods can only apply simple textures such as wood or gold; complex textures such as clouds or fire remain a challenge. This limitation stems from the fact that the target prompt must describe both the input image content and the target texture (<texture>), which restricts the texture representation. In this paper, we propose TextureDiffusion, a tuning-free image editing method for diverse texture transfer. First, the target prompt is set directly to "<texture>", disentangling the texture from the input image content and enhancing the texture representation. Next, query features in self-attention and features in residual blocks are utilized to preserve the structure of the input image. Finally, to maintain the background, we introduce an edit-localization technique that blends the self-attention results and the intermediate latents. Comprehensive experiments demonstrate that TextureDiffusion can harmoniously transfer various textures with excellent structure and background preservation. Code is publicly available at https://github.com/THU-CVML/TextureDiffusion
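The edit-localization step described above can be illustrated with a minimal sketch: a binary mask restricts the edit to the object region, and the background is recovered from the source latents. The function name, mask shape, and toy values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def blend_latents(z_edit, z_source, mask):
    """Blend edited and source latents with an edit-localization mask.

    mask is 1 inside the region being textured and 0 in the background,
    so the background keeps the source latents untouched.
    (Hypothetical helper sketching the blending idea in the abstract.)
    """
    return mask * z_edit + (1.0 - mask) * z_source

# toy 4x4 single-channel "latents"
z_src = np.zeros((4, 4))   # source latents (background)
z_edt = np.ones((4, 4))    # edited latents (textured)
m = np.zeros((4, 4))
m[1:3, 1:3] = 1.0          # edit only the central 2x2 region

out = blend_latents(z_edt, z_src, m)
```

In the paper's setting the same masking would be applied at each denoising step, both to the self-attention outputs and to the intermediate latents, so that texture changes never leak into the background.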