Recent advances in Text-to-Image (T2I) diffusion models have transformed image generation, enabling significant progress in stylized generation from only a few style reference images. However, current diffusion-based methods struggle with fine-grained style customization because they cannot control multiple style attributes, such as color and texture, independently. This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation, addressing the need for independently controlled style elements in the Disentangled Stylized Image Generation (DisIG) problem. Our approach leverages the Image-Prompt Additivity property of the CLIP image embedding space to separate and extract Color-Texture Embeddings (CTE) from individual color and texture reference images. To ensure that the color palette of the generated image aligns closely with the color reference, we apply a whitening and coloring transformation. Additionally, to prevent the texture loss caused by the signal-leak bias inherent in diffusion training, we introduce a noise term into the Regularized Whitening and Coloring Transformation (RegWCT) that preserves textural fidelity. Through these methods, our Style Attributes Disentanglement approach (SADis) delivers a more precise and customizable solution for stylized image generation. Experiments on images from the WikiArt and StyleDrop datasets demonstrate that SADis surpasses state-of-the-art stylization methods on the DisIG task, both qualitatively and quantitatively.
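
To make the whitening and coloring step concrete, the following is a minimal sketch of a plain whitening and coloring transformation on toy 3-channel vectors. It is not the SADis implementation: the abstract does not say which tensors the transformation is applied to, and the noise term of RegWCT is not reproduced here. The function names (`mean_and_cov`, `cholesky`, `solve_lower`, `wct`) are hypothetical, and the sketch assumes a Cholesky-based variant rather than the eigendecomposition form often used for this transform. It is kept dependency-free so that it runs as is.

```python
import math
import random


def mean_and_cov(vectors):
    """Return the per-channel mean and the (biased) covariance matrix."""
    n, d = len(vectors), len(vectors[0])
    mu = [sum(v[i] for v in vectors) / n for i in range(d)]
    cov = [[sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) / n
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a, eps=1e-8):
    """Lower-triangular L with L @ L.T == a + eps * I (eps guards against rank deficiency)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] + eps - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def wct(content, style):
    """Give `content` the channel mean and covariance of `style`."""
    mu_c, cov_c = mean_and_cov(content)
    mu_s, cov_s = mean_and_cov(style)
    L_c, L_s = cholesky(cov_c), cholesky(cov_s)
    d = len(mu_c)
    out = []
    for v in content:
        # Whitening: remove the content mean and decorrelate the channels.
        white = solve_lower(L_c, [v[i] - mu_c[i] for i in range(d)])
        # Coloring: impose the style covariance, then add the style mean.
        out.append([mu_s[i] + sum(L_s[i][k] * white[k] for k in range(i + 1))
                    for i in range(d)])
    return out


# Toy data (an assumption for illustration): 200 random 3-channel vectors each.
random.seed(0)
content = [[random.random() for _ in range(3)] for _ in range(200)]
style = []
for _ in range(200):
    r = random.random()
    style.append([0.8 * r, 0.5 * r + 0.1 * random.random(), 0.3 + 0.2 * random.random()])

stylized = wct(content, style)
```

After the transform, `stylized` keeps one output vector per content vector, while its channel mean and covariance match those of `style` up to the small `eps` regularizer. This is the sense in which such a transformation aligns color statistics with a reference.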