Free-Lunch Color-Texture Disentanglement for Stylized Image Generation

Recent advances in Text-to-Image (T2I) diffusion models have transformed image generation, enabling significant progress in stylized generation using only a few style reference images. However, current diffusion-based methods struggle with fine-grained style customization because it is difficult to control multiple style attributes, such as color and texture, independently. This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation, addressing the Disentangled Stylized Image Generation (DisIG) problem, which requires independently controllable style elements. Our approach leverages the Image-Prompt Additivity property in the CLIP image embedding space to develop techniques for separating and extracting Color-Texture Embeddings (CTE) from individual color and texture reference images. To ensure that the color palette of the generated image aligns closely with the color reference, we apply a whitening and coloring transformation to enhance color consistency. Additionally, to prevent texture loss due to the signal-leak bias inherent in diffusion training, we introduce a noise term into the transformation, yielding the Regularized Whitening and Coloring Transformation (RegWCT), which preserves textural fidelity. Through these methods, our Style Attributes Disentanglement approach (SADis) delivers a more precise and customizable solution for stylized image generation. Experiments on images from the WikiArt and StyleDrop datasets demonstrate that, both qualitatively and quantitatively, SADis surpasses state-of-the-art stylization methods in the DisIG task.

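
The whitening and coloring step mentioned in the abstract can be illustrated with the classic whitening and coloring transform applied to feature vectors. What follows is a minimal NumPy sketch under stated assumptions, not the paper's RegWCT: the function names (`whiten_and_color` and its helpers), the `eps` eigenvalue clamp, and the convention that features are rows of an (N, d) array are all chosen here for illustration. The noise term that distinguishes RegWCT is omitted because the abstract does not specify its form.

```python
import numpy as np


def _mean_and_cov(feats):
    """Return the mean, the centered features, and the covariance of (N, d) features."""
    mu = feats.mean(axis=0, keepdims=True)
    centered = feats - mu
    cov = centered.T @ centered / (feats.shape[0] - 1)
    return mu, centered, cov


def _sym_matrix_power(cov, power, eps=1e-8):
    """Raise a symmetric positive semi-definite matrix to a real power.

    Eigenvalues are clamped to eps (an assumption made here) so that
    negative powers remain finite for near-singular covariances.
    """
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)
    return (vecs * vals**power) @ vecs.T


def whiten_and_color(content, style, eps=1e-8):
    """Match the mean and covariance of `content` features to those of `style`.

    Both inputs are (N, d) arrays with one feature vector per row; the two
    arrays may have different numbers of rows.
    """
    _, centered_c, cov_c = _mean_and_cov(content)
    mu_s, _, cov_s = _mean_and_cov(style)
    # Whitening: remove the content's own second-order statistics.
    whitened = centered_c @ _sym_matrix_power(cov_c, -0.5, eps)
    # Coloring: impose the style's covariance, then shift to its mean.
    return whitened @ _sym_matrix_power(cov_s, 0.5, eps) + mu_s
```

As an example of use, calling `whiten_and_color(content_pixels, reference_pixels)` on two arrays of RGB rows returns content pixels whose channel means and covariances match those of the reference, which is the sense in which such a transform aligns a color palette. Applying the same operation to diffusion latents, as the abstract implies, would require reshaping the latents into this (N, d) layout first.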