DiffStyler: Diffusion-based Localized Image Style Transfer

Image style transfer aims to imbue digital imagery with the distinctive attributes of style targets, such as colors, brushstrokes, and shapes, while preserving the semantic integrity of the content. Despite advances in arbitrary style transfer methods, striking a balance between content semantics and style attributes remains a prevalent challenge. Recent large-scale text-to-image diffusion models offer unprecedented synthesis capabilities, but they rely on extensive and often imprecise textual descriptions to delineate artistic styles. To address these limitations, this paper introduces DiffStyler, a novel approach that facilitates efficient and precise arbitrary image style transfer. At the core of DiffStyler lies a LoRA, built on a text-to-image Stable Diffusion model, that encapsulates the essence of style targets. This approach, coupled with strategic cross-LoRA feature and attention injection, guides the style transfer process. Our methodology is rooted in the observation that LoRA maintains the spatial feature consistency of the UNet, a discovery that further inspired the development of a mask-wise style transfer technique. This technique employs masks extracted by a pre-trained FastSAM model and uses mask prompts to facilitate feature fusion during the denoising process, thereby enabling localized style transfer that leaves the unaffected regions of the original image intact. Moreover, our approach accommodates multiple style targets through the use of corresponding masks. Through extensive experimentation, we demonstrate that DiffStyler surpasses previous methods in achieving a more harmonious balance between content preservation and style integration.
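
The mask-wise feature fusion mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_features`, the `(C, H, W)` array layout, and the assumption that masks have already been resized to the feature resolution are all hypothetical, and plain NumPy arrays stand in for UNet features at a single denoising step.

```python
import numpy as np

def fuse_features(content_feat, style_feats, masks):
    """Blend per-style features into content features, one mask at a time.

    content_feat: array of shape (C, H, W), features of the content branch.
    style_feats:  list of arrays of shape (C, H, W), one per style target.
    masks:        list of arrays of shape (H, W) with values in [0, 1];
                  1 marks pixels that should take the corresponding style.
    All names and shapes are illustrative assumptions, not the paper's API.
    """
    fused = content_feat.astype(np.float32)
    for feat, mask in zip(style_feats, masks):
        m = mask[None, :, :].astype(np.float32)  # broadcast over channels
        # Inside the mask take the styled features, outside keep what we had.
        fused = m * feat + (1.0 - m) * fused
    return fused

# Toy example: two style targets applied to disjoint regions.
content = np.zeros((2, 4, 4), dtype=np.float32)
style_a = np.ones((2, 4, 4), dtype=np.float32)
style_b = np.full((2, 4, 4), 2.0, dtype=np.float32)

mask_a = np.zeros((4, 4), dtype=np.float32)
mask_a[:, :2] = 1.0   # left half takes style A
mask_b = np.zeros((4, 4), dtype=np.float32)
mask_b[:2, 2:] = 1.0  # top-right quadrant takes style B

fused = fuse_features(content, [style_a, style_b], [mask_a, mask_b])
```

In this toy setup, pixels covered by neither mask keep the content features unchanged, which mirrors the claim that unaffected regions of the original image are preserved. Where masks overlap, the later mask in the list takes precedence; that ordering is a choice of this sketch.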

