Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure a consistent style remains challenging: existing methods require fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique for establishing style alignment among a series of generated images. By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across the images generated by a T2I model. The same approach can create style-consistent images from a reference style through a straightforward inversion operation. Our evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring the method's efficacy in achieving consistent style across varied inputs.
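The abstract does not spell out how attention sharing is implemented. As a rough illustration only, the sketch below assumes one plausible reading: in a self-attention layer, every image in the batch attends to its own tokens plus the keys and values of a designated reference image. The function names (`attention`, `shared_attention`), the `ref_index` parameter, and the tensor layout are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention; q, k, v: (batch, tokens, dim).
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def shared_attention(q, k, v, ref_index=0):
    # Assumption: "attention sharing" means each image also attends to the
    # keys/values of one reference image in the batch (hypothetical design).
    k_ref = np.broadcast_to(k[ref_index], k.shape)
    v_ref = np.broadcast_to(v[ref_index], v.shape)
    k_shared = np.concatenate([k_ref, k], axis=1)  # (batch, 2 * tokens, dim)
    v_shared = np.concatenate([v_ref, v], axis=1)
    return attention(q, k_shared, v_shared)

# Toy usage: 3 images, 4 tokens each, 8-dimensional features.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4, 8)) for _ in range(3))
out = shared_attention(q, k, v)
print(out.shape)
```

Under this assumption the reference image's own output is unchanged, because duplicating its keys and values leaves the softmax-weighted average the same, while every other image's features are pulled toward the reference. A real diffusion model would apply something like this inside its attention layers at each denoising step; that integration is not shown here.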