Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging: existing methods require fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique for establishing style alignment among a series of generated images. By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across the images generated by a T2I model. The same approach allows style-consistent images to be created from a reference style through a straightforward inversion operation. Evaluating our method across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.
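The abstract does not spell out how attention sharing is implemented. One plausible reading, sketched below in plain Python, is that every image in a batch attends to its own keys and values concatenated with those of a reference image, so that style features from the reference can flow into each output. The names `attend`, `shared_attention`, and `ref_index` are hypothetical, and the choice of the first image as reference is an assumption made for illustration; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def shared_attention(batch_q, batch_k, batch_v, ref_index=0):
    """Assumed form of attention sharing: each non-reference image
    attends to the reference image's tokens plus its own tokens."""
    ref_k, ref_v = batch_k[ref_index], batch_v[ref_index]
    outputs = []
    for i, (q, k, v) in enumerate(zip(batch_q, batch_k, batch_v)):
        if i == ref_index:
            # The reference image uses ordinary self-attention.
            outputs.append(attend(q, k, v))
        else:
            # Concatenate reference tokens with the image's own tokens.
            outputs.append(attend(q, ref_k + k, ref_v + v))
    return outputs
```

In this sketch the reference image is left untouched, while every other image mixes reference values into its attention output; the text prompt for each image would still drive its content through the usual cross-attention, which is outside the scope of the example.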