Posters play a crucial role in marketing and advertising, and they contribute to industrial design by enhancing visual communication and brand visibility. With recent advances in controllable text-to-image diffusion models, research is now focusing more closely on rendering text within synthetic images. Despite improvements in text rendering accuracy, end-to-end poster generation remains underexplored. This complex task requires balancing text rendering accuracy against automated layout while producing high-resolution images with variable aspect ratios. To tackle this challenge, we propose an end-to-end text rendering framework that employs a triple cross-attention mechanism rooted in alignment learning and is designed to render precise poster text within detailed contextual backgrounds. Additionally, we introduce a high-resolution dataset with image resolutions exceeding 1024 pixels. Our approach builds on the SDXL architecture. Extensive experiments validate that our method can generate poster images with intricate, contextually rich backgrounds. Code will be available at https://github.com/OPPO-Mente-Lab/GlyphDraw2.